Deep Reinforcement Learning (Deep-RL) has received considerable attention in recent years due to its ability to make an agent learn how to take optimal control actions, given rich observation data via the maximization of a reward function. Future space missions will need new on-board autonomy capabilities with increasingly complex requirements at the limits of the vehicle performance. This justifies the use of machine learning based techniques, in particular reinforcement learning in order to allow exploring the edge of the performance trade-off space. The guidance and control systems development for Reusable Launch Vehicles (RLV) can take advantage of reinforcement learning techniques for optimal adaption in the face of multi-objective requirements and uncertain scenarios.
In AI4GNC - a project funded by the European Space Agency (ESA), led by DEIMOS and participated by INESC-ID, the University of Lund, and TASC - a Deep-RL algorithm was used to train an actor-critic agent to simultaneously control the engine thrust magnitude and the two TVC gimbal angles to land a RLV in 6-DoF simulation. The design followed an incremental approach, progressively augmenting the number of degrees of freedom and introducing more complexity factors such as nonlinearity in models. Ultimately, the full 6-DoF problem was addressed using a high fidelity simulator that includes a nonlinear actuator model and a realistic vehicle aerodynamic model. Starting from an initial vehicle state along a reentry trajectory, the problem consists of precisely land the RLV while ensuring system requirements satisfaction, such as saturation and rate limits in the actuation, and aiming at fuel consumption optimality. The Deep Deterministic Policy Gradient (DDPG) algorithm was adopted as candidate strategy to allow the design of an integrated guidance and control algorithm in continuous action and observation spaces.
The results obtained are very satisfactory in terms of landing accuracy and fuel consumption. These results were also compared to a more classical and industrially used solution, due to its capability to yield satisfactory landing accuracy and fuel consumption, composed of a successive convexification guidance and a PID controller tuned independently for the non-disturbed nominal scenario. A reachability analysis was also performed to assess the stability and robustness of the closed-loop system composed by the integrated guidance and control NN, trained for the 1-DoF scenario, and the RLV dynamics.
Taking into account the fidelity of the benchmark adopted and the results obtained, this approach is deemed to have a significant potential for further developments and ultimately space industry applications, such as In-Orbit Servicing (IOS) and Active Debris Removal (ADR), that also require a high level of autonomy.