A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study

D'Angelo G.; Tipaldi M.; Glielmo L.
2019-01-01

Abstract

Dynamic programming (DP) and Markov decision processes (MDPs) offer powerful tools for formulating, modeling, and solving decision-making problems under uncertainty. In real-world applications, however, the applicability of DP is limited by severe scalability issues, which can be addressed by Approximate Dynamic Programming (ADP) techniques. ADP methods assume the availability of either a proper estimate of the underlying state transition probability distributions or a simulation mechanism capable of generating samples according to such distributions. In this paper, we present a data-driven ADP-based approach that offers an alternative when this assumption cannot be guaranteed. In particular, by varying the set-up of the MDP state transition probability matrix, different policies can be computed through exact DP or ADP methods. These policies are then processed by an Apriori-based algorithm to find frequent association rules within them. A pruning procedure selects the most suitable association rules, and finally an Association Classifier infers the optimal policy in all possible circumstances. We show a detailed application of the proposed approach to the computation of a proper mission operations plan for spacecraft with a high level of on-board autonomy.
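To make the pipeline concrete, the following is a minimal sketch of the idea described in the abstract, not the paper's implementation: solve a small tabular MDP under several perturbed transition set-ups, treat each resulting greedy policy as a transaction of (state, action) items, and mine frequent association rules among them. Value iteration stands in for the exact DP solver, a simple frequent-pair miner stands in for the full Apriori procedure with pruning, and all dimensions, thresholds, rewards, and names are illustrative assumptions.

```python
# Sketch: policies from perturbed MDPs -> association rules over (state, action) items.
# All numbers and thresholds below are illustrative assumptions, not values from the paper.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

def value_iteration(P, R, tol=1e-6):
    """Exact DP: return the greedy policy for transition tensor P[a, s, s']."""
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * np.einsum("ast,t->as", P, V)   # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=0)                    # policy[s] = best action
        V = V_new

# Fixed rewards R[a, s]; vary the transition probability matrix set-up.
R = rng.normal(size=(n_actions, n_states))
policies = []
for _ in range(50):
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)                  # make rows stochastic
    policies.append(value_iteration(P, R))

# Each policy becomes a transaction of items "s<i>=a<j>".
transactions = [{f"s{s}=a{a}" for s, a in enumerate(pi)} for pi in policies]

# Apriori-style mining of frequent single items and confident pairwise rules.
min_supp, min_conf = 0.3, 0.8
def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

frequent = [i for i in sorted(set().union(*transactions)) if support({i}) >= min_supp]
for x, y in itertools.permutations(frequent, 2):
    supp_xy = support({x, y})
    if supp_xy >= min_supp and supp_xy / support({x}) >= min_conf:
        print(f"rule: {x} -> {y}  (supp={supp_xy:.2f})")
```

In this toy setting, a rule such as "s0=a1 -> s2=a0" says that, across the perturbed MDP set-ups, policies choosing action a1 in state s0 also tend to choose a0 in state s2; the paper's approach feeds such pruned rules to an Association Classifier to infer a policy when the transition probabilities are not reliably known.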
Keywords

Approximate dynamic programming
Apriori classifier
Association classifier
Association rules
Markov decision process
Spacecraft autonomy
Stochastic optimal control
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/46189
Citations
  • PMC: not available
  • Scopus: 20
  • Web of Science: 18