On Exact Embedding Framework for Optimal Control of Markov Decision Processes

Kharade S.; Sutavani S.; Yerudkar A.; Wagh S.; Liu Y.; Vecchio C. D.; Singh N. M.

2024-01-01

Abstract

This article addresses the embedding framework for Markov decision processes (MDPs) with discrete state and action spaces, with the aim of finding optimal actions. The optimal control problem for an MDP can be tackled efficiently by recasting it as an equivalent linearly-solvable Markov decision process (LMDP) through a method called embedding. However, the state costs obtained under the embedding may not exactly match the original costs and may even take unrealistic values. In this work, we derive a constructive sufficient condition for an exact embedding, in which the embedded state cost matches that of the original system. Furthermore, since the embedding in this case implies a transition from a discrete to a continuous action space, the correlation between the obtained continuous action and an equivalent desired discrete action is investigated using a maximum a posteriori probability-based method. Finally, examples, including a mammalian cell-cycle network, are presented to demonstrate the effectiveness of the proposed method.
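To make the term "linearly-solvable" concrete, the sketch below solves a toy first-exit LMDP in its standard textbook form, where the desirability z(x) = exp(-v(x)) satisfies the linear fixed point z(x) = exp(-q(x)) Σ p(x'|x) z(x') and the optimal controlled transitions are proportional to p(x'|x) z(x'). It is only an illustration under stated assumptions: the four-state chain, the costs, and the function names `solve_lmdp` and `optimal_transitions` are hypothetical, and the code implements neither the exact-embedding condition nor the maximum a posteriori step described in the abstract.

```python
import math

def solve_lmdp(P, q, terminal, iters=10_000, tol=1e-12):
    """Desirability z = exp(-v) of a first-exit LMDP (hypothetical helper).

    P: passive (uncontrolled) transition matrix as a list of rows.
    q: state costs. terminal: set of absorbing goal states.
    Iterates the linear fixed point z(x) = exp(-q(x)) * sum_y P[x][y] z(y).
    """
    n = len(q)
    z = [1.0] * n
    for x in terminal:
        z[x] = math.exp(-q[x])  # boundary condition at terminal states
    for _ in range(iters):
        new = list(z)
        for x in range(n):
            if x not in terminal:
                new[x] = math.exp(-q[x]) * sum(P[x][y] * z[y] for y in range(n))
        delta = max(abs(a - b) for a, b in zip(new, z))
        z = new
        if delta < tol:
            break
    return z

def optimal_transitions(P, z, terminal):
    """Optimal controlled transitions u*(y|x), proportional to P[x][y] * z[y]."""
    n = len(z)
    U = []
    for x in range(n):
        if x in terminal:
            U.append(list(P[x]))  # nothing to control once the goal is reached
            continue
        w = [P[x][y] * z[y] for y in range(n)]
        s = sum(w)
        U.append([wi / s for wi in w])
    return U

# Toy example (assumed for illustration): a 4-state chain, state 3 is the goal.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
q = [1.0, 1.0, 1.0, 0.0]
terminal = {3}

z = solve_lmdp(P, q, terminal)
U = optimal_transitions(P, z, terminal)
v = [-math.log(zi) for zi in z]  # optimal cost-to-go, v = -log z

print("cost-to-go:", [round(vi, 3) for vi in v])
print("controlled row for state 2:", [round(u, 3) for u in U[2]])
```

In this toy chain the controlled transitions shift probability mass toward the goal relative to the passive dynamics. The control here is a full distribution over next states, which is the continuous action space the abstract refers to.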

Year: 2024

Keywords: Embedding; Kullback-Leibler (KL) divergence; linearly-solvable Markov decision processes (LMDPs); Markov decision processes (MDPs); optimal control

Appears in types: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/67446
