Document worth reading: “Explainable Deterministic MDPs”
We present a method for a certain class of Markov Decision Processes (MDPs) that relates the optimal policy back to the various reward sources in the environment. For a given initial state, the algorithm works without fully computing the value function, the q-value function, or the optimal policy. It can still determine which rewards will and will not be collected, whether a given reward will be collected only once or continually, and which local maximum of the value function the initial state will eventually lead to. We show that the technique can be used to map the state space, identifying regions that are dominated by a single reward source, and to fully analyze the state space in order to explain every action. We present a mathematical framework that shows how all of this is possible without first computing the optimal policy or value function.
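The abstract's key claim is that these questions can be answered *without* computing the value function or optimal policy. For contrast, below is a minimal sketch of the brute-force baseline the method avoids. It is not the paper's algorithm, and every name in it (`value_iteration`, `greedy_lasso`, `classify_rewards`) is hypothetical. The sketch makes three assumptions of its own:

- a small deterministic MDP given as a successor table;
- reward r(s) received on every step spent in state s;
- discount 0.9.

It runs value iteration and then follows the greedy policy from an initial state. Because both the MDP and the policy are deterministic, that path is a finite prefix followed by a repeating cycle. Rewards on the prefix are collected once. Rewards on the cycle, which is where the trajectory settles, are collected continually. Rewards off the path are never collected.

```python
from typing import Dict, List, Tuple

# Hypothetical deterministic MDP: next_state[s] lists the successor of s under each action.
# Assumed convention: reward[s] is received on every step spent in state s.


def value_iteration(next_state: List[List[int]], reward: List[float],
                    gamma: float = 0.9, tol: float = 1e-9) -> List[float]:
    """Iterate V(s) = r(s) + gamma * max over successors s' of V(s') until it stops changing."""
    v = [0.0] * len(next_state)
    while True:
        new_v = [reward[s] + gamma * max(v[t] for t in next_state[s])
                 for s in range(len(next_state))]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def greedy_lasso(next_state: List[List[int]], v: List[float],
                 start: int) -> Tuple[List[int], List[int]]:
    """Follow the greedy policy from `start` until a state repeats.

    The deterministic path splits into (prefix, cycle): prefix states are
    visited once, cycle states are revisited forever."""
    seen: Dict[int, int] = {}
    path: List[int] = []
    s = start
    while s not in seen:
        seen[s] = len(path)
        path.append(s)
        s = max(next_state[s], key=lambda t: v[t])  # greedy successor
    k = seen[s]
    return path[:k], path[k:]


def classify_rewards(prefix: List[int], cycle: List[int],
                     reward: List[float]) -> Tuple[List[int], List[int], List[int]]:
    """Return (collected once, collected continually, never collected) reward states."""
    once = sorted(s for s in prefix if reward[s] != 0)
    continual = sorted(s for s in cycle if reward[s] != 0)
    visited = set(prefix) | set(cycle)
    never = sorted(s for s in range(len(reward)) if reward[s] != 0 and s not in visited)
    return once, continual, never


if __name__ == "__main__":
    # Hypothetical 5-state corridor: actions are left, stay, right (clamped at the ends).
    n = 5
    next_state = [[max(s - 1, 0), s, min(s + 1, n - 1)] for s in range(n)]
    reward = [1.0, 0.0, 0.0, 0.0, 2.0]
    v = value_iteration(next_state, reward, gamma=0.9)
    for start in (0, 2):
        prefix, cycle = greedy_lasso(next_state, v, start)
        print(start, classify_rewards(prefix, cycle, reward))
    # prints "0 ([0], [4], [])" then "2 ([], [4], [0])"
```

In this toy corridor, starting at state 0 collects the reward at 0 once and then settles at state 4, collecting its reward continually. Starting at state 2 heads straight to state 4, so the reward at 0 is never collected. The baseline needs the full value function to reach these answers, which is exactly the cost the abstract says its framework avoids.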