Reinforcement Learning

Reinforcement Learning

Batch TD(0) as Dynamic Programming on the Empirical MRP

A step-by-step derivation showing that the fixed point of batch TD(0) coincides with the Bellman solution of the empirical Markov reward process defined by transition-count MLEs and empirical conditional mean rewards.

Read post →