Temporal difference (TD) learning, developed by Richard Sutton in 1988, is a reinforcement learning algorithm that updates predictions about future rewards based on the discrepancy between consecutive predictions. TD learning bridges the Rescorla-Wagner model from animal learning theory and dynamic programming from optimal control theory.
The core TD(0) update adjusts the value estimate of the current state in proportion to the TD error:

V(sₜ) ← V(sₜ) + α · δₜ

δₜ = rₜ₊₁ + γ · V(sₜ₊₁) − V(sₜ)

where:
V(s) = estimated value of state s
δₜ = TD error (reward prediction error)
rₜ₊₁ = reward received at time t+1
γ = discount factor (0 to 1)
α = learning rate
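The update above can be sketched as a minimal tabular TD(0) learner. The environment here is a hypothetical five-state chain (not from the original text): the agent walks left to right and receives a reward of 1.0 on reaching the terminal state.

```python
# Minimal TD(0) sketch on a hypothetical 5-state chain task:
# states 0..4, deterministic left-to-right transitions, reward 1.0
# on entering the terminal state 4.
N_STATES = 5
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

V = [0.0] * N_STATES  # value estimates, initialized to zero

for episode in range(500):
    s = 0
    while s < N_STATES - 1:
        s_next = s + 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Terminal state contributes no future value
        v_next = 0.0 if s_next == N_STATES - 1 else V[s_next]
        # TD error: delta = r + gamma * V(s') - V(s)
        delta = r + GAMMA * v_next - V[s]
        V[s] += ALPHA * delta  # move V(s) toward the bootstrapped target
        s = s_next
```

After training, the estimates approach the γ-discounted distances to reward: V[3] ≈ 1.0, V[2] ≈ 0.9, V[1] ≈ 0.81, V[0] ≈ 0.729.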
Connection to Dopamine
In a landmark discovery, Schultz, Dayan, and Montague (1997) showed that the firing patterns of midbrain dopamine neurons closely match the TD prediction error signal. Dopamine neurons fire when rewards are unexpected (positive δ), pause when expected rewards are omitted (negative δ), and show no response to fully predicted rewards (δ = 0). This correspondence has become one of the most successful examples of a computational model directly predicting neural activity.
Relationship to Rescorla-Wagner
The Rescorla-Wagner model can be seen as a special case of TD learning where there is only one time step between CS and US. TD learning generalizes this by allowing prediction errors to propagate backwards through multiple time steps, explaining phenomena like second-order conditioning and the timing of conditioned responses that the Rescorla-Wagner model cannot address.
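The backward propagation of value can be illustrated with a sketch of second-order conditioning (a hypothetical two-phase experiment, not from the original text): CS1 is first paired with the reward, then CS2 is paired only with CS1. Because the TD error bootstraps from V(CS1), CS2 acquires value despite never being followed by the reward itself, which a one-step Rescorla-Wagner update cannot produce.

```python
ALPHA = 0.2
GAMMA = 1.0

V = {"CS1": 0.0, "CS2": 0.0}

# Phase 1: first-order conditioning -- CS1 is followed by the US (r = 1.0).
for _ in range(100):
    delta = 1.0 + GAMMA * 0.0 - V["CS1"]  # next state is terminal (value 0)
    V["CS1"] += ALPHA * delta

# Phase 2: second-order conditioning -- CS2 is followed by CS1, never by
# the US directly.  The TD error for CS2 bootstraps from V(CS1).
# (Simplification: V(CS1) is held fixed here; in a full simulation it
# would also extinguish, since the US no longer follows it.)
for _ in range(100):
    delta = 0.0 + GAMMA * V["CS1"] - V["CS2"]
    V["CS2"] += ALPHA * delta
```

At the end of phase 2, V["CS2"] ≈ 1.0 even though no reward was ever delivered during a CS2 trial.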