Mathematical Psychology

Reward Prediction Error

The reward prediction error — the difference between received and expected reward — is the core learning signal in reinforcement learning and is encoded by midbrain dopamine neurons.

δ = r + γV(s') − V(s)

The reward prediction error (RPE) is the discrepancy between the reward actually received and the reward that was expected. This signal, central to both the Rescorla-Wagner model and temporal difference learning, was discovered to be encoded by midbrain dopamine neurons in one of the most celebrated findings in computational neuroscience.
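The definition above can be written directly as a function. This is a minimal sketch (the values below are illustrative, not taken from the article):

```python
def td_error(r, v_s, v_next, gamma=0.9):
    """Temporal difference prediction error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s

# Unexpected reward: nothing was predicted, reward arrives -> positive error.
print(td_error(r=1.0, v_s=0.0, v_next=0.0))   # 1.0

# Omitted expected reward: V(s) predicted reward, none arrives -> negative error.
print(td_error(r=0.0, v_s=1.0, v_next=0.0))   # -1.0
```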

Neural Evidence

Dopamine and Prediction Error

Unexpected reward: δ > 0 → dopamine burst (phasic increase)
Expected reward: δ = 0 → no change (baseline firing)
Omitted expected reward: δ < 0 → dopamine pause (phasic decrease)

With learning, the dopamine response transfers from the reward itself to the reward-predicting cue.
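This transfer falls out of TD learning itself, and a few lines of simulation show it. The setup below is a hypothetical three-state trial (cue → delay → delay → reward), not the article's own model: with repeated trials, the prediction error at reward time shrinks toward zero while the cue's learned value grows toward the discounted reward.

```python
# TD(0) on a fixed cue -> delay1 -> delay2 -> reward sequence, one trial per episode.
alpha, gamma = 0.1, 0.9
V = [0.0, 0.0, 0.0]  # V[cue], V[delay1], V[delay2]

for trial in range(500):
    for s in range(3):
        r = 1.0 if s == 2 else 0.0           # reward arrives on leaving the last state
        v_next = V[s + 1] if s < 2 else 0.0  # terminal state has value 0
        delta = r + gamma * v_next - V[s]    # delta = r + gamma*V(s') - V(s)
        V[s] += alpha * delta                # value update

print(round(V[0], 3))   # 0.81 = gamma^2: the cue now predicts the discounted reward
print(round(delta, 3))  # 0.0: the reward-time error has vanished
```

The residual response at the cue persists only because the cue itself is unpredicted; if an earlier stimulus reliably predicted the cue, the error would transfer again.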

Schultz, Dayan, and Montague (1997) showed that the firing patterns of dopamine neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) match the temporal difference prediction error signal with remarkable precision. This correspondence between a computational quantity (RPE) and a neural signal (phasic dopamine) remains one of the strongest bridges between computational models and neurobiology.

Implications for Psychology

The RPE framework has been applied to understanding addiction (drugs of abuse hijack the RPE signal), depression (reduced dopaminergic RPE signals), and decision-making deficits in Parkinson's disease and schizophrenia. Individual differences in RPE signaling have been linked to trait impulsivity, reward sensitivity, and vulnerability to substance use disorders.

Interactive Calculator

Each row records a state transition: state (integer), reward (numeric), next_state (integer). The calculator applies temporal-difference learning: δ = r + γV(s') − V(s). Parameters: α=0.1 (learning rate), γ=0.9 (discount factor).

Click Calculate to see results, or Animate to watch the statistics update one record at a time.
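The calculator's per-row computation can be sketched as follows. The record format (state, reward, next_state) and the parameters α=0.1, γ=0.9 come from the description above; the example transitions and the function name are hypothetical.

```python
from collections import defaultdict

def run_td(records, alpha=0.1, gamma=0.9):
    """Apply the TD(0) update to each (state, reward, next_state) record in order."""
    V = defaultdict(float)  # all state values start at 0
    deltas = []
    for state, reward, next_state in records:
        delta = reward + gamma * V[next_state] - V[state]  # delta = r + gamma*V(s') - V(s)
        V[state] += alpha * delta                          # value update
        deltas.append(delta)
    return V, deltas

records = [(0, 0.0, 1), (1, 1.0, 2), (0, 0.0, 1)]  # hypothetical transitions
V, deltas = run_td(records)
print([round(d, 3) for d in deltas])  # [0.0, 1.0, 0.09]
```

The third record yields a small positive error because state 1 has already acquired some value (0.1) from the rewarded second record, so γV(s') is now above the still-zero V(s).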


References

  1. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593
  2. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). Appleton-Century-Crofts.
  3. Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947. https://doi.org/10.1523/JNEUROSCI.16-05-01936.1996
  4. Steinberg, E. E., Keiflin, R., Boivin, J. R., Witten, I. B., Deisseroth, K., & Janak, P. H. (2013). A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience, 16(7), 966–973. https://doi.org/10.1038/nn.3413
