Mathematical Psychology
Mathematical psychology is the scientific discipline that uses formal mathematical models to understand and predict psychological phenomena — from how we detect faint signals and make risky decisions, to how we learn, remember, and categorize the world.
Where experimental psychology asks “what happens?”, mathematical psychology asks “what is the precise quantitative law?” It transforms verbal theories into equations, enabling rigorous testing, precise prediction, and deep insight into the mechanisms of mind.
This reference covers the full landscape — from foundational measurement theory and psychophysics through signal detection, decision making, and learning models, to psychometrics, reaction time analysis, neural models, and information theory.
Mind = f(measurement, models, data, inference)

Key Concepts
Core mathematical constructs and quantities used in mathematical psychology.
Sensitivity d'
A measure of an observer's ability to discriminate between signal and noise in signal detection theory. Computed as the standardized distance between the means of the signal and noise distributions.
d' = z(Hit Rate) - z(False Alarm Rate)

A radiologist with d' = 2.5 can reliably distinguish tumors from benign tissue on X-rays, independent of their tendency to say 'yes'.
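A minimal sketch of this computation in Python, using the standard library's statistics.NormalDist for the inverse-normal (z) transform; the hit and false-alarm rates below are illustrative values, not data from the radiology example:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index: standardized distance between the means of
    the signal and noise distributions, from observed response rates."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# An observer with 89% hits and 16% false alarms:
sensitivity = d_prime(0.89, 0.16)  # ≈ 2.22
```

Note that d' is independent of bias: an observer with equal hit and false-alarm rates has d' = 0 regardless of how often they say 'yes'.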
Utility Function
A mathematical mapping from objective outcomes to subjective value, capturing diminishing marginal utility and individual risk preferences in decision-making under uncertainty.
U(x) = x^alpha, 0 < alpha < 1 (concave for risk aversion)

The subjective difference between $0 and $100 feels much larger than between $900 and $1000, reflecting the concavity of the utility function.
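A short sketch of a concave power utility, assuming an illustrative alpha = 0.5 (square-root utility); the dollar amounts echo the example above:

```python
def utility(x: float, alpha: float = 0.5) -> float:
    """Concave power utility: diminishing marginal value for gains."""
    return x ** alpha

# The first $100 adds far more subjective value than the step
# from $900 to $1000, even though both are $100 objectively:
gain_low  = utility(100) - utility(0)     # 10.0
gain_high = utility(1000) - utility(900)  # ≈ 1.62
```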
Weber Fraction
The ratio of the just-noticeable difference (JND) to the standard stimulus intensity, expressing the fundamental psychophysical law that discrimination thresholds scale proportionally with stimulus magnitude.
W = DeltaI / I = constant

If you can just notice a 1 oz difference when holding 10 oz (W = 0.1), you would need a 10 oz difference to detect a change in 100 oz.
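Weber's law makes the just-noticeable difference a fixed proportion of the standard; a one-line sketch using the W = 0.1 value from the weight example:

```python
def jnd(intensity: float, weber_fraction: float = 0.1) -> float:
    """Just-noticeable difference predicted by Weber's law: DeltaI = W * I."""
    return weber_fraction * intensity

# The JND scales with the standard: ~1 oz at 10 oz, ~10 oz at 100 oz.
small_std = jnd(10)
large_std = jnd(100)
```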
Drift Rate
The average rate of evidence accumulation in sequential sampling models of decision making. Higher drift rates reflect stronger stimulus quality or greater processing efficiency.
dx = v dt + s dW (Wiener diffusion process)

In a lexical decision task, high-frequency words produce larger drift rates than low-frequency words, yielding faster and more accurate responses.
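The diffusion equation can be simulated with a simple Euler scheme to show how drift rate shapes speed and accuracy; the bound, noise, and drift values below are illustrative choices, and the function name diffusion_trial is my own:

```python
import random

def diffusion_trial(v, a=1.0, s=1.0, dt=0.001, rng=None):
    """One Euler-simulated trial of dx = v dt + s dW between bounds 0 and a.
    Returns (response, rt): response is 1 at the upper bound, 0 at the lower."""
    rng = rng or random.Random()
    x, t = a / 2, 0.0          # start midway between the two bounds
    sd = s * dt ** 0.5         # SD of each Gaussian increment: s * sqrt(dt)
    while 0 < x < a:
        x += v * dt + rng.gauss(0, sd)
        t += dt
    return (1 if x >= a else 0), t

# With a strong drift toward the upper (correct) bound,
# most trials terminate there:
rng = random.Random(1)
trials = [diffusion_trial(v=2.0, rng=rng) for _ in range(500)]
accuracy = sum(resp for resp, _ in trials) / len(trials)
```

Rerunning with a smaller v (a weaker stimulus) lowers accuracy and lengthens decision times, the qualitative pattern seen for low-frequency words.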
Information Entropy
Shannon's measure of uncertainty or information content in a probability distribution. Quantifies the average surprise associated with outcomes from a random variable.
H(X) = -Sum p(x) log2 p(x)

A fair coin has H = 1 bit (maximum uncertainty for two outcomes), while a biased coin with P(heads) = 0.99 has H near 0.08 bits.
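The coin comparison is easy to verify directly; a minimal entropy function over a list of probabilities:

```python
from math import log2

def entropy(probs) -> float:
    """Shannon entropy in bits: average surprise of a distribution.
    Terms with p = 0 contribute nothing, by convention."""
    return -sum(p * log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])      # 1.0 bit: maximum for two outcomes
biased = entropy([0.99, 0.01])  # ≈ 0.081 bits: nearly no uncertainty
```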
Psychometric Function
A sigmoid-shaped function mapping stimulus intensity to the probability of a correct response or detection, characterized by threshold (midpoint) and slope (precision) parameters.
Psi(x) = gamma + (1 - gamma - lambda) * F(x; alpha, beta)

Plotting percent-correct against contrast levels in a visual detection task produces an S-shaped curve whose midpoint estimates the detection threshold.
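A sketch of the psychometric function, assuming a logistic core F, a two-alternative forced-choice guess rate gamma = 0.5, and a small lapse rate lambda; the alpha and beta values are illustrative:

```python
from math import exp

def psychometric(x, alpha, beta, gamma=0.5, lam=0.02):
    """Psi(x) = gamma + (1 - gamma - lambda) * F(x; alpha, beta),
    with a logistic F. alpha = threshold (midpoint), beta = slope,
    gamma = guess rate (0.5 for 2AFC), lambda = lapse rate."""
    F = 1 / (1 + exp(-beta * (x - alpha)))
    return gamma + (1 - gamma - lam) * F

# At x = alpha, performance is midway between floor and ceiling:
p_at_threshold = psychometric(x=0.2, alpha=0.2, beta=10)  # 0.74
```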
Stimulus Sampling
The probability of sampling a stimulus element on any given trial in Estes' stimulus sampling theory. Forms the basis for a probabilistic learning framework where conditioning depends on which elements are active.
P(response | trial n) = theta * P(conditioned) + (1 - theta) * P(prior)

In a conditioning experiment, learning rate depends on the proportion of stimulus elements sampled and associated with the reinforced response on each trial.
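One common trial-level reading of stimulus sampling theory is the linear-operator model, in which response probability moves a fixed fraction theta toward its asymptote on each reinforced trial; this sketch assumes that reading and an illustrative theta = 0.2:

```python
def learning_curve(theta: float, p0: float = 0.0, n_trials: int = 10):
    """Linear-operator learning: on each reinforced trial the response
    probability is updated as p <- theta * 1 + (1 - theta) * p, i.e. it
    moves a fraction theta toward the fully conditioned asymptote of 1."""
    p, curve = p0, []
    for _ in range(n_trials):
        p = p + theta * (1 - p)
        curve.append(p)
    return curve

# A negatively accelerated learning curve: big early gains, then leveling off.
curve = learning_curve(theta=0.2)
```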
Likelihood Ratio
The ratio of the probability of observed data under two competing hypotheses. Central to optimal decision-making in signal detection theory and Bayesian inference.
LR = P(data | H1) / P(data | H0)

An ideal observer in a detection task compares the likelihood ratio to a criterion: respond 'signal' when LR exceeds beta, 'noise' otherwise.
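The ideal-observer rule can be sketched for the equal-variance Gaussian case standard in signal detection theory; the separation d = 2 and the observation values are illustrative:

```python
from statistics import NormalDist

def likelihood_ratio(x: float, d: float = 2.0) -> float:
    """LR at observation x under equal-variance Gaussian SDT:
    noise ~ N(0, 1) and signal ~ N(d, 1)."""
    signal, noise = NormalDist(d, 1), NormalDist(0, 1)
    return signal.pdf(x) / noise.pdf(x)

# The ideal observer responds 'signal' when LR exceeds the criterion beta:
beta = 1.0  # unbiased criterion
decision = "signal" if likelihood_ratio(1.5) > beta else "noise"
```

With equal variances, LR = 1 exactly at the midpoint x = d/2, so an unbiased observer splits the axis halfway between the two means.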
Choice Probability
The probability of selecting a particular alternative from a choice set, as formalized by Luce's choice axiom. The probability is proportional to the strength of the alternative relative to all options.
P(i) = v(i) / Sum_j v(j)

If brand A has strength 3 and brand B has strength 1, Luce's model predicts P(A) = 3/(3+1) = 0.75 in a binary choice.
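Luce's rule is a one-line normalization; a sketch reproducing the brand example:

```python
def luce_choice(strengths: dict) -> dict:
    """Luce's choice rule: P(i) is proportional to the strength v(i)."""
    total = sum(strengths.values())
    return {alt: v / total for alt, v in strengths.items()}

probs = luce_choice({"A": 3.0, "B": 1.0})  # {'A': 0.75, 'B': 0.25}
```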
Scale Type
The classification of measurement scales by their permissible transformations, as defined by Stevens. Determines which mathematical operations and statistical tests are meaningful for a given type of data.
Nominal < Ordinal < Interval < Ratio

Temperature in Celsius is an interval scale (arbitrary zero), while temperature in Kelvin is a ratio scale (true zero), determining whether ratios of measurements are meaningful.
Item Difficulty
A parameter in item response theory representing the point on the latent trait continuum where the probability of a correct response equals 0.5. Items with higher b values require greater ability to answer correctly.
P(correct | theta) = 1 / (1 + exp(-a(theta - b)))

On a math test, an algebra item with b = -0.5 is easier than a calculus item with b = 2.0, meaning less ability is needed for a 50% chance of a correct answer.
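The two-parameter logistic (2PL) item response function is direct to compute; the ability and item parameters below echo the algebra/calculus example and are illustrative:

```python
from math import exp

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: theta = ability,
    a = discrimination, b = difficulty."""
    return 1 / (1 + exp(-a * (theta - b)))

# For an average test-taker (theta = 0), the easy algebra item
# is answered correctly more often than the hard calculus item:
easy_item = p_correct(theta=0.0, a=1.0, b=-0.5)  # > 0.5
hard_item = p_correct(theta=0.0, a=1.0, b=2.0)   # < 0.5
```

When theta equals b, the exponent is zero and P(correct) is exactly 0.5, which is what makes b interpretable as the item's difficulty.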
Mutual Information
A measure of the statistical dependence between two random variables, quantifying how much knowing one variable reduces uncertainty about the other. Generalizes correlation to nonlinear relationships.
I(X;Y) = Sum_x Sum_y p(x,y) log(p(x,y) / (p(x)p(y)))

Mutual information between neural firing rate and stimulus orientation quantifies how much information neurons carry about the stimulus.
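A sketch computing I(X;Y) in bits from a discrete joint distribution, with two toy binary channels to show the extremes:

```python
from math import log2

def mutual_information(joint: dict) -> float:
    """I(X;Y) in bits from a joint distribution {(x, y): p(x, y)};
    marginals p(x) and p(y) are accumulated from the joint."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly informative channel: knowing Y pins down X exactly (1 bit).
perfect = mutual_information({(0, 0): 0.5, (1, 1): 0.5})
# Independent variables: knowing Y says nothing about X (0 bits).
none = mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                           (1, 0): 0.25, (1, 1): 0.25})
```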
Response Criterion
The decision threshold in signal detection theory that determines how much evidence is required before responding 'signal present.' Reflects the observer's bias, influenced by payoffs and prior probabilities.
beta = f(signal | x_c) / f(noise | x_c)

A cautious airport screener sets a low criterion (a liberal bias), flagging many bags to avoid missing threats, even at the cost of more false alarms.
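Under the equal-variance Gaussian model, beta is the likelihood ratio evaluated at the criterion location x_c; this sketch assumes noise ~ N(0, 1) and signal ~ N(d, 1) with an illustrative d = 2:

```python
from statistics import NormalDist

def beta_criterion(criterion_x: float, d: float = 2.0) -> float:
    """Likelihood-ratio criterion beta at decision point x_c, assuming
    equal-variance Gaussians: noise ~ N(0, 1), signal ~ N(d, 1)."""
    return NormalDist(d, 1).pdf(criterion_x) / NormalDist(0, 1).pdf(criterion_x)

# A liberal screener places x_c below the neutral point d/2, giving beta < 1
# (little evidence needed before responding 'signal'):
liberal = beta_criterion(criterion_x=0.5)  # beta < 1
neutral = beta_criterion(criterion_x=1.0)  # beta = 1: unbiased
```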
Forgetting Rate
The rate parameter governing memory decay over time, typically following a power function. Describes how retrieval probability or memory strength diminishes as the retention interval increases.
m(t) = a * t^(-b_f) (power law of forgetting)

Ebbinghaus found that memory for nonsense syllables decays rapidly at first, then levels off — a pattern well described by a power function with b near 0.5.
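The power law's signature shape, rapid early loss that levels off, falls out of a two-line sketch; a = 1 and b = 0.5 below are illustrative (b near the value the Ebbinghaus data suggest):

```python
def memory_strength(t: float, a: float = 1.0, b: float = 0.5) -> float:
    """Power-law forgetting: m(t) = a * t^(-b), for t > 0 time units
    since study."""
    return a * t ** (-b)

# Most forgetting happens early: the drop over the first interval
# dwarfs the drop over an equal-length interval later on.
drop_early = memory_strength(1) - memory_strength(2)   # ≈ 0.29
drop_late  = memory_strength(9) - memory_strength(10)  # ≈ 0.017
```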