The application of information theory to neuroscience, pioneered by Fred Rieke, David Warland, Rob de Ruyter van Steveninck, and William Bialek in their 1997 monograph Spikes, provides a principled framework for asking how much information neural responses carry about the external world. Unlike model-dependent approaches that assume a specific decoding scheme, mutual information between stimulus and response is a model-free measure of the total information available in the neural code, regardless of how downstream neurons might read it out.
Neural Mutual Information

For a spike count r observed in a time window T, the mutual information between stimulus S and response R is

I(S;R) = Σ_s Σ_r P(s) · P(r|s) · log₂[P(r|s) / P(r)]

Equivalently, I(S;R) = H(R) − H(R|S), where

H(R) = total response entropy (bits)
H(R|S) = noise entropy (response variability given the same stimulus)
The total response entropy H(R) reflects the full range of neural responses, including both stimulus-driven and noise variability. The noise entropy H(R|S) measures the response variability when the stimulus is held fixed. The difference — the mutual information — captures only the stimulus-related component. This decomposition is fundamental: it separates the informative signal from the neural noise floor without assuming any specific coding model.
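As a concrete illustration, here is a minimal Python sketch (using NumPy, with a hypothetical joint count table) that computes the plug-in estimate of I(S;R) and verifies that it equals H(R) − H(R|S):

    import numpy as np

    def plugin_mi(counts):
        """Plug-in estimate of I(S;R) in bits.

        counts[s, r] = number of trials with stimulus s and response r.
        """
        joint = counts / counts.sum()             # P(s, r)
        p_s = joint.sum(axis=1, keepdims=True)    # P(s)
        p_r = joint.sum(axis=0, keepdims=True)    # P(r)
        nz = joint > 0                            # skip empty bins: 0·log 0 = 0
        return np.sum(joint[nz] * np.log2(joint[nz] / (p_s * p_r)[nz]))

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Hypothetical data: 2 stimuli x 4 spike-count bins, entries are trial counts.
    counts = np.array([[40., 30., 20., 10.],
                       [ 5., 15., 30., 50.]])
    joint = counts / counts.sum()
    p_s = joint.sum(axis=1)                       # P(s)
    p_r = joint.sum(axis=0)                       # P(r)

    # Noise entropy: H(R|S) = Σ_s P(s) · H(R | S = s)
    h_noise = sum(p_s[i] * entropy(counts[i] / counts[i].sum())
                  for i in range(len(counts)))

    assert np.isclose(plugin_mi(counts), entropy(p_r) - h_noise)
    print(plugin_mi(counts))                      # ~0.24 bits in this example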
Rate Codes and Temporal Codes
A central question in neuroscience is whether neurons transmit information primarily through their firing rates (rate code) or through the precise timing of individual spikes (temporal code). Information-theoretic analysis provides an empirical answer: by comparing the mutual information computed from spike counts (rate) with that computed from the full spike train (including temporal structure), researchers can determine how much additional information is carried by spike timing. In many sensory systems — the fly visual system, the auditory nerve, the somatosensory cortex — temporal coding carries significantly more information than rate coding alone.
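A toy example makes the count-versus-timing comparison concrete. In the hypothetical joint tables below, two stimuli evoke identical spike-count distributions, so counts alone carry no information, while the binary spike "words" (binned patterns in the style of Strong et al.) distinguish them:

    import numpy as np

    def plugin_mi(counts):
        """Plug-in I(S;R) in bits (same estimator as in the sketch above)."""
        joint = counts / counts.sum()
        p_s = joint.sum(axis=1, keepdims=True)
        p_r = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return np.sum(joint[nz] * np.log2(joint[nz] / (p_s * p_r)[nz]))

    # Rate code: response = spike count in {0, 1, 2}. Rows are stimuli A and B.
    # The rows are identical, so the count says nothing about the stimulus.
    count_table = np.array([[10., 80., 10.],
                            [10., 80., 10.]])

    # Temporal code: response = binary word over two time bins: 00, 10, 01, 11.
    # Stimulus A tends to fire in the first bin, stimulus B in the second, so
    # the same single spike becomes informative once its timing is kept.
    word_table = np.array([[5., 85.,  5., 5.],
                           [5.,  5., 85., 5.]])

    print(plugin_mi(count_table))   # 0.0 bits from counts alone
    print(plugin_mi(word_table))    # ~0.62 bits once timing is included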
Estimating mutual information from limited data is notoriously difficult. Naive plug-in estimates of mutual information are biased upward: finite sampling systematically underestimates entropy, and the noise entropy H(R|S), estimated from the few trials available for each stimulus, is underestimated more severely than the total entropy H(R), inflating their difference. The Panzeri-Treves (1996) correction, the Nemenman-Shafee-Bialek (2002) Bayesian estimator, and the direct method of Strong et al. (1998) address this bias. Choosing the appropriate estimator for the data size and dimensionality remains a critical methodological consideration in neural information analysis.
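The sketch below illustrates the bias and a first-order correction. It draws stimulus and response independently (so the true information is zero), shows that the plug-in estimate is nonetheless positive, and subtracts a simplified Panzeri-Treves-style first-order bias term using naive counts of occupied response bins (a simplification of the published method; the trial numbers are made up):

    import numpy as np

    def plugin_mi(counts):
        """Plug-in I(S;R) in bits (same estimator as above)."""
        joint = counts / counts.sum()
        p_s = joint.sum(axis=1, keepdims=True)
        p_r = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return np.sum(joint[nz] * np.log2(joint[nz] / (p_s * p_r)[nz]))

    def corrected_mi(counts):
        """First-order bias correction (Panzeri-Treves style, simplified).

        The leading-order bias of the plug-in estimate is roughly
        [Σ_s (R_s − 1) − (R − 1)] / (2 N ln 2), where R_s and R are the
        numbers of occupied response bins per stimulus and overall, and N
        is the total trial count; here R_s and R are naive occupancy counts.
        """
        n = counts.sum()
        r_s = (counts > 0).sum(axis=1)
        r = (counts.sum(axis=0) > 0).sum()
        bias = ((r_s - 1).sum() - (r - 1)) / (2 * n * np.log(2))
        return plugin_mi(counts) - bias

    rng = np.random.default_rng(0)
    n_stim, n_resp, n_trials = 4, 16, 200
    plug, corr = [], []
    for _ in range(500):
        s = rng.integers(n_stim, size=n_trials)
        r = rng.integers(n_resp, size=n_trials)   # independent of s: true I = 0
        counts = np.zeros((n_stim, n_resp))
        np.add.at(counts, (s, r), 1.0)
        plug.append(plugin_mi(counts))
        corr.append(corrected_mi(counts))
    print(np.mean(plug))    # clearly positive even though true I = 0: pure bias
    print(np.mean(corr))    # much closer to the true value of 0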
Population Coding
In neural populations, information theory addresses how information is distributed across neurons. If neurons are independent, the total information is the sum of individual contributions. But neural correlations — both noise correlations and signal correlations — alter the population information in complex ways. Noise correlations that are aligned with the signal direction reduce population information, while those orthogonal to it are benign. Averbeck, Latham, and Pouget (2006) provided a systematic framework for understanding how correlations affect information in neural populations.
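This geometry can be seen in a two-neuron sketch using linear Fisher information, f'ᵀ Σ⁻¹ f', a standard proxy for population information about a small stimulus change (the tuning slopes and noise magnitudes below are made up for illustration):

    import numpy as np

    def linear_fisher(fprime, sigma):
        """Linear Fisher information f'^T Sigma^{-1} f'."""
        return fprime @ np.linalg.solve(sigma, fprime)

    # Two neurons whose tuning-curve slopes f' define the signal direction.
    fprime = np.array([1.0, 1.0])
    u_signal = fprime / np.linalg.norm(fprime)
    u_orth = np.array([1.0, -1.0]) / np.sqrt(2.0)

    indep = np.eye(2)                                     # uncorrelated noise
    aligned = indep + 0.5 * np.outer(u_signal, u_signal)  # extra noise along f'
    orthogonal = indep + 0.5 * np.outer(u_orth, u_orth)   # extra noise at 90 deg

    print(linear_fisher(fprime, indep))       # 2.0   (independent baseline)
    print(linear_fisher(fprime, aligned))     # ~1.33: aligned noise cuts information
    print(linear_fisher(fprime, orthogonal))  # 2.0  : orthogonal noise is benign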
Information-theoretic analysis has also been applied to brain-machine interfaces, where it provides an upper bound on the performance of any decoder. The mutual information between neural activity and movement parameters (direction, speed, force) quantifies the maximum achievable decoding accuracy, guiding the design of neuroprosthetic systems. These applications demonstrate the practical value of Shannon's abstract framework when applied to the concrete problem of neural communication.
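One concrete way to turn an information estimate into such a bound is Fano's inequality, which converts I(S;R) into a ceiling on classification accuracy. The sketch below uses hypothetical numbers (0.8 bits about 4 equiprobable reach directions); it illustrates the bound, not any particular interface:

    import numpy as np

    def fano_accuracy_bound(mi_bits, m):
        """Upper bound on decoding accuracy for m equiprobable targets.

        Fano's inequality: H(S|R) <= H_b(Pe) + Pe * log2(m - 1), with
        H(S|R) = log2(m) - I(S;R). The smallest error probability Pe
        consistent with this gives the highest achievable accuracy.
        """
        h_cond = np.log2(m) - mi_bits
        if h_cond <= 0:
            return 1.0                        # enough bits for error-free decoding
        pe = np.linspace(1e-6, 1 - 1e-6, 100001)
        h_b = -(pe * np.log2(pe) + (1 - pe) * np.log2(1 - pe))
        rhs = h_b + pe * np.log2(m - 1)
        return 1.0 - pe[rhs >= h_cond].min()  # smallest feasible error

    # Hypothetical: 0.8 bits of information about 4 reach directions.
    print(fano_accuracy_bound(0.8, 4))        # ~0.75: no decoder can exceed this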