Mathematical Psychology

Recurrent Neural Networks

Recurrent neural networks incorporate feedback connections that allow activation to cycle through the network over time, enabling them to process temporal sequences and maintain dynamic memory states.

h(t) = f(W_hh · h(t−1) + W_xh · x(t) + b)

Recurrent neural networks (RNNs) extend feedforward architectures by adding connections from a layer back to itself or to earlier layers, creating cycles in the network graph. These feedback connections allow the network to maintain a dynamic internal state — a form of memory — that evolves over time as new inputs arrive. Jeffrey Elman (1990) introduced the "simple recurrent network" (SRN) that copies hidden-layer activations back to the input at the next time step, providing a powerful yet tractable architecture for modeling sequential processes in language, motor control, and temporal cognition.
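As a concrete sketch, the update rule above can be written in a few lines of NumPy (the dimensions, weight scales, and choice of f = tanh here are illustrative assumptions, not values from the literature):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 input units, 4 hidden units (arbitrary choices).
n_in, n_hid = 3, 4
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # recurrent hidden-to-hidden weights
b = np.zeros(n_hid)

def rnn_step(x_t, h_prev):
    """One application of h(t) = f(W_hh · h(t-1) + W_xh · x(t) + b), with f = tanh."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

# Carry the hidden state across a sequence of five inputs.
h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h = rnn_step(x_t, h)
```

The final hidden state h depends on the entire input sequence, which is the sense in which it serves as the network's memory.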

Elman and Jordan Networks

Simple Recurrent Network (Elman, 1990)
Context units: c(t) = h(t−1) (copy of previous hidden state)
Hidden state: h(t) = f(W_xh · x(t) + W_ch · c(t) + b_h)
Output: y(t) = g(W_hy · h(t) + b_y)

Jordan network variant: c(t) = y(t−1) (copies output instead)

In Elman networks, the context units provide a compressed summary of the network's processing history. At each time step, the network receives both the current input and its own previous hidden state, allowing it to integrate information across time. Jordan (1986) proposed a variant where the output, rather than the hidden state, is fed back as context. Both architectures are trained with backpropagation through time (BPTT), which unrolls the recurrent network into an equivalent feedforward network across time steps and applies standard backpropagation.
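A minimal forward pass can illustrate both feedback schemes. The tanh hidden units, linear output, and layer sizes below are assumptions for the sketch (Elman's original network used logistic units), not a reproduction of either paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 2, 5, 2  # illustrative sizes

W_xh = rng.normal(scale=0.2, size=(n_hid, n_in))   # input -> hidden
W_ch = rng.normal(scale=0.2, size=(n_hid, n_hid))  # Elman context (previous hidden) -> hidden
W_cy = rng.normal(scale=0.2, size=(n_hid, n_out))  # Jordan context (previous output) -> hidden
W_hy = rng.normal(scale=0.2, size=(n_out, n_hid))  # hidden -> output
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

def forward(xs, variant="elman"):
    """Run a sequence through the network, feeding back the hidden state
    (Elman) or the output (Jordan) as context at the next time step."""
    h, y = np.zeros(n_hid), np.zeros(n_out)
    outputs = []
    for x in xs:
        c = h if variant == "elman" else y          # c(t) = h(t-1) or y(t-1)
        W_c = W_ch if variant == "elman" else W_cy
        h = np.tanh(W_xh @ x + W_c @ c + b_h)       # hidden update
        y = W_hy @ h + b_y                          # output
        outputs.append(y)
    return np.array(outputs)
```

Only the source of the context signal differs between the two variants; training either one with BPTT amounts to unrolling this loop and backpropagating through the unrolled graph.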

Temporal Processing and Cognitive Applications

Elman demonstrated that SRNs could learn grammatical structure from sequential exposure to sentences, discovering syntactic categories (noun, verb) and long-distance dependencies without explicit instruction. The hidden-unit representations that emerged reflected hierarchical grammatical structure, providing a connectionist account of how linguistic knowledge could be acquired from statistical regularities in the input. This work was foundational for debates about whether language acquisition requires innate grammatical knowledge or can emerge from domain-general learning mechanisms.

The Vanishing Gradient Problem

A fundamental challenge for RNNs is the vanishing gradient problem: when backpropagating through many time steps, gradients shrink exponentially, making it difficult to learn long-range temporal dependencies. Hochreiter and Schmidhuber (1997) addressed this with Long Short-Term Memory (LSTM) networks, which use gating mechanisms to control the flow of information and maintain gradients over long sequences. While LSTMs are primarily used in machine learning, the gating principle has influenced cognitive models of working memory and executive control.
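The decay is easy to demonstrate numerically. For a tanh RNN, the error signal backpropagated through time is multiplied at each step by the recurrent Jacobian J(t) = diag(1 − h(t)²) · W_hh; when the recurrent weights are modest, the norm of this product shrinks roughly geometrically. The weight scale and sizes below are arbitrary, and the loop interleaves forward and backward steps purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W_hh = rng.normal(scale=0.2, size=(n, n))  # small recurrent weights (spectral radius < 1)

h = rng.normal(size=n)
grad = np.ones(n)   # stand-in for the error signal at the final time step
norms = []
for _ in range(50):
    h = np.tanh(W_hh @ h)                 # forward dynamics (no input, for simplicity)
    J = np.diag(1.0 - h**2) @ W_hh        # recurrent Jacobian at this step
    grad = J.T @ grad                     # one step of backpropagation through time
    norms.append(np.linalg.norm(grad))

# norms decays toward zero: distant time steps receive vanishing credit.
```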

In mathematical psychology, RNNs have been applied to serial recall, sequence learning, speech perception, and models of temporal context in memory. The ability of recurrent networks to represent time implicitly — through the evolving dynamics of their hidden state rather than through explicit temporal labels — makes them natural models for cognitive processes that unfold over time. They also connect to dynamical systems theory: the hidden state trajectory of an RNN can be analyzed as a dynamical system, with attractors, limit cycles, and transient dynamics corresponding to different cognitive states.
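The dynamical-systems view can be illustrated directly: with contractive recurrent weights and a constant input, iterating the hidden-state update drives the network into a fixed-point attractor. All values below are arbitrary illustrative choices (the input is added directly to the hidden units, so input and hidden dimensions coincide in this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
W_hh = rng.normal(scale=0.1, size=(n, n))  # contractive weights -> point attractor
x_const = rng.normal(size=n)               # a fixed input held over time

h = rng.normal(size=n)                     # arbitrary initial state
deltas = []
for _ in range(200):
    h_next = np.tanh(W_hh @ h + x_const)
    deltas.append(np.linalg.norm(h_next - h))
    h = h_next

# deltas shrinks toward zero: the trajectory settles onto an attractor state.
```

With larger weights the same iteration can instead produce limit cycles or long transients, which is what makes the attractor picture a useful vocabulary for distinct cognitive states.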

References

  1. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211. doi:10.1207/s15516709cog1402_1
  2. Jordan, M. I. (1986). Serial order: A parallel distributed processing approach. Technical Report 8604. Institute for Cognitive Science, University of California, San Diego.
  3. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. doi:10.1162/neco.1997.9.8.1735
  4. Botvinick, M., & Plaut, D. C. (2006). Short-term memory for serial order: A recurrent neural network model. Psychological Review, 113(2), 201–233. doi:10.1037/0033-295X.113.2.201