
Generalizability Theory

Generalizability theory extends classical reliability analysis by using analysis of variance to partition observed-score variance into multiple sources (facets) such as persons, items, occasions, and raters.

σ²(X_pir) = σ²_p + σ²_i + σ²_r + σ²_pi + σ²_pr + σ²_ir + σ²_pir,e

Classical test theory lumps all sources of measurement error into a single undifferentiated error term. Generalizability theory (G-theory), developed by Cronbach, Gleser, Nanda, and Rajaratnam (1972), overcomes this limitation by using analysis of variance to decompose observed-score variance into components associated with each source of variation — called facets — and their interactions. This allows researchers to identify which sources of error are most consequential and to design measurement procedures that minimize the most important sources.

Facets and Variance Components

Two-Facet Crossed Design (p × i × r)

X_pir = μ + ν_p + ν_i + ν_r + ν_pi + ν_pr + ν_ir + ν_pir,e

σ²(X) = σ²_p + σ²_i + σ²_r + σ²_pi + σ²_pr + σ²_ir + σ²_pir,e

In a study where persons (p) respond to items (i) scored by raters (r), the total variance is partitioned into seven components: person variance (the "signal" — genuine individual differences), item variance, rater variance, and all two-way and three-way interactions. The three-way interaction is confounded with residual error. Each component is estimated using expected mean squares from the ANOVA.
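The EMS estimation step can be sketched in Python. The function below computes the seven mean squares for a fully crossed p × i × r design with one observation per cell and solves the random-model expected-mean-squares equations; the name `g_study` and the dictionary keys are illustrative, not a standard API.

```python
import numpy as np

def g_study(X):
    """Estimate variance components for a fully crossed p x i x r
    random-effects design (one observation per cell) by solving the
    ANOVA expected-mean-squares equations."""
    n_p, n_i, n_r = X.shape
    gm = X.mean()                                   # grand mean
    # Marginal means for main effects and two-way interactions.
    m_p = X.mean(axis=(1, 2)); m_i = X.mean(axis=(0, 2)); m_r = X.mean(axis=(0, 1))
    m_pi = X.mean(axis=2); m_pr = X.mean(axis=1); m_ir = X.mean(axis=0)

    MS_p = n_i*n_r*((m_p - gm)**2).sum() / (n_p - 1)
    MS_i = n_p*n_r*((m_i - gm)**2).sum() / (n_i - 1)
    MS_r = n_p*n_i*((m_r - gm)**2).sum() / (n_r - 1)
    MS_pi = n_r*((m_pi - m_p[:, None] - m_i[None, :] + gm)**2).sum() / ((n_p-1)*(n_i-1))
    MS_pr = n_i*((m_pr - m_p[:, None] - m_r[None, :] + gm)**2).sum() / ((n_p-1)*(n_r-1))
    MS_ir = n_p*((m_ir - m_i[:, None] - m_r[None, :] + gm)**2).sum() / ((n_i-1)*(n_r-1))
    resid = (X - m_pi[:, :, None] - m_pr[:, None, :] - m_ir[None, :, :]
             + m_p[:, None, None] + m_i[None, :, None] + m_r[None, None, :] - gm)
    MS_pir = (resid**2).sum() / ((n_p-1)*(n_i-1)*(n_r-1))

    # Solve the EMS equations; negative estimates are truncated to zero.
    v = {'pir,e': MS_pir}
    v['pi'] = max((MS_pi - MS_pir) / n_r, 0.0)
    v['pr'] = max((MS_pr - MS_pir) / n_i, 0.0)
    v['ir'] = max((MS_ir - MS_pir) / n_p, 0.0)
    v['p'] = max((MS_p - MS_pi - MS_pr + MS_pir) / (n_i*n_r), 0.0)
    v['i'] = max((MS_i - MS_pi - MS_ir + MS_pir) / (n_p*n_r), 0.0)
    v['r'] = max((MS_r - MS_pr - MS_ir + MS_pir) / (n_p*n_i), 0.0)
    return v
```

Truncating negative solutions to zero is one common convention; reporting the negative estimate as a diagnostic of model misfit is another.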

G-Studies and D-Studies

G-theory analysis proceeds in two phases. The generalizability study (G-study) estimates variance components from observed data. The decision study (D-study) uses these estimates to project the generalizability coefficient for different measurement designs — varying the number of items, raters, or occasions. This separation of estimation from design optimization is a key advantage over classical methods.

Generalizability and Dependability Coefficients

For relative decisions:
Eρ² = σ²_p / (σ²_p + σ²_δ)

where σ²_δ = σ²_pi/n_i + σ²_pr/n_r + σ²_pir,e/(n_i × n_r)

For absolute decisions:
Φ = σ²_p / (σ²_p + σ²_Δ)

where σ²_Δ = σ²_δ + σ²_i/n_i + σ²_r/n_r + σ²_ir/(n_i × n_r)

The generalizability coefficient Eρ² is the G-theory analogue of the reliability coefficient for relative decisions (ranking individuals). For absolute decisions (comparing scores to a fixed standard), the dependability coefficient Φ uses a broader error term σ²_Δ that includes main effects of facets. This distinction is critical in criterion-referenced testing, where scores are compared to cut-points rather than to other examinees.
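Given variance-component estimates from a G-study, the D-study projections are simple arithmetic over the candidate facet sample sizes. A sketch, using hypothetical component values for illustration:

```python
def d_study(var, n_i, n_r):
    """Project Erho2 (relative) and Phi (absolute) for n_i items and
    n_r raters from p x i x r variance components."""
    # Relative error: components that affect rank ordering of persons.
    rel_err = var['pi']/n_i + var['pr']/n_r + var['pir,e']/(n_i*n_r)
    # Absolute error adds the facet main effects and their interaction.
    abs_err = rel_err + var['i']/n_i + var['r']/n_r + var['ir']/(n_i*n_r)
    return var['p']/(var['p'] + rel_err), var['p']/(var['p'] + abs_err)

# Hypothetical G-study estimates (illustrative values only).
var = {'p': 0.50, 'i': 0.10, 'r': 0.05, 'pi': 0.15, 'pr': 0.05,
       'ir': 0.02, 'pir,e': 0.30}
for n_i, n_r in [(5, 1), (10, 2), (20, 2)]:
    e_rho2, phi = d_study(var, n_i, n_r)
    print(f"n_i={n_i:2d}, n_r={n_r}: Erho2={e_rho2:.3f}, Phi={phi:.3f}")
```

Because Φ penalizes facet main effects that Eρ² ignores, Φ ≤ Eρ² for any design, and the gap widens when item or rater means differ substantially.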

Crossed vs. Nested Designs

When every person encounters every item and every rater, the design is fully crossed, and all interaction components are estimable. In practice, many designs are partially nested — different raters score different persons, or different test forms contain different items. Nested facets change the variance decomposition: a rater nested within persons (r:p) contributes a component that is inseparable from the rater-by-person interaction. G-theory accommodates both designs, though the interpretation of variance components differs.
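The confounding in a nested design can be demonstrated by simulation. Below, each person is scored by their own set of raters (r:p); the rater main effect, the person-by-rater interaction, and residual error are generated separately but can only be recovered as a single within-person component (the component sizes are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(42)
n_p, n_r = 500, 4                                  # raters nested in persons
s2_p, s2_r, s2_pr, s2_e = 1.0, 0.30, 0.20, 0.50    # simulated truth

# Because each person's raters are unique, the rater main effect and
# the person-by-rater interaction vary together and cannot be separated.
X = (rng.normal(0, np.sqrt(s2_p), (n_p, 1))
     + rng.normal(0, np.sqrt(s2_r), (n_p, n_r))    # rater effect, unique per person
     + rng.normal(0, np.sqrt(s2_pr), (n_p, n_r))   # person x rater interaction
     + rng.normal(0, np.sqrt(s2_e), (n_p, n_r)))   # residual

# One-way random-effects ANOVA: the within-person mean square estimates
# the confounded sum s2_r + s2_pr + s2_e (expectation 1.00 here).
MS_within = ((X - X.mean(axis=1, keepdims=True))**2).sum() / (n_p * (n_r - 1))
print(MS_within)
```

No analysis of these data, however clever, can apportion MS_within among its three sources; separating them requires a (at least partially) crossed design.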

Applications and Impact

G-theory has been especially influential in performance assessment, where multiple sources of error (raters, tasks, occasions) simultaneously affect scores. In medical education, for instance, G-studies revealed that task-sampling variability (the person × station interaction in clinical examinations) typically exceeds rater variability, redirecting efforts from rater training to increasing the number of stations. In writing assessment, G-theory showed that the number of essay prompts matters more than the number of raters per prompt.

Multivariate generalizability theory extends the framework to profiles of scores, allowing simultaneous analysis of multiple dependent variables. More recent developments include the integration of G-theory with multilevel modeling, which handles unbalanced designs more naturally, and Bayesian estimation of variance components, which provides credible intervals rather than point estimates. Despite these advances, the fundamental insight of G-theory — that measurement error is multifaceted and its sources must be disentangled — remains its most important contribution.

References

  1. Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. Wiley.
  2. Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Sage. doi:10.4135/9781412985437
  3. Brennan, R. L. (2001). Generalizability theory. Springer. doi:10.1007/978-1-4757-3456-0
  4. Briesch, A. M., Swaminathan, H., Welsh, M., & Chafouleas, S. M. (2014). Generalizability theory: A practical guide to study design, implementation, and interpretation. Journal of School Psychology, 52(1), 13–35. doi:10.1016/j.jsp.2013.11.008
