Measurement invariance (MI) — also called measurement equivalence or factorial invariance — is the requirement that the relationship between observed indicators and latent constructs is the same across groups, occasions, or conditions. Without measurement invariance, observed differences between groups on a psychological measure might reflect genuine differences in the underlying construct or, alternatively, differences in how the instrument functions. Establishing MI is therefore a logical prerequisite for any meaningful comparison of scores across populations.
Formal Definition
The distribution of observed scores x, conditional on the latent variable η,
is independent of group membership g: f(x | η, g) = f(x | η).
In the linear factor model, this implies:
Λ_g = Λ (equal loadings across groups g)
τ_g = τ (equal intercepts)
Θ_g = Θ (equal residual covariances)
Meredith (1993) provided the formal statistical definition: a set of measures is measurement invariant when the conditional distribution of observed scores given the latent variables is independent of group membership. In the linear factor model, and given that the same pattern of factor loadings holds in every group (configural invariance), this translates into three levels of increasingly restrictive constraints on the measurement parameters: equal factor loadings (metric invariance), equal intercepts (scalar invariance), and equal residual variances (strict invariance).
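The role of each constraint can be read off the model-implied moments of the linear factor model. The following is a sketch in standard notation; the symbols α_g (latent mean vector), Ψ_g (latent covariance matrix), and ε (residual vector) do not appear in the text above and are introduced here as assumptions.

```latex
\begin{aligned}
x &= \tau_g + \Lambda_g \eta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon \mid g) = \Theta_g, \\
\operatorname{E}(x \mid g) &= \tau_g + \Lambda_g \alpha_g, \\
\operatorname{Cov}(x \mid g) &= \Lambda_g \Psi_g \Lambda_g^{\top} + \Theta_g .
\end{aligned}
```

When Λ_g = Λ, τ_g = τ, and Θ_g = Θ all hold, the groups can differ in their observed means and covariances only through α_g and Ψ_g, that is, only through the latent variables themselves.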
Consequences of Non-Invariance
Violations of invariance at different levels have different consequences. If metric invariance fails, the construct has different meaning in different groups — a one-unit change in the latent variable does not produce the same change in the indicator. Comparing structural relationships (e.g., correlations or regression slopes) across groups is compromised. If scalar invariance fails, observed mean differences do not map directly onto latent mean differences — group comparisons on total scores or subscale means are potentially biased. This is particularly consequential in cross-cultural research, where mean-level comparisons are often the primary interest.
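The effect of scalar non-invariance on observed means can be illustrated with a small calculation. This is a minimal sketch that assumes a one-factor model in which the expected sum score is the sum of τ_j + λ_j·α over items; all parameter values and the helper name `expected_sum_score` are hypothetical and chosen only for illustration.

```python
def expected_sum_score(intercepts, loadings, latent_mean):
    """Model-implied mean of a sum score: sum over items of tau_j + lambda_j * alpha."""
    return sum(t + l * latent_mean for t, l in zip(intercepts, loadings))

# Hypothetical parameters for a three-item scale (illustrative values only).
loadings = [0.8, 0.7, 0.6]        # equal across groups: metric invariance holds
tau_reference = [2.0, 2.5, 3.0]   # intercepts in the reference group
tau_focal = [2.0, 2.5, 3.5]       # item 3 intercept is 0.5 higher in the focal group

# Both groups have the same latent mean, so there is no true difference.
alpha_reference = 0.0
alpha_focal = 0.0

# Observed mean difference when intercepts are invariant.
diff_invariant = (expected_sum_score(tau_reference, loadings, alpha_focal)
                  - expected_sum_score(tau_reference, loadings, alpha_reference))

# Observed mean difference when item 3 has a non-invariant intercept.
diff_noninvariant = (expected_sum_score(tau_focal, loadings, alpha_focal)
                     - expected_sum_score(tau_reference, loadings, alpha_reference))

print(diff_invariant, diff_noninvariant)
```

With invariant intercepts the expected difference in sum scores is zero, matching the latent means. With the shifted intercept the focal group scores half a point higher even though the latent means are identical, so the observed difference is entirely an artifact of item functioning.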
When comparing many groups (e.g., 30 countries), testing exact invariance for all items across all groups is often impractical and overly restrictive. Asparouhov and Muthén (2014) developed the alignment method, which starts from a configural model and chooses the latent means and variances that minimize a simplicity function of the cross-group differences in loadings and intercepts. The function favors solutions in which most parameters are approximately invariant and the remaining non-invariance is concentrated in a few parameters. This produces latent mean and variance estimates that are comparable across groups without requiring exact invariance for every item. Bayesian approximate MI methods similarly place small-variance informative priors centered on zero on cross-group differences in loadings and intercepts.
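The behavior of an alignment-style simplicity function can be sketched as follows. This is an illustration only, not the full alignment estimator: it evaluates a loss on fixed parameter sets and does not optimize over latent means and variances. The component loss f(x) = sqrt(sqrt(x² + ε)), the value ε = 0.01, and the pair weights sqrt(N_g1 · N_g2) are assumptions based on the commonly cited form of the method, and all parameter values are hypothetical.

```python
import math
from itertools import combinations

def component_loss(x, eps=0.01):
    """Assumed component loss f(x) = sqrt(sqrt(x^2 + eps))."""
    return math.sqrt(math.sqrt(x * x + eps))

def simplicity(loadings, intercepts, sizes, eps=0.01):
    """Sum the component loss over all items and all pairs of groups.

    loadings[g][p] and intercepts[g][p] hold the parameter of item p in group g;
    sizes[g] is the sample size of group g, used for the assumed pair weight.
    """
    total = 0.0
    n_items = len(loadings[0])
    for g1, g2 in combinations(range(len(loadings)), 2):
        weight = math.sqrt(sizes[g1] * sizes[g2])
        for p in range(n_items):
            total += weight * component_loss(loadings[g1][p] - loadings[g2][p], eps)
            total += weight * component_loss(intercepts[g1][p] - intercepts[g2][p], eps)
    return total

# Two hypothetical groups with identical loadings.
loadings = [[0.8, 0.7, 0.6], [0.8, 0.7, 0.6]]
sizes = [500, 500]

# Same total absolute intercept difference (0.9), distributed differently.
intercepts_concentrated = [[2.0, 2.5, 3.0], [2.0, 2.5, 3.9]]  # one item differs by 0.9
intercepts_spread = [[2.0, 2.5, 3.0], [2.3, 2.8, 3.3]]        # every item differs by 0.3

loss_concentrated = simplicity(loadings, intercepts_concentrated, sizes)
loss_spread = simplicity(loadings, intercepts_spread, sizes)
print(loss_concentrated, loss_spread)
```

Under these assumptions the loss is lower when the non-invariance sits in a single item than when the same total amount is spread over all items. This is the sense in which such a function rewards "simple" patterns, much as rotation criteria do in exploratory factor analysis.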
Testing Methods and Extensions
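Multi-group CFA is the standard method for testing MI (see Multi-Group SEM). In the IRT framework, MI corresponds to the absence of differential item functioning (DIF): an item shows DIF when respondents at the same latent trait level but from different groups have different response probabilities, that is, when its item parameters differ across groups. The two frameworks are mathematically connected. In the 2PL model, group differences in discrimination (non-uniform DIF) correspond to non-invariant loadings in CFA, and group differences in difficulty (uniform DIF) correspond to non-invariant intercepts or thresholds. This connection has led to hybrid methods that combine CFA and IRT approaches for invariance testing.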
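DIF in the 2PL model can be made concrete by comparing response probabilities for respondents at the same trait level. The sketch below uses the standard 2PL response function; the parameter values are hypothetical, and the labels "uniform" (difficulty differs) and "non-uniform" (discrimination differs) follow common usage in the DIF literature.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = [-1.0, 0.0, 1.0]  # same latent trait levels evaluated in both groups

# Reference group item parameters (hypothetical): a = 1.2, b = 0.0.
# Uniform DIF: the focal group has the same discrimination but higher difficulty.
uniform_gap = [p_2pl(t, 1.2, 0.0) - p_2pl(t, 1.2, 0.5) for t in thetas]

# Non-uniform DIF: the focal group has the same difficulty but lower discrimination.
nonuniform_gap = [p_2pl(t, 1.2, 0.0) - p_2pl(t, 0.6, 0.0) for t in thetas]

print(uniform_gap)
print(nonuniform_gap)
```

In the uniform case the reference group is favored at every trait level. In the non-uniform case the gap changes sign across the trait range, which is what differing slopes (loadings, in CFA terms) imply.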
Longitudinal measurement invariance extends the concept to repeated measures: the measurement model must be invariant across time points for growth parameters to be meaningfully interpreted. If the factor loading for an item changes over time, the item is measuring the construct differently at different occasions, and changes in observed scores may reflect changes in item functioning rather than genuine growth or decline. Testing longitudinal MI follows the same configural-metric-scalar hierarchy but with equality constraints imposed across time points rather than across groups.
Measurement invariance is not merely a technical requirement — it is a substantive question about the cross-group and cross-temporal comparability of psychological constructs. Its violation provides important information: it tells us that the way a construct manifests in observable behavior differs across contexts, which is itself a finding of psychological interest. The study of MI thus sits at the intersection of psychometrics and substantive psychology, providing the methodological foundation for the enterprise of comparing minds across the divisions of human experience.