Evaluating how well a structural equation model fits the observed data is one of the most critical and debated steps in SEM analysis. Because the chi-square test of exact fit is almost always rejected with large samples — even when misfit is trivial — researchers rely on a battery of approximate fit indices that quantify different aspects of model-data discrepancy. Understanding the computation, interpretation, and limitations of these indices is essential for responsible SEM practice.
The Chi-Square Test
F_ML = ln|Σ(θ)| + tr(SΣ(θ)⁻¹) − ln|S| − p
df = p(p + 1)/2 − q
where p = number of observed variables, q = number of free parameters
The chi-square statistic tests the null hypothesis that the model-implied covariance matrix equals the population covariance matrix (exact fit). The test statistic is T = (N − 1)F_ML, where F_ML is the minimized discrepancy function above. Under correct model specification and multivariate normality, T asymptotically follows a chi-square distribution with degrees of freedom equal to the difference between the number of unique elements in the covariance matrix and the number of estimated parameters. The test is sensitive to sample size: with large N, even trivial misspecifications produce significant chi-square values.
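The test described above can be sketched as a short computation. This is a minimal illustration, assuming the minimized discrepancy F_ML, sample size N, number of observed variables p, and number of free parameters q are already known from a fitted model; the numeric values in the example are hypothetical.

```python
from scipy.stats import chi2

def chi_square_test(F_ML, N, p, q):
    """Chi-square test of exact fit: T = (N - 1) * F_ML."""
    df = p * (p + 1) // 2 - q      # unique covariance elements minus free parameters
    T = (N - 1) * F_ML             # test statistic
    p_value = chi2.sf(T, df)       # upper-tail probability under H0 of exact fit
    return T, df, p_value

# Hypothetical example: F_ML = 0.08, N = 500, 10 observed variables, 23 free parameters
T, df, pval = chi_square_test(0.08, 500, 10, 23)
```

With 10 observed variables there are 55 unique covariance elements, so this hypothetical model has 32 degrees of freedom.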
Approximate Fit Indices
RMSEA = √(max(χ²_model − df_model, 0) / (df_model(N − 1)))
CFI = 1 − max(χ²_model − df_model, 0) / max(χ²_null − df_null, χ²_model − df_model, 0)
TLI = ((χ²_null/df_null) − (χ²_model/df_model)) / ((χ²_null/df_null) − 1)
SRMR = √(mean of squared standardized residual covariances)
The RMSEA (Root Mean Square Error of Approximation) estimates the discrepancy per degree of freedom, rewarding parsimony. It comes with a confidence interval, providing information about the precision of the fit estimate. The CFI (Comparative Fit Index) and TLI (Tucker-Lewis Index) compare the target model to an independence (null) model in which all variables are uncorrelated. The SRMR (Standardized Root Mean Square Residual) is the square root of the average squared discrepancy between observed and model-implied correlations.
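The standard definitions of these indices can be written as small functions. This is a sketch, not a replacement for an SEM package's output; the chi-square values, degrees of freedom, and sample size in the usage example are hypothetical.

```python
import math

def rmsea(chisq, df, N):
    # Discrepancy per degree of freedom; the max(..., 0) floors the
    # noncentrality estimate at zero so over-fitting models report 0.
    return math.sqrt(max(chisq - df, 0) / (df * (N - 1)))

def cfi(chisq_m, df_m, chisq_0, df_0):
    # Compares the model's noncentrality to the null (independence) model's.
    num = max(chisq_m - df_m, 0)
    den = max(chisq_0 - df_0, chisq_m - df_m, 0)
    return 1.0 - num / den if den > 0 else 1.0

def tli(chisq_m, df_m, chisq_0, df_0):
    # Ratio-based index; can exceed 1 or fall below 0 in small samples.
    r0 = chisq_0 / df_0
    rm = chisq_m / df_m
    return (r0 - rm) / (r0 - 1)

# Hypothetical fitted model: chi2 = 85 on 40 df, null model chi2 = 900 on 55 df, N = 400
r = rmsea(85, 40, 400)
c = cfi(85, 40, 900, 55)
t = tli(85, 40, 900, 55)
```

Note that TLI is not bounded to [0, 1], which is why it is reported alongside, rather than instead of, the normed CFI.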
Hu and Bentler (1999) proposed widely adopted cutoffs: RMSEA ≤ 0.06, CFI ≥ 0.95, TLI ≥ 0.95, SRMR ≤ 0.08. These cutoffs were derived from simulation studies with specific conditions (continuous data, ML estimation, correctly specified models). They should not be applied mechanically: the appropriate threshold depends on model complexity, sample size, the number of indicators, and the purpose of the analysis. Some methodologists have argued that rigid cutoffs have done more harm than good by encouraging a "fit index game" in which researchers modify models to achieve acceptable fit statistics rather than to improve substantive understanding.
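As a reporting convenience only, the Hu and Bentler cutoffs can be tabulated mechanically; per the caveats above, the output is a summary of where each index falls, not a verdict on the model. The example values are hypothetical.

```python
def check_hu_bentler(rmsea, cfi, tli, srmr):
    """Flag each index against the Hu & Bentler (1999) heuristics.

    These thresholds were derived under specific simulation conditions
    (continuous data, ML estimation) and should inform, not decide, judgment.
    """
    return {
        "RMSEA <= .06": rmsea <= 0.06,
        "CFI >= .95": cfi >= 0.95,
        "TLI >= .95": tli >= 0.95,
        "SRMR <= .08": srmr <= 0.08,
    }

# Hypothetical model: mixed signals are common and call for judgment, not a rule
flags = check_hu_bentler(rmsea=0.053, cfi=0.947, tli=0.927, srmr=0.041)
```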
Information Criteria and Model Comparison
When comparing non-nested models, information criteria provide principled alternatives to chi-square difference tests, which require nesting. The AIC (Akaike Information Criterion) = χ² + 2q balances fit and parsimony, while the BIC (Bayesian Information Criterion) = χ² + q × ln(N) penalizes complexity more heavily, favoring simpler models. Lower values are preferred for both, and the comparison is meaningful only for models fit to the same data — for instance, a bifactor model versus a correlated-factors model.
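The two criteria are direct arithmetic on the chi-square and parameter count. A minimal sketch, using the relative forms given above (valid for comparing models fit to the same data); the two candidate models in the example are hypothetical.

```python
import math

def aic(chisq, q):
    # Akaike Information Criterion: fit plus a constant penalty per parameter
    return chisq + 2 * q

def bic(chisq, q, N):
    # Bayesian Information Criterion: penalty grows with ln(N), so it
    # favors simpler models more strongly as the sample gets larger
    return chisq + q * math.log(N)

# Hypothetical comparison at N = 400:
# correlated-factors model: chi2 = 85 on 25 free parameters
# bifactor model:           chi2 = 60 on 35 free parameters
aic_cf, bic_cf = aic(85.0, 25), bic(85.0, 25, 400)
aic_bf, bic_bf = aic(60.0, 35), bic(60.0, 35, 400)
```

In this made-up comparison, AIC prefers the bifactor model (130.0 vs 135.0) while BIC prefers the simpler correlated-factors model, illustrating how the heavier BIC penalty can reverse a ranking.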
Best practice in SEM fit evaluation involves reporting multiple indices, examining the pattern across indices rather than relying on any single one, inspecting standardized residuals and modification indices to identify localized areas of misfit, and evaluating whether the model makes substantive sense. A model that fits well statistically but produces nonsensical parameter estimates (negative error variances, standardized loadings greater than 1.0) is not a good model. Conversely, a model with marginal fit but theoretically meaningful parameters may warrant retention with appropriate caveats. Fit evaluation is ultimately a judgment that integrates statistical evidence with substantive knowledge.