Computerized adaptive testing (CAT) represents one of the most successful applications of item response theory. Unlike conventional fixed-form tests where every examinee answers the same items, a CAT tailors the test to each individual by selecting items that are most informative given the examinee's current estimated ability. The result is a test that is both shorter and more precise: examinees are not bored by items that are too easy or frustrated by items that are too hard.
The CAT Algorithm
1. Initialize: set θ̂_current to a starting value (e.g., 0, the prior mean)
2. Select item: choose i* = argmax_i I_i(θ̂_current) from the available pool
3. Administer item i* and record response
4. Update ability estimate: compute θ̂_new via MLE or EAP from all responses so far
5. Check stopping rule: if SE(θ̂) ≤ threshold or max items reached, stop
6. Otherwise, return to step 2
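The loop above can be sketched in Python. This is a minimal simulation, not an operational implementation: the 2PL item model, the grid-based EAP update, and every function and parameter name below are assumptions made for illustration.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

def run_cat(a, b, true_theta, se_threshold=0.35, max_items=20, rng=None):
    """Simulate one CAT: max-information selection, EAP update on a grid."""
    rng = np.random.default_rng(rng)
    grid = np.linspace(-4.0, 4.0, 161)
    log_post = -0.5 * grid**2            # N(0, 1) prior, up to a constant
    available = list(range(len(a)))
    theta_hat, se = 0.0, float("inf")    # starting estimate
    administered = []
    while available:
        # Select: most informative remaining item at the current estimate.
        i_star = max(available, key=lambda i: info_2pl(theta_hat, a[i], b[i]))
        available.remove(i_star)
        # Administer: here, simulate the response from the true ability.
        u = rng.random() < p_2pl(true_theta, a[i_star], b[i_star])
        administered.append((i_star, bool(u)))
        # Update: fold the response into the posterior, take its mean (EAP).
        p = p_2pl(grid, a[i_star], b[i_star])
        log_post += np.log(p if u else 1.0 - p)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        theta_hat = float(np.sum(grid * post))
        se = float(np.sqrt(np.sum((grid - theta_hat) ** 2 * post)))
        # Stop once the posterior SD reaches the precision target.
        if se <= se_threshold or len(administered) >= max_items:
            break
    return theta_hat, se, administered
```

With a few hundred simulated 2PL items, a run like this typically stops well short of the pool size once the posterior SD crosses the threshold.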
The core of a CAT is the item selection criterion. The most common approach is maximum Fisher information: select the item whose information function is largest at the current ability estimate. This is optimal for minimizing the asymptotic variance of the MLE. Alternative selection criteria include Bayesian approaches that minimize the expected posterior variance, and a-stratification methods that mitigate the overexposure of highly discriminating items early in the test.
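To make the criterion concrete, a small sketch with hypothetical item parameters shows how the argmax shifts with the ability estimate:

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Hypothetical pool of (discrimination a, difficulty b) pairs.
pool = [(1.8, -1.0), (1.2, 0.0), (1.8, 1.0)]

def select(theta_hat):
    """Maximum Fisher information criterion."""
    return max(range(len(pool)), key=lambda i: info_2pl(theta_hat, *pool[i]))

print(select(-1.0))  # 0: the easy item is most informative for a low estimate
print(select(1.0))   # 2: the hard item wins for a high estimate
```

Note that at θ̂ = 0 this pool would still pick a high-discrimination item (a = 1.8) over the difficulty-matched one (a = 1.2) — precisely the overexposure pressure that a-stratification is designed to relieve.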
Ability Estimation in CAT
After each response, the ability estimate is updated. Maximum likelihood estimation (MLE) is straightforward but undefined until the examinee has both correct and incorrect responses. Expected a posteriori (EAP) estimation, which computes the mean of the posterior distribution of θ given the responses, avoids this problem by incorporating a prior distribution. Weighted likelihood estimation (WLE) provides a bias-corrected alternative that does not require a prior.
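The failure mode for MLE, and the EAP fix, can be seen on an all-correct pattern (a numerical sketch with made-up 2PL item parameters):

```python
import numpy as np

# Three hypothetical 2PL items, all answered correctly.
a = np.array([1.5, 1.0, 1.2])
b = np.array([-0.5, 0.0, 0.5])

grid = np.linspace(-4.0, 4.0, 161)
# Log-likelihood of the all-correct pattern at each grid point.
p = 1.0 / (1.0 + np.exp(-a[:, None] * (grid[None, :] - b[:, None])))
loglik = np.log(p).sum(axis=0)

# MLE: the likelihood rises monotonically in theta -- no interior maximum,
# so the "estimate" runs off to the boundary of the grid.
print(grid[np.argmax(loglik)])  # 4.0

# EAP: an N(0, 1) prior pulls the posterior mean back to a finite value.
log_post = loglik - 0.5 * grid**2
post = np.exp(log_post - log_post.max())
post /= post.sum()
eap = float(np.sum(grid * post))
print(round(eap, 2))  # finite, near the region the items can support
```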
Exposure Control and Content Balancing
A pure maximum-information algorithm would select the same high-discrimination items for every examinee near the population mean, leading to overexposure of these items and potential security risks. Exposure control methods, such as the Sympson-Hetter procedure, impose maximum exposure rates by probabilistically blocking overexposed items. Content balancing constraints ensure that the adaptive test covers the content specifications — the same blueprint of topics and cognitive levels — as the fixed-form test it replaces. These practical constraints reduce efficiency slightly but are essential for operational viability.
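A simplified sketch of the probabilistic-blocking idea follows; the exposure parameters k_i here are invented, whereas in practice Sympson-Hetter calibrates them iteratively in simulation so that realized exposure rates stay under a target ceiling.

```python
import random

def select_with_exposure_control(ranked_items, k, rng):
    """Walk down the information-ranked list; administer item i with
    probability k[i], otherwise block it for this examinee and try the
    next-best item (a Sympson-Hetter-style screen)."""
    for i in ranked_items:
        if rng.random() < k[i]:
            return i
    return ranked_items[-1]  # fallback if everything was blocked

rng = random.Random(0)
# Item 0 is the most informative but heavily restricted (k = 0.2).
k = {0: 0.2, 1: 0.9, 2: 1.0}
picks = [select_with_exposure_control([0, 1, 2], k, rng) for _ in range(1000)]
print(picks.count(0) / 1000)  # close to 0.2 instead of 1.0
```

Without the screen, item 0 would be administered to every examinee with this ranking; with it, the realized exposure rate hovers near the ceiling k[0].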
Stopping Rules and Precision
CATs use either fixed-length or variable-length stopping rules. In variable-length CATs, the test ends when the standard error of the ability estimate falls below a specified threshold, meaning each examinee receives exactly as many items as needed for a predetermined level of precision. This is especially efficient for examinees at the extremes of the ability distribution, who may require very few items. Fixed-length CATs administer a set number of items but still benefit from adaptive selection, achieving greater precision than a random or fixed set of the same length.
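The bookkeeping behind a variable-length rule is direct: for the MLE, SE(θ̂) ≈ 1/√I(θ̂), where I(θ̂) is the running sum of administered-item informations. A numerical sketch with made-up item parameters:

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Suppose every administered item is well targeted (b = theta_hat = 0)
# with discrimination a = 1.5; each contributes 1.5^2 * 0.25 = 0.5625.
per_item = info_2pl(0.0, 1.5, 0.0)
for n in (5, 10, 20, 40):
    se = 1.0 / np.sqrt(n * per_item)
    print(n, round(float(se), 3))
# SE shrinks like 1/sqrt(n); an SE threshold of 0.30 is met by about
# 20 such items, while a poorly targeted pool would need many more.
```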
Operational CAT programs include the Graduate Record Examination (GRE), many nursing licensure exams (NCLEX), and numerous certification and professional licensing tests. The efficiency gains are substantial: the NCLEX, for instance, achieves classification accuracy comparable to a 200-item fixed test with an average of about 75 adaptively selected items. The mathematical foundations of CAT — item information theory, sequential estimation, and optimal design — exemplify how IRT transformed psychometric practice from static to dynamic measurement.