The two-parameter logistic (2PL) model is a cornerstone of item response theory (IRT). Proposed by Birnbaum (1968), it models the probability that an examinee with latent ability θ answers an item correctly as a function of two item parameters: difficulty (b) and discrimination (a). The model provides a richer characterization of item behavior than the one-parameter Rasch model by allowing items to differ in how sharply they discriminate among examinees of different ability levels.
The Model
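Under the 2PL model, the probability that an examinee with ability θ answers item i correctly is

\[
P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]}
\]

where θ = latent ability
a_i = discrimination parameter (slope)
b_i = difficulty parameter (location)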
The item characteristic curve (ICC) is an S-shaped logistic function. The difficulty parameter b locates the curve on the ability continuum: it is the value of θ at which the probability of a correct response is exactly 0.50. The discrimination parameter a controls the steepness of the curve at the inflection point. Higher values of a indicate that the item differentiates sharply between examinees just above and just below the difficulty level; lower values indicate a more gradual transition.
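The two properties described above (a probability of 0.50 at θ = b, and a steepness governed by a) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the item values a = 1.5, b = 0.5 are illustrative assumptions, not part of the model's definition.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_slope(theta: float, a: float, b: float) -> float:
    """Derivative of the ICC with respect to theta: a * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * p * (1.0 - p)


# Hypothetical item, chosen only for illustration.
a, b = 1.5, 0.5

p_at_b = icc_2pl(b, a, b)        # 0.5: the curve crosses 0.50 at theta = b
slope_at_b = icc_slope(b, a, b)  # 0.375: the maximum slope equals a / 4
```

Because P(1 − P) is largest when P = 0.50, the slope a·P·(1 − P) peaks at θ = b, where it equals a/4. Doubling a therefore doubles the steepness of the curve at its inflection point.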
Parameter Estimation
Item parameters are typically estimated by marginal maximum likelihood (MML): the ability distribution is integrated out, and the item parameters are chosen to maximize the marginal likelihood of the observed response patterns. Ability estimates for individual examinees are then obtained conditional on the estimated item parameters, using maximum likelihood estimation (MLE) or expected a posteriori (EAP) methods. The EM algorithm is the standard computational approach for MML estimation. It iterates between computing the expected counts of examinees and correct responses at each ability level, given the current item parameter estimates (E-step), and maximizing the expected complete-data log-likelihood with respect to the item parameters (M-step).
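The marginal likelihood of the response patterns of N examinees on n items is

\[
L = \prod_{j=1}^{N} \int \prod_{i=1}^{n} P_i(\theta)^{u_{ji}} \, Q_i(\theta)^{1 - u_{ji}} \, g(\theta) \, d\theta
\]

where u_ji = response of person j to item i (1 = correct, 0 = incorrect)
Q_i(θ) = 1 − P_i(θ)
g(θ) = prior ability distribution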
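To make the integral over θ concrete, the sketch below approximates it by a weighted sum over a fixed grid of ability values and uses the same grid to compute an EAP ability estimate. It is a simplified illustration, not an EM implementation: it only evaluates the marginal log-likelihood for given item parameters and does not maximize it. The standard normal prior, the equally spaced grid, the function names, and the item parameters and response data are all assumptions made for the example.

```python
import math


def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def make_grid(n_points=61, lo=-6.0, hi=6.0):
    """Equally spaced nodes with normalized standard-normal weights (assumed prior)."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def pattern_likelihood(responses, items, theta):
    """Product over items of P^u * Q^(1-u) for one response pattern at theta."""
    like = 1.0
    for u, (a, b) in zip(responses, items):
        p = icc_2pl(theta, a, b)
        like *= p if u == 1 else 1.0 - p
    return like


def marginal_loglik(data, items, nodes, weights):
    """Log of the marginal likelihood, with the integral replaced by a grid sum."""
    total = 0.0
    for responses in data:
        marg = sum(w * pattern_likelihood(responses, items, t)
                   for t, w in zip(nodes, weights))
        total += math.log(marg)
    return total


def eap(responses, items, nodes, weights):
    """Posterior mean of theta for one examinee, given fixed item parameters."""
    post = [w * pattern_likelihood(responses, items, t)
            for t, w in zip(nodes, weights)]
    return sum(t * p for t, p in zip(nodes, post)) / sum(post)


# Hypothetical (a, b) pairs and response patterns, for illustration only.
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
data = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]

nodes, weights = make_grid()
loglik = marginal_loglik(data, items, nodes, weights)
theta_all_correct = eap([1, 1, 1], items, nodes, weights)
theta_all_wrong = eap([0, 0, 0], items, nodes, weights)
```

Unlike the MLE, the EAP estimate remains finite for all-correct and all-incorrect response patterns, because the prior pulls the posterior mean toward the center of the ability distribution. Production software commonly uses Gauss-Hermite or adaptive quadrature rather than an equally spaced grid.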
The Rasch (1PL) model constrains all discrimination parameters to be equal, yielding the property of specific objectivity: item parameters can be estimated independently of the ability distribution, and ability comparisons are independent of which items are administered. The 2PL relaxes this constraint, gaining flexibility at the cost of losing specific objectivity. The choice between models reflects a tension between measurement principles and empirical fit that has been debated for decades in the psychometric community.
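The loss of specific objectivity can be seen from the log-odds of a correct response. The short derivation below is a sketch; the labels θ_1 and θ_2 for two examinees and the common slope a are notation introduced here for illustration.

```latex
% Log-odds (logit) of a correct response under the 2PL
\ln \frac{P_i(\theta)}{Q_i(\theta)} = a_i(\theta - b_i)

% Difference in log-odds between two examinees on the same item
\ln \frac{P_i(\theta_1)}{Q_i(\theta_1)} - \ln \frac{P_i(\theta_2)}{Q_i(\theta_2)}
  = a_i(\theta_1 - \theta_2)

% Rasch case: a_i = a for every item, so the comparison is item-free
a_i = a \;\Longrightarrow\;
\ln \frac{P_i(\theta_1)}{Q_i(\theta_1)} - \ln \frac{P_i(\theta_2)}{Q_i(\theta_2)}
  = a(\theta_1 - \theta_2)
```

With a common slope, the difficulty b_i cancels and the comparison between the two examinees is the same on every item. Under the 2PL the difference is scaled by a_i, so the size of the comparison depends on which item is administered.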
Applications
The 2PL model is widely used in educational and psychological testing. It forms the basis for item banking (calibrating large pools of items on a common scale) and is the default model in many computerized adaptive testing systems. Under the 2PL, the information an item provides at ability θ is a_i² P_i(θ) Q_i(θ), so it grows with the square of the discrimination parameter and peaks at θ = b_i. Item selection algorithms use this to choose the most informative item for each examinee's current ability estimate. In test development, items with very low discrimination provide little measurement information and are typically revised or discarded.
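Maximum-information item selection can be sketched in a few lines. The example below assumes the standard 2PL item information function, I_i(θ) = a_i² P_i(θ)(1 − P_i(θ)); the function names, the item pool, and the provisional ability estimate are hypothetical. Operational adaptive tests usually add content balancing and exposure control, which this sketch omits.

```python
import math


def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta_hat, pool, administered):
    """Index of the not-yet-administered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    if not candidates:
        raise ValueError("no items left in the pool")
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))


# Hypothetical pool of (a, b) pairs, for illustration only.
pool = [(0.4, 0.0), (1.2, -1.0), (1.8, 0.3), (1.5, 2.0)]

# Item 1 has already been given; the provisional ability estimate is 0.0.
next_item = select_item(0.0, pool, administered={1})  # selects index 2
```

Item 0 is located exactly at the ability estimate but is passed over because its low discrimination (a = 0.4) yields an information of only 0.04. Item 2 is slightly off target (b = 0.3) yet far more informative because of its high discrimination.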
Model fit is evaluated at both the item and test levels. At the item level, chi-square statistics compare observed and expected response proportions across ability groups. At the test level, likelihood-ratio tests compare the 2PL to more restricted (1PL) or more general (3PL) models. Graphical inspection of empirical item characteristic curves against the fitted model remains an essential diagnostic tool. The 2PL model strikes a balance between parsimony and flexibility that has made it the workhorse of modern psychometrics.
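The item-level comparison of observed and expected proportions can be sketched as follows. This is a simplified Pearson-type statistic, similar in form to commonly used item-fit indices, and not a reproduction of any particular published statistic. The grouping rule (equal-sized groups of examinees sorted by ability estimate), the function names, and the data are assumptions for illustration; the appropriate reference distribution and degrees of freedom depend on the specific statistic and are not addressed here.

```python
import math


def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_fit_chisq(thetas, responses, a, b, n_groups=4):
    """Pearson-type fit statistic for one item.

    thetas: ability estimates, one per examinee
    responses: 0/1 responses to this item, in the same order
    """
    if len(thetas) != len(responses):
        raise ValueError("thetas and responses must have the same length")
    if len(thetas) < n_groups:
        raise ValueError("need at least one examinee per group")

    # Sort examinees by ability and split them into n_groups groups.
    order = sorted(range(len(thetas)), key=lambda j: thetas[j])
    size = len(order) // n_groups
    stat = 0.0
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(order)
        members = order[start:end]
        n = len(members)
        # Observed proportion correct vs. mean model-implied probability.
        observed = sum(responses[j] for j in members) / n
        expected = sum(icc_2pl(thetas[j], a, b) for j in members) / n
        stat += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return stat


# Hypothetical ability estimates and responses, for illustration only.
thetas = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
responses = [0, 0, 0, 1, 0, 1, 1, 1]
fit = item_fit_chisq(thetas, responses, a=1.2, b=0.0, n_groups=2)
```

Larger values indicate a greater discrepancy between the empirical and fitted item characteristic curves. The per-group observed and expected proportions computed inside the loop are the same quantities that would be plotted in a graphical fit check.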