Item response theory classical test theory Function Characteristic Curve

Ahmad Javed, August 12, 2023


Item response theory (IRT), also known as latent response theory, refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e., observed results, responses, or performance). These models establish a link between the properties of the items on an instrument, the individuals who respond to these items, and the underlying trait being measured. IRT assumes that the latent construct (for example, stress, knowledge, or attitudes) and the items of a measure are organized on an unobservable continuum. Its main objective is therefore to establish the individual’s position on that continuum.

Classical test theory

Classical test theory (CTT) pursues the same objective and predates the conceptualization of IRT. It was (and still is) used to predict an individual’s latent trait from the observed total score on an instrument. In CTT, the true score predicts the level of the latent variable, and the observed score is the true score plus an error term. The error is normally distributed with a mean of 0 and a standard deviation of 1.
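The CTT decomposition can be sketched as a small simulation. This is only an illustration, assuming the usual formulation "observed score = true score + error"; the true-score distribution (mean 50, SD 10), the sample size, and the function name `simulate_ctt` are hypothetical choices, not values from the article.

```python
import random
import statistics

def simulate_ctt(n_people=5000, seed=0):
    """Simulate observed = true + error, with error ~ Normal(mean 0, SD 1)."""
    rng = random.Random(seed)
    # Assumed (hypothetical) distribution of true scores.
    true_scores = [rng.gauss(50, 10) for _ in range(n_people)]
    # Error term as described in the text: mean 0, standard deviation 1.
    errors = [rng.gauss(0, 1) for _ in range(n_people)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed

true_scores, errors, observed = simulate_ctt()
print("mean error:", round(statistics.mean(errors), 3))
print("SD of error:", round(statistics.pstdev(errors), 3))
```

Across many respondents the errors average out close to zero, which is why the observed total score is used as the estimate of the true score.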


Item response theory versus classical test theory

IRT Assumptions

1) Monotonicity – The assumption is that as the level of the trait increases, the probability of a correct answer also increases.

2) Unidimensionality – The model assumes that there is a dominant latent trait being measured and that this trait is the driving force of the responses observed for each item of the measure.

3) Local independence – The answers given to the different items of a test are mutually independent at a given level of ability.

4) Invariance – We are allowed to estimate the item parameters from any position on the item response curve. Consequently, we can estimate the parameters of an item from any group of subjects who have responded to the item.

If the assumptions hold, the differences in the observation of the correct answers between the respondents will be due to the variation of their latent trait.
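Local independence has a direct computational consequence: at a fixed ability, the probability of a whole response pattern is the product of the per-item probabilities. The sketch below assumes a simple logistic item response function; the item difficulties and the function names are hypothetical.

```python
import math

def p_correct(theta, b):
    """Probability of a correct answer for ability theta and difficulty b
    (simple logistic form, assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_probability(theta, difficulties, responses):
    """Joint probability of a 0/1 response pattern under local independence:
    the product of the individual item probabilities at the given theta."""
    prob = 1.0
    for b, u in zip(difficulties, responses):
        p = p_correct(theta, b)
        prob *= p if u == 1 else (1.0 - p)
    return prob

difficulties = [-1.0, 0.0, 1.0]  # hypothetical items: easy, medium, hard
pattern = [1, 1, 0]              # correct, correct, incorrect
joint = pattern_probability(0.0, difficulties, pattern)
print(round(joint, 4))
```

Because the items are treated as independent given θ, the probabilities of all possible patterns sum to 1 at any ability level.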

Item Response Function and Item Characteristic Curve (ICC)

IRT models predict respondents’ responses to an instrument’s items based on their position on the latent trait continuum and on item characteristics, also known as parameters. The item response function characterizes this association. The underlying assumption is that each response to an item on an instrument provides some information about the individual’s level of the latent trait or ability.

In simple terms, the person’s ability (θ) determines the probability of giving the correct answer to an item: the greater the individual’s ability, the greater the probability of a correct answer. This relationship can be represented graphically and is known as the item characteristic curve. The probability of a correct answer increases monotonically as the respondent’s ability increases. Keep in mind that, theoretically, ability (θ) ranges between -∞ and +∞; in applications, however, it usually ranges between -3 and +3.
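An item characteristic curve can be tabulated over the practical ability range. This minimal sketch assumes a logistic curve with a single hypothetical item located at the middle of the scale; `icc` is an illustrative name.

```python
import math

def icc(theta, b=0.0):
    """Item characteristic curve: probability of a correct answer at ability
    theta, for an item with (assumed) location b on the ability scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability values from -3.0 to +3.0 in steps of 0.5.
thetas = [x / 2 for x in range(-6, 7)]
curve = [icc(t) for t in thetas]

# Monotonicity: every step up in ability raises the probability.
is_monotonic = all(p1 < p2 for p1, p2 in zip(curve, curve[1:]))

for t, p in zip(thetas, curve):
    print(f"theta={t:+.1f}  P(correct)={p:.3f}")
print("monotonic:", is_monotonic)
```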

Item Parameters

As people’s abilities change, their position on the latent construct continuum changes and is determined by the sample of respondents and the item parameters. An item must be sensitive enough to rate respondents on the suggested unobservable continuum.

The difficulty of the item (bi)

It is the parameter that determines where the item is located along the ability scale. It is defined at the point of median probability, that is, the ability level at which 50% of respondents give the correct answer. On an item characteristic curve, difficult items lie toward the right of the scale, indicating the higher ability of the respondents who answer them correctly, while easier items lie further to the left of the ability scale.

Item discrimination (ai)

It determines the rate at which the probability of a correct answer changes across ability levels. This parameter is essential for differentiating between individuals who have similar levels of the latent construct of interest. The ultimate goal of designing a precise measure is to include items with high discrimination, in order to map individuals along the latent trait continuum.

On the other hand, researchers should be careful if an item is found to have negative discrimination, since the probability of a correct answer should not decrease as the respondent’s ability increases. Such items should be reviewed. Theoretically, item discrimination ranges between -∞ and +∞, but it normally does not exceed 2; realistically, therefore, it ranges between 0 and 2.

Guessing (ci)

Item guessing is the third parameter; it accounts for the chance of answering an item correctly by guessing. It constrains the probability of a correct answer as ability approaches -∞, setting a lower bound on that probability.
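The roles of the three item parameters can be seen side by side in one function. The sketch below assumes the common three-parameter logistic form; the argument names `a`, `b`, and `c` stand for the ai, bi, and ci of the text, and all numeric values are hypothetical.

```python
import math

def irf(theta, a=1.0, b=0.0, c=0.0):
    """Item response function (three-parameter logistic form, assumed here).
    a: discrimination, b: difficulty, c: guessing (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Difficulty: without guessing, the curve passes through 0.5 at theta = b.
p_at_b = irf(1.0, a=1.5, b=1.0)

# Discrimination: a higher a separates two nearby abilities more sharply.
gap_low_a = irf(0.5, a=0.5) - irf(-0.5, a=0.5)
gap_high_a = irf(0.5, a=2.0) - irf(-0.5, a=2.0)

# Guessing: the probability stays at or above c even at very low ability.
floor = irf(-10.0, a=1.0, b=0.0, c=0.25)

print(p_at_b, round(gap_low_a, 3), round(gap_high_a, 3), round(floor, 3))
```

Setting `c=0` and a common `a` recovers the simpler models discussed later in the article.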

Population invariance

In simple terms, the item parameters behave similarly in different populations. This is not the case when measurement follows CTT. Since the unit of analysis in IRT is the item, item location (difficulty) can be standardized (by a linear transformation) across populations, and thus items can be easily compared. An important caveat is that, even after linear transformation, the parameter estimates derived from two samples will not be identical. Invariance, as the name suggests, refers to population invariance and thus applies only to the population parameters of the items.
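The linear transformation mentioned above can be checked numerically. The sketch assumes a two-parameter logistic curve and the standard rescaling rule (ability and difficulty are moved by the same slope and intercept, discrimination is divided by the slope); the function names and all values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, b, slope, intercept):
    """Move ability and difficulty to a new scale with a linear transformation.
    Discrimination is divided by the slope so that a * (theta - b) is unchanged."""
    return slope * theta + intercept, a / slope, slope * b + intercept

theta, a, b = 0.8, 1.2, -0.3  # hypothetical respondent and item
theta_new, a_new, b_new = rescale(theta, a, b, slope=1.5, intercept=0.4)

p_before = irf_2pl(theta, a, b)
p_after = irf_2pl(theta_new, a_new, b_new)
print(round(p_before, 6), round(p_after, 6))
```

The predicted probability is the same on both scales, which is what allows item locations calibrated in different populations to be placed on a common metric.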

Types of IRT models

Below are the types of Item response theory models

One-dimensional models

One-dimensional models describe the responses to items that measure a single dominant latent trait.

Dichotomous IRT models

Dichotomous IRT models are used when responses to the items in a measure are dichotomous (i.e., 0 or 1).

The 1-parameter logistic model

This model is the simplest of the IRT models. It is composed of a parameter that describes the latent trait (ability, θ) of the person who responds to the items, as well as another parameter for the item (difficulty). The following equation represents its mathematical form:
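In standard notation, the 1-PL item response function is usually written as below. The indices are assumptions made here for illustration: θ_j is the ability of respondent j, b_i is the difficulty of item i, and a is a single discrimination value shared by all items (a = 1 gives the Rasch form).

```latex
P(X_{ij} = 1 \mid \theta_j, b_i)
  = \frac{\exp\big(a(\theta_j - b_i)\big)}{1 + \exp\big(a(\theta_j - b_i)\big)}
```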

The item response function of the 1-parameter logistic model predicts the probability of a correct response given the respondent’s ability and the item’s difficulty. In the 1-PL model, the discrimination parameter is fixed for all items and, consequently, the item characteristic curves of the different items of the measure are parallel along the ability scale.
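One way to see what "parallel" means is to compare two items on the log-odds (logit) scale, where the gap between them stays constant at every ability level. This is a minimal sketch assuming the logistic 1-PL form with a common discrimination; the two difficulties are hypothetical.

```python
import math

def p_1pl(theta, b, a=1.0):
    """1-PL probability of a correct answer; a is shared by all items."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

easy_b, hard_b = -1.0, 1.0  # hypothetical item difficulties
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Gap between the two items in log-odds at each ability level.
gaps = [logit(p_1pl(t, easy_b)) - logit(p_1pl(t, hard_b)) for t in thetas]
print([round(g, 6) for g in gaps])

# The easier item is more likely to be answered correctly everywhere,
# so the two curves never cross.
never_cross = all(p_1pl(t, easy_b) > p_1pl(t, hard_b) for t in thetas)
```

With a common discrimination the gap equals a × (hard_b − easy_b) regardless of θ, so the curves are horizontal shifts of one another.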

Rasch model versus 1-parameter logistic models

The two models are mathematically the same. However, the Rasch model constrains item discrimination (ai) to 1, while the 1-parameter logistic model strives to fit the data as closely as possible and does not constrain the discrimination factor to 1. In the Rasch approach, the model takes precedence over the data, as it is more concerned with developing the variable that is being used to measure the dimension of interest. Therefore, when constructing an instrument, fitting the Rasch model would be the better choice, improving the precision of the items.

The two-parameter logistic model

The two-parameter logistic model predicts the probability of a correct answer using two parameters (difficulty bi and discrimination ai).

The discrimination parameter is allowed to vary between items. Therefore, the ICCs of the different items may cross and have different slopes. The greater the slope, the greater the discrimination of the item, since it will be able to detect subtle differences in the ability of the respondents.
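The crossing of curves can be illustrated with two items that share a difficulty but differ in discrimination. The sketch assumes the logistic 2-PL form; the item values and helper names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of a correct answer (logistic form, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope_at(theta, a, b, h=1e-5):
    """Numerical slope of the ICC at theta (central difference)."""
    return (p_2pl(theta + h, a, b) - p_2pl(theta - h, a, b)) / (2 * h)

flat = {"a": 0.5, "b": 0.0}   # hypothetical low-discrimination item
steep = {"a": 2.0, "b": 0.0}  # hypothetical high-discrimination item

# Below the crossing point the flat item is easier to get right;
# above it the steep item is. The curves therefore cross at theta = 0.
flat_low, steep_low = p_2pl(-2.0, **flat), p_2pl(-2.0, **steep)
flat_high, steep_high = p_2pl(2.0, **flat), p_2pl(2.0, **steep)

# Under this form the slope at theta = b is a / 4.
print(round(slope_at(0.0, **flat), 4), round(slope_at(0.0, **steep), 4))
```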

The item information function

As in the case of the 1-PL model, the information is calculated as the product of the probability of a correct answer and the probability of an incorrect answer. In the 2-PL model, however, this product is multiplied by the square of the discrimination parameter. The implication is that the higher the discrimination parameter, the more information the item provides. Since the discrimination factor is allowed to vary between items, the graphs of the item information function may also look different.
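The calculation described above is short enough to write out directly. This is a sketch assuming the logistic 2-PL form, with hypothetical item values and function names.

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of a correct answer (logistic form, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Item information: a squared times P(correct) times P(incorrect)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Two hypothetical items with the same difficulty, evaluated at theta = b.
info_weak = item_information(0.0, a=0.5, b=0.0)    # 0.25 * 0.25 = 0.0625
info_strong = item_information(0.0, a=2.0, b=0.0)  # 4.0 * 0.25 = 1.0

# Information is highest where ability matches the item's difficulty.
peak = item_information(1.0, a=1.5, b=1.0)
off_peak = item_information(2.0, a=1.5, b=1.0)
print(info_weak, info_strong, peak > off_peak)
```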

Ability estimation

With the 2-PL model, the assumption of local independence still holds and ability is still estimated by maximum likelihood. The log-probabilities of the individual item responses are still summed, but each response now enters weighted by its item’s discrimination. Therefore, the likelihood functions of response patterns with the same total score may differ from each other and peak at different levels of θ.
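A rough way to see this is to maximize the log-likelihood over a grid of θ values for two response patterns with the same number of correct answers. The grid search below is a simplification chosen for readability, not a recommended estimation routine; the items and function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of a correct answer (logistic form, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Sum of log-probabilities of the observed 0/1 responses
    (valid because of local independence)."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_2pl(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of ability."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical items as (discrimination, difficulty) pairs.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

theta_a = estimate_theta(items, [1, 1, 0])  # misses the most discriminating item
theta_b = estimate_theta(items, [0, 1, 1])  # gets the most discriminating item right
print(theta_a, theta_b)
```

Both respondents answer two of three items correctly, yet the one who succeeds on the highly discriminating item receives the higher ability estimate.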

The 3-parameter logistic model

This model predicts the probability of a correct answer in the same way as the 1-PL and 2-PL models, but it is constrained by a third parameter called the guessing parameter (also known as the pseudo-chance parameter), which bounds the probability of a correct answer from below as the respondent’s ability approaches -∞. Because respondents may answer an item correctly by guessing, the amount of information provided by that item decreases, and the item information function peaks at a lower level than it would without guessing. In addition, the difficulty no longer corresponds to the point of 50% probability. Items that are answered by guessing indicate that the respondent’s ability is less than their difficulty.
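Both effects of the guessing parameter can be checked numerically. The sketch assumes the usual 3-PL logistic form and its standard item information formula; the item values and function names are hypothetical.

```python
import math

def p_3pl(theta, a, b, c):
    """3-PL probability of a correct answer; c is the lower asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3-PL item information: a^2 * (Q / P) * ((P - c) / (1 - c))^2.
    With c = 0 this reduces to the 2-PL value a^2 * P * Q."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

# At theta = b the probability is c + (1 - c) / 2, which exceeds 0.5 when c > 0.
p_at_b = p_3pl(0.0, a=1.0, b=0.0, c=0.2)

# Compare the peak information with and without guessing over a theta grid.
grid = [x / 100 for x in range(-400, 401)]
peak_no_guess = max(info_3pl(t, 1.0, 0.0, 0.0) for t in grid)
peak_guess = max(info_3pl(t, 1.0, 0.0, 0.2) for t in grid)
print(round(p_at_b, 3), round(peak_no_guess, 4), round(peak_guess, 4))
```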

Model fit

One way to choose which model to fit is to evaluate the relative fit of the candidate models through information criteria. The Akaike information criterion (AIC) estimates are compared and the model with the lowest AIC is chosen. Alternatively, we can use the chi-square (deviance) statistic and measure the change in deviance between nested models. Since this change follows a chi-square distribution, we can test whether the two models are statistically different from each other.
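Both comparisons can be sketched with the standard library alone. The log-likelihood values below are made up purely for illustration and do not come from any fitted model; the parameter counts assume a 10-item test, and `chi2_sf` is a small helper written here for integer degrees of freedom.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1),
    built up from the df = 1 and df = 2 closed forms by recursion."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def likelihood_ratio_test(loglik_small, loglik_big, extra_params):
    """Change in deviance between nested models and its chi-square p-value."""
    statistic = -2.0 * (loglik_small - loglik_big)
    return statistic, chi2_sf(statistic, extra_params)

# Made-up values for illustration only.
loglik_1pl, k_1pl = -2650.0, 11  # 10 difficulties + 1 common discrimination
loglik_2pl, k_2pl = -2630.0, 20  # 10 difficulties + 10 discriminations

aic_1pl, aic_2pl = aic(loglik_1pl, k_1pl), aic(loglik_2pl, k_2pl)
stat, p_value = likelihood_ratio_test(loglik_1pl, loglik_2pl, k_2pl - k_1pl)
print(aic_1pl, aic_2pl, stat, p_value)
```

With these illustrative numbers the 2-PL model has the lower AIC and the change in deviance is significant, so the extra discrimination parameters would be retained.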

Other IRT models

Other IRT models handle polytomous data, such as the graded response model and the partial credit model. These models predict the expected score for each response category. Others, such as nominal response models, predict the expected scores of individuals responding to items with unordered response categories (e.g., Yes, No, Maybe). This brief overview has focused on one-dimensional IRT models, which relate to the measurement of a single latent trait; these models would not be appropriate for measuring more than one construct or latent trait. In the latter case, the use of multidimensional IRT models is recommended.
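For ordered categories, a graded response model can be sketched as a set of cumulative logistic curves whose differences give the category probabilities. The formulation below is the commonly used cumulative-logit version, taken here as an assumption; the thresholds, discrimination, and function names are hypothetical.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item with ordered categories
    0..len(thresholds). thresholds must be in increasing order."""
    # P(response >= k) for k = 0..K+1, with the conventions 1.0 and 0.0 at the ends.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def expected_score(theta, a, thresholds):
    """Expected item score: sum of category value times category probability."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))

thresholds = [-1.0, 0.0, 1.5]  # hypothetical 4-category item
probs = grm_category_probs(0.5, a=1.2, thresholds=thresholds)
print([round(p, 3) for p in probs], round(expected_score(0.5, 1.2, thresholds), 3))
```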
