
Seminars

Differential Item Functioning Assessment in Computerized Adaptive Testing

  • 2011-08-08 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. Ya-Hui Su
  • Department of Psychology, National Chung Cheng University

Abstract

This talk will start with an introduction to computerized adaptive testing (CAT) and differential item functioning (DIF) assessment. CAT is a form of computer-based testing that adapts to the examinee's ability level: it successively selects questions from a large collection of potential items (the item pool) so as to maximize the precision of the exam based on what is known about the examinee from previous questions. As a result, CAT requires fewer test items to arrive at equally accurate scores. DIF assessment refers to an analysis that identifies items for which examinees with the same overall ability level have different probabilities of answering correctly because they come from different sub-populations. To eliminate bias against sub-populations, DIF detection has become an essential routine item-analysis procedure in test development for maintaining test fairness and validity. Most DIF assessment techniques are designed for paper-and-pencil (P&P) tests and are simply applied to items when developing an item pool for CAT. An operational CAT requires constantly adding new items or replacing old ones to maintain the item pool; to ensure test quality, new items are routinely pretested and screened for DIF. Therefore, in an operational CAT, all of the items used in the tests have already passed DIF screening. However, it is important to consider how the CAT mode affects DIF screening for future item development in CAT.

There are several reasons why DIF detection may be more important in CAT than in P&P tests. CAT administers fewer items than a P&P test, so each item has a bigger impact on the examinee's estimated ability (Zwick, 2000). Item selection is based on the result of the previous item, so an item exhibiting DIF might consequently lead to an incorrect selection of the subsequent item. It is also important to consider several factors that might affect DIF detection in CAT.
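To make the adaptive selection step concrete, the following is a minimal sketch (not the specific procedure used in the talk) of the common maximum-information rule: at the current ability estimate, the next item is the unadministered one whose Fisher information is largest under a two-parameter logistic (2PL) model. The item pool and parameter values are illustrative assumptions.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item with discrimination a and
    difficulty b, evaluated at ability theta: I = a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, pool, administered):
    """Return the index of the unadministered item that is most
    informative at the current ability estimate theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical pool of (discrimination, difficulty) pairs; item 1 was
# already administered, so it is excluded from selection.
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
next_item = select_next_item(0.4, pool, administered={1})
```

Because the next item depends on the running ability estimate, an item with DIF early in the test can distort that estimate and hence the entire subsequent item sequence, which is the cascading effect noted above.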
For example, not every examinee receives the same items or the same sequence of items when different CAT item-selection procedures are used. This affects estimates of the latent trait, and items with DIF may affect some examinees but not others. Since the true achievement level is not known, examinees must be matched on its estimate, so the accuracy of latent-trait estimation is an important factor in DIF detection. It is therefore important to generate pure trait estimates when performing DIF detection. In this study, I compared current approaches to defining a matching variable with a newly proposed matching variable. The proposed matching variable yielded higher power for DIF detection in CATs than the existing ones. Some future research avenues will be discussed as well.
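The matching idea above can be illustrated with the Mantel-Haenszel procedure, one standard DIF method for matched groups (shown here as a generic illustration; the talk's proposed matching variable is not reproduced). Examinees are binned into strata by the matching variable (e.g. a binned ability estimate), and a common odds ratio compares the reference and focal groups' odds of a correct response within strata; a ratio near 1 suggests no DIF on the item.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """records: iterable of (stratum, group, correct) triples, where
    group is 'ref' or 'foc' and correct is 0/1, all for a single item.
    Returns the Mantel-Haenszel common odds ratio of a correct response
    for the reference group relative to the focal group, pooled over
    strata of the matching variable."""
    # Per-stratum 2x2 counts: tables[s][group][correct]
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for stratum, group, correct in records:
        g = 0 if group == 'ref' else 1
        tables[stratum][g][correct] += 1
    num = den = 0.0
    for t in tables.values():
        n = sum(t[0]) + sum(t[1])          # stratum sample size
        if n == 0:
            continue
        a, b = t[0][1], t[0][0]            # ref correct / incorrect
        c, d = t[1][1], t[1][0]            # foc correct / incorrect
        num += a * d / n
        den += b * c / n
    return num / den if den else float('nan')

# Hypothetical single-stratum data: reference group 8/10 correct,
# focal group 5/10 correct -> odds ratio (8*5)/(2*5) = 4.0.
records = ([('s1', 'ref', 1)] * 8 + [('s1', 'ref', 0)] * 2 +
           [('s1', 'foc', 1)] * 5 + [('s1', 'foc', 0)] * 5)
ratio = mantel_haenszel_odds_ratio(records)
```

The key point for CAT is that the strata are formed from *estimated* ability, so a noisy or contaminated matching variable blurs the comparison and lowers the power of the detection, which motivates the search for a better matching variable.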
