Statistical Test Specifications for Performance Assessments: Is This an Oxymoron?

This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound performance assessments can be described based on current experience in rater reliability, test reliability, generalizability, and validity. Content specifications are not the focus of this paper, but it is apparent that there is significant variation in the functioning of assessment tasks, and that content must be matched to objectives of the assessment. Considering performance assessment tasks as the target of instruction provides an appealing and straightforward model for assessment, but generalizing to other tasks is an issue that cannot be ignored. Two options are available to the test developer wishing to produce a performance assessment with generalizable results. The first is to select performance assessment tasks that are at least moderately intercorrelated, and the second is to increase the number of tasks administered until the desired level of generalizability is attained. Domain coverage and high stakes test use are other issues that must be explored. It is argued that statistical specifications such as inter-rater reliability, inter-task correlations, and generalizability coefficients are an important part of the design of performance assessments. (Contains 20 references.) (SLD)
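The trade-off between the two options can be illustrated with a small calculation. The sketch below assumes the Spearman-Brown prophecy formula as a stand-in for a full generalizability analysis; the abstract does not say which formula the paper uses, and the function names and numeric values here are hypothetical, chosen only for illustration.

```python
import math


def projected_reliability(mean_intertask_r: float, n_tasks: int) -> float:
    """Spearman-Brown projection of the reliability of a score built from
    n_tasks tasks whose average inter-task correlation is mean_intertask_r.
    (Assumed model; not taken from the paper.)"""
    r = mean_intertask_r
    return n_tasks * r / (1 + (n_tasks - 1) * r)


def tasks_needed(mean_intertask_r: float, target: float) -> int:
    """Smallest number of tasks whose projected reliability reaches target."""
    r = mean_intertask_r
    if not (0 < r < 1) or not (0 < target < 1):
        raise ValueError("correlation and target must lie strictly between 0 and 1")
    # Solve target = n*r / (1 + (n - 1)*r) for n.
    n = target * (1 - r) / (r * (1 - target))
    # Small tolerance so an exact integer solution is not rounded up
    # because of floating-point error.
    return max(1, math.ceil(n - 1e-9))


# Illustrative values only: moderately vs. weakly intercorrelated tasks.
print(tasks_needed(0.5, 0.8))  # 4
print(tasks_needed(0.3, 0.8))  # 10
```

Under this assumed model, weaker inter-task correlations sharply increase the number of tasks required, which is why selecting moderately intercorrelated tasks and lengthening the assessment are alternative routes to the same generalizability target.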

Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability, Performance Based Assessment, Statistical Analysis, Test Construction, Test Reliability, Test Use, Test Validity

Author: Reckase, Mark D.



