Standard Practice for Professional Certification Performance Testing

SIGNIFICANCE AND USE
3.1 This practice for performance testing provides guidance to performance test sponsors, developers, and delivery providers for the planning, design, development, administration, and reporting of high-quality performance tests. This practice assists stakeholders from both the user and consumer communities in determining the quality of performance tests. This practice includes requirements, processes, and intended outcomes for the entities that are issuing the performance test, developing, delivering and evaluating the test, users and test takers interpreting the test, and the specific quality characteristics of performance tests. This practice provides the foundation for both the recognition and accreditation of a specific entity to issue and use effectively a quality performance test.  
3.2 Accreditation agencies are presently evaluating performance tests with criteria that were developed primarily or exclusively for multiple-choice examinations. The criteria by which performance tests shall be evaluated and accredited are ones appropriate to performance testing. As accreditation becomes more critical for acceptance by federal and state governments, insurance companies, and international trade, it becomes more critical that appropriate standards of quality and application be developed for performance testing.
SCOPE
1.1 This practice covers both the professional certification performance test itself and specific aspects of the process that produced it.  
1.2 This practice does not include management systems. In this practice, the test itself and its administration, psychometric properties, and scoring are addressed.  
1.3 This practice primarily addresses individual professional performance certification examinations, although it may be used to evaluate exams used in training, educational, and aptitude contexts. This practice is not intended to address on-site evaluation of workers by supervisors for competence to perform tasks.  
1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.  
1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

General Information

Status: Published
Publication Date: 31-Jan-2024
Effective Date: 01-Feb-2024

Overview

ASTM E2849-18(2024) - Standard Practice for Professional Certification Performance Testing - establishes a comprehensive framework for the planning, design, development, administration, and reporting of high-quality professional certification performance tests. Developed by ASTM International, this standard is designed to guide performance test sponsors, developers, and delivery providers in creating and maintaining tests that reliably and validly assess candidate competencies in authentic, target contexts. The practice is also a crucial reference for organizations seeking recognition and accreditation for their testing processes.

The scope of ASTM E2849-18(2024) covers the performance test itself and essential aspects of its development and administration, including psychometric considerations, test scoring, candidate preparation, and procedural fairness. Management systems are outside its scope. The standard focuses on individual professional certification examinations, but its principles may also be applied to training, educational, and aptitude evaluations. It was developed in accordance with the internationally recognized standardization principles set out by the World Trade Organization's Technical Barriers to Trade (TBT) Committee.

Key Topics

  • Planning and Design: Guidance on developing a performance test blueprint that reflects current industry roles and tasks. Emphasizes role delineation and job task analysis to ensure content validity.
  • Test Development and Administration:
    • Item creation, including stimulus and scenario development.
    • Procedures for setting cutpoints and quality assurance.
    • Consideration for accessibility, accommodations, and candidate fairness.
  • Psychometrics and Scoring:
    • Requirements for construct validity, reliability, and psychometric properties.
    • Use of performance rubrics, inter-rater reliability, and automated scoring verification.
    • Techniques for assessing item timing and handling differential system responsiveness.
  • Candidate Preparation:
    • Specifications for practice tests (interface preparation and self-assessment).
    • Transparent communication of scoring rubrics to inform candidate effort and focus.
  • Test Security and Authentication:
    • Methods for candidate identification and test process documentation.
    • Security interventions for maintaining test integrity, including measures against misuse.
  • Reporting and Accreditation:
    • Adequate reporting to support examinee remediation.
    • Emphasis on accreditation using criteria suitable for performance testing, not just multiple-choice exams.
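
The inter-rater reliability topic above can be made concrete with a small check. The sample text further down this page quotes a correlation threshold of 0.80 for both rater reliability (5.1.4) and inter-rater reliability (5.1.4.1) but does not name a correlation statistic. The sketch below assumes Pearson's r, and the function names and rater scores are hypothetical illustrations rather than anything defined by the standard.

```python
from math import sqrt

THRESHOLD = 0.80  # figure quoted in 5.1.4 and 5.1.4.1 of the sample text


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: the standard says only "correlate"; Pearson's r is one
    common choice, not a method the standard prescribes.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def exceeds_threshold(scores_a, scores_b, threshold=THRESHOLD):
    """Return (r, ok), where ok is True when r is strictly above threshold."""
    r = pearson_r(scores_a, scores_b)
    return r, r > threshold


# Hypothetical rubric scores from two raters on the same eight performances.
rater_1 = [4, 3, 5, 2, 4, 1, 5, 3]
rater_2 = [4, 3, 4, 2, 5, 1, 5, 3]

r, ok = exceeds_threshold(rater_1, rater_2)
print(f"inter-rater r = {r:.3f}, above {THRESHOLD}: {ok}")
```

The same helper could compare one rater against an established performance standard by passing the reference scores as the second list. A certification body might reasonably prefer a different statistic, such as an intraclass correlation, for ordinal rubrics.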

Applications

ASTM E2849-18(2024) is highly relevant to any organization involved in the certification, licensing, or credentialing of professionals who demonstrate their competencies through performance tests. Typical applications include:

  • Professional Certification Bodies: Implementing reliable, valid testing for credentials in fields such as healthcare, construction, IT, and skilled trades.
  • Accreditation Organizations: Evaluating the quality of performance exams for recognition by government, insurance, or international trade authorities.
  • Educational Institutions and Training Providers: Developing authentic assessments for course completion, skill verification, or job-readiness programs.
  • Test Developers and Administrators: Building secure, accessible, and fair assessments that reflect real-world job requirements and support diverse candidate needs.
  • International Standards Compliance: Ensuring alignment with WTO principles and facilitating acceptance in global regulatory contexts.

Related Standards

Professionals developing or accrediting certification performance tests may also reference the following standards for additional guidance:

  • ISO/IEC 17024 - Conformity assessment - General requirements for bodies operating certification of persons.
  • ASTM E2659 – Standard Practice for Certificate Programs.
  • Association of Test Publishers Standards – Best practices for assessment development and psychometrics.
  • Relevant National and International Guidelines on test security, fairness, and accommodations.

By following ASTM E2849-18(2024), organizations can build performance testing programs that uphold the highest standards of validity, reliability, and fairness, thereby fostering trust in professional certification and credentialing processes.

Keywords: ASTM E2849, professional certification, performance testing, accreditation, psychometrics, test development, test scoring, skills assessment, credentialing, international standards.

Buy Documents

Standard

ASTM E2849-18(2024) - Standard Practice for Professional Certification Performance Testing

English language (5 pages)

Frequently Asked Questions

ASTM E2849-18(2024) is a standard published by ASTM International. Its full title is "Standard Practice for Professional Certification Performance Testing".

ASTM E2849-18(2024) is classified under the following ICS (International Classification for Standards) categories: 03.120.20 - Product and company certification. Conformity assessment. The ICS classification helps identify the subject area and facilitates finding related standards.

ASTM E2849-18(2024) has the following relationships with other standards: it is linked to ASTM E2849-18, the previous edition. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

Designation: E2849 − 18 (Reapproved 2024)
An American National Standard

Standard Practice for Professional Certification Performance Testing

This standard is issued under the fixed designation E2849; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

This practice is under the jurisdiction of ASTM Committee E36 on Accreditation & Certification and is the direct responsibility of Subcommittee E36.30 on Personnel Credentialing. Current edition approved Feb. 1, 2024. Published March 2024. Originally approved in 2013. Last previous edition approved in 2018 as E2849 – 18. DOI: 10.1520/E2849-18R24.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope
1.1 This practice covers both the professional certification performance test itself and specific aspects of the process that produced it.
1.2 This practice does not include management systems. In this practice, the test itself and its administration, psychometric properties, and scoring are addressed.
1.3 This practice primarily addresses individual professional performance certification examinations, although it may be used to evaluate exams used in training, educational, and aptitude contexts. This practice is not intended to address on-site evaluation of workers by supervisors for competence to perform tasks.
1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Terminology
2.1 Definitions—Some of the terms defined in this section are unique to the performance testing context. Consequently, terms defined in other standards may vary slightly from those defined in the following.
2.1.1 automatic item generation (AIG), n—a process of computationally generating multiple forms of an item.
2.1.2 candidate, n—someone who is eligible to be evaluated through the use of the performance test; a person who is or will be taking the test.
2.1.3 construct validity, n—degree to which the test evaluates an underlying theoretical idea resulting from the orderly arrangement of facts.
2.1.4 differential system responsiveness, n—measurable difference in response latency between two systems.
2.1.5 examinee, n—candidate in the process of taking a test.
2.1.6 gating item, n—unit of evaluation that shall be passed to pass a test.
2.1.7 inter-rater reliability, n—measurement of rater consistency with other raters.
2.1.7.1 Discussion—See rater reliability.
2.1.8 item, n—scored response unit.
2.1.8.1 Discussion—See task.
2.1.9 item observer, n—human or computer element that observes and records a candidate’s performance on a specific item.
2.1.10 on the job, n—another term for “target context.”
2.1.10.1 Discussion—See target context.
2.1.11 performance test, n—examination in which the response modality mimics or reflects the response modality required in the target context.
2.1.12 power test, n—examination in which virtually all candidates have time to complete all items.
2.1.13 practitioners, n—people who practice the contents of the test in the target context.
2.1.14 rater reliability, n—measurement of rater consistency with a uniform standard.
2.1.14.1 Discussion—See inter-rater reliability.
2.1.15 reconfiguration, n—modification of the user interface for a process, device, or software application.
2.1.15.1 Discussion—Reconfiguration ranges from adjusting the seat in a crane to importing a set of macros into a programming environment.
2.1.16 reliability, n—degree to which the test will make the same prediction with the same examinee on another occasion with no training occurring during the intervening interval.
2.1.17 rubric, n—set of rules by which performance will be judged.
2.1.18 speeded test, n—examination that is time-constrained so that more than 10 % of candidates do not finish all items.
2.1.19 target context, n—situation within which a test is designed to predict performance.
2.1.20 task, n—unit of performance requested for the candidate to do; a task can be scored as one item; a task may also be comprised of multiple components each of which is scored as an item.
2.1.21 test, n—sampling of behavior over a limited time in which an authenticated examinee is given specific tasks under specified conditions, tasks that are scored by a uniformly applied rubric.
2.1.21.1 Discussion—A test can also be referred to as an assessment, although typically “assessment” is used for formative evaluation. This practice addresses specifically certification and licensure, as stated in 1.3. A test is designed to predict the examinee’s behavior in a specified context, the “target context.”
2.1.22 trajectory, n—candidate’s path through the solution to a single item, task, or test.
2.1.22.1 Discussion—Also termed the response trajectory.
2.1.23 validity, n—extent to which a test predicts target behavior for multiple candidates within a target context.

3. Significance and Use
3.1 This practice for performance testing provides guidance to performance test sponsors, developers, and delivery providers for the planning, design, development, administration, and reporting of high-quality performance tests. This practice assists stakeholders from both the user and consumer communities in determining the quality of performance tests. This practice includes requirements, processes, and intended outcomes for the entities that are issuing the performance test, developing, delivering and evaluating the test, users and test takers interpreting the test, and the specific quality characteristics of performance tests. This practice provides the foundation for both the recognition and accreditation of a specific entity to issue and use effectively a quality performance test.
3.2 Accreditation agencies are presently evaluating performance tests with criteria that were developed primarily or exclusively for multiple-choice examinations. The criteria by which performance tests shall be evaluated and accredited are ones appropriate to performance testing. As accreditation becomes more critical for acceptance by federal and state governments, insurance companies, and international trade, it becomes more critical that appropriate standards of quality and application be developed for performance testing.

4. Candidate Preparation
4.1 Number of Practice Items—A candidate shall be given access to sufficient practice items that the novelty of the item format shall not inhibit the examinee’s ability to demonstrate his or her capabilities.
4.2 Scoring Rubric Available to Candidates:
4.2.1 Candidates shall have sufficient information about the scoring rubric to be able to appropriately prioritize their efforts in completing the item or test.
4.2.2 The examinee shall not be provided so much information about the scoring rubric that it diminishes the ability of stakeholders to generalize the examinee’s skills from his or her test score.
4.3 Practice Tests:
4.3.1 There are two types of practice tests: one for gaining familiarity with the user interface of the test items and the other to allow the candidate to self-evaluate mastery of the content.
4.3.1.1 User Interface Preparation—A practice test or tests to familiarize candidates with the user interface shall be made available to the candidate at no charge. The practice test shall be sufficient to assure adequate candidate practice time so that the degree of familiarity with the user interface does not impair the validity of the test.
4.3.1.2 Content Self-Assessment—Practice tests that evaluate content mastery may be made available at no charge or for a fee. There is no obligation on the part of the test provider to provide a self-assessment practice test to evaluate content mastery.
NOTE 1—If a practice test is provided, it shall sample test content sufficiently to allow the candidate to predict reasonably success or failure on the test.
4.3.2 Candidates shall know specifically which type of practice test they are requesting.
4.3.3 Both types of practice test shall help candidates understand how their responses are going to be scored.

5. Procedure
5.1 Item Development—All requirements in Section 5 may be superseded by empirical, logical, or statistical arguments demonstrating that the practices of a certification body are equivalent to or superior to the practices required to meet this practice.
5.1.1 Item Time Limits:
5.1.1.1 When items or test sections can be accessed repeatedly, no item time limit is required to be enforced or recommended to the candidate.
5.1.1.2 When items can be accessed only once, item time limits shall be either suggested or enforced, with a visual timekeeping option for the examinee.
5.1.1.3 For a power test, item time limits shall be set using a standard practice such as the mean item response time measured in beta testing plus two standard deviations for successful candidates within the calibration sample. When sufficient data have been collected from test administrations, the item time shall be recalibrated to reflect performance on the actual test.
5.1.1.4 For a speeded test, item time limits shall be determined by measuring minimum acceptable time limits in the target context.
5.1.2 Differential System Responsiveness—Differential system responsiveness may be due to variance in network bandwidth, network latency, random-access memory (RAM), storage speed, operating systems, computer processing unit (CPU) count and performance, bus speed, or other factors.
NOTE 2—It is the obligation of the test developer to attempt to measure differences in latency and system responsiveness whenever possible and, if possible, to compensate appropriately for these variations.
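
As a worked illustration of the timing rules in this part of the standard, the sketch below computes a power-test item time limit as the mean response time plus two standard deviations (5.1.1.3) and checks the speeded-test definition of more than 10 % of candidates not finishing (2.1.18). The function names and the beta-test timings are hypothetical, and using the sample standard deviation rather than the population one is an assumption, since the text does not say which is intended.

```python
from statistics import mean, stdev


def power_test_time_limit(response_times_s, k=2.0):
    """Item time limit in seconds: mean + k standard deviations.

    response_times_s should hold beta-test response times for successful
    candidates in the calibration sample only (per 5.1.1.3).
    Assumption: stdev() is the *sample* standard deviation.
    """
    if len(response_times_s) < 2:
        raise ValueError("need at least two response times")
    return mean(response_times_s) + k * stdev(response_times_s)


def is_speeded(n_candidates, n_unfinished):
    """True when more than 10 % of candidates did not finish all items.

    Integer arithmetic avoids floating-point edge cases at exactly 10 %.
    """
    if n_candidates <= 0:
        raise ValueError("n_candidates must be positive")
    return n_unfinished * 10 > n_candidates


# Hypothetical beta-test timings (seconds) for one item.
beta_times = [100, 110, 120, 130, 140]
limit = power_test_time_limit(beta_times)
print(f"suggested item time limit: {limit:.1f} s")

print(is_speeded(200, 20))  # exactly 10 % unfinished: not "more than 10 %"
print(is_speeded(200, 21))  # 10.5 % unfinished
```

Rerunning the same calculation on operational data would be one way to meet the recalibration requirement in 5.1.1.3 once enough administrations have accumulated.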
5.1.2.1 There shall be compensation in test scoring for variances in the hardware and software environment to assure that all examinees are scored fairly.
NOTE 3—Compensation may be in adjusting item time limits, item latency scoring factors, or other compensatory variables.
5.1.2.2 An examinee taking a test under one set of conditions shall receive the same score as if he or she took the test under any admissible alternative set of conditions.
5.1.3 References/Citations—When possible, codes, guidelines, industry standards, application source code, or other evidence shall be sufficient to establish the correctness of scoring a procedure. Where such documentation does not exist, correct responses may be documented as standard practice by a vote of the subject matter expert (SME) advisory panel for the test.
5.1.4 Rater Reliability—When human raters are involved in assessing item success, rater reliability shall correlate with an established performance standard greater than 0.80.
5.1.4.1 When multiple raters are used to rate a single performance, inter-rater reliability shall correlate higher than 0.80.
5.1.5 Automated Scoring—To verify automated scoring, the test developer shall develop test cases that verify the scoring of a minimum of 95 % of anticipated responses. When items are scored automatically, for the first 100 administrations of
...
import their industry standard configurations into the test environment, provided that doing so does not compromise exam security, provide unfair advantage over other candidates, or impact the generalizability of results.
5.1.9.3 The criterion the test developer shall use to determine “minimal reconfiguration” is whether competence measured with the default configuration will predict performance with a reconfigured system.
5.1.10 Level of Feedback—Feedback during the test shall reflect feedback available doing similar tasks in the target context.
NOTE 5—Feedback may be time compressed to minimize testing time. Interim results may be omitted if they do not impact success in performing the item.
5.1.11 American with Disabilities Act (ADA) Accommodations—Accommodations shall be fair to the candidate, the testing administrator, other candidates, and the potential employer alike, with no interest predominating. Before awarding accommodations, the test administrator shall discuss with the candidate what the candidate feels would be reasonable accommodations and, when feasible, shall allow the methods candidates use for accomplishing tasks in the target context. The candidate shall possess the capability to perform the required test item in full with the agreed upon accommo-
...
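
The 95 % figure in 5.1.5 of the sample can be read as a coverage requirement on scoring test cases. The sketch below takes one possible reading, which is an assumption: coverage is the share of distinct anticipated responses that have at least one test case verifying their score. The function names and response identifiers are hypothetical and do not come from the standard.

```python
def scoring_coverage(anticipated, verified):
    """Fraction of anticipated responses that have a verifying test case.

    anticipated: iterable of identifiers for anticipated responses
    verified:    iterable of identifiers whose scoring a test case checks
    Identifiers in `verified` that were never anticipated are ignored.
    """
    anticipated = set(anticipated)
    if not anticipated:
        raise ValueError("no anticipated responses supplied")
    covered = anticipated & set(verified)
    return len(covered) / len(anticipated)


def meets_minimum_coverage(anticipated, verified, minimum_percent=95):
    """True when coverage is at least minimum_percent (integer math)."""
    anticipated = set(anticipated)
    if not anticipated:
        raise ValueError("no anticipated responses supplied")
    covered = anticipated & set(verified)
    return len(covered) * 100 >= minimum_percent * len(anticipated)


# Hypothetical identifiers for 20 anticipated responses to one item.
anticipated = [f"resp-{i:02d}" for i in range(1, 21)]
verified = anticipated[:19]  # test cases exist for 19 of the 20

print(f"coverage: {scoring_coverage(anticipated, verified):.0%}")
print("meets 95 % minimum:", meets_minimum_coverage(anticipated, verified))
```

A developer could also weight responses by expected frequency, which would change the calculation. The sample text shown here does not say which interpretation is intended.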
