Once we define minimum competence, we need to design methods for assessing that competence. Psychometrics, the science of measuring mental processes and abilities, plays an important role in designing those methods. On this page we outline the four criteria of an appropriate assessment method: validity, reliability, fairness, and feasibility. In our Solutions section, we explain how different assessment methods meet each of these criteria.
Psychometricians and other assessment specialists rely upon the Standards for Educational and Psychological Testing when designing assessments. Those Standards provide guidance for making assessments valid, reliable, and fair. Given the increased concern about fairness in testing, a group of scholars recently published a companion guide, Fairness in Educational and Psychological Testing, that elaborates on the need for fairness in assessment. We have incorporated insights from these guides throughout our website, but you can consult the original sources if you want to learn more.
Validity
Validity means that an assessment measures the qualities that the user wants to measure. Assessments are not inherently valid or invalid. Instead, their validity depends on the purpose for which they are used. A bathroom scale, for example, offers a valid measure of a person's weight, but it does not offer a valid measure of their ability to play the guitar. When licensing lawyers, we want assessment methods that will measure the skills and knowledge needed for entry-level law practice.
Reliability
Reliability means that an assessment measure offers consistent results. If you repeatedly put the same stone on a bathroom scale, a reliable scale will provide the same weight each time--even if the weighings are separated by weeks or months. Similarly, a reliable assessment of minimum competence will provide the same outcome for each candidate (pass or fail) regardless of when the candidate takes the assessment or who grades it. A candidate's knowledge and skills, of course, usually change over time. But if a candidate has the same level of knowledge and skills in April and May, a reliable assessment will produce the same outcome each time.
Fairness
Fairness doesn't matter when we measure inanimate objects like rocks, but it is essential when assessing people. In licensing, a fair assessment gives all candidates an equal opportunity to demonstrate their knowledge and skills. The assessment should not provide results that are biased by race, gender, household income, disability, or other characteristics that are irrelevant to the knowledge and skills being measured. An assessment method that produces a disparate impact based on one or more of these characteristics is not necessarily unfair--but it warrants serious scrutiny. Systemic biases often underlie methods with a disparate impact.
Feasibility
The Standards do not include feasibility as a requirement for sound assessments, but other scholars have noted its importance. An assessment method must be possible to implement, given real-world limits on resources. When assessing the feasibility of new methods, however, it is important to acknowledge the costs of the status quo. A written bar exam is expensive to create, administer, and grade. The current exam also imposes stiff costs on candidates, who pay fees to take the exam, travel to exam sites, purchase expensive prep courses, and forgo income while preparing for the exam. The feasibility of other assessment methods should be judged in comparison to those costs.
Striking a Balance
The requirements of validity, reliability, fairness, and feasibility sometimes point in different directions. The most valid way to assess a candidate's minimum competence might be for an examiner to observe that candidate practicing law over several months. That method, however, is unlikely to be feasible. It probably would also lack reliability and fairness: examiners might differ in their views of minimum competence, and they might harbor unconscious biases against some candidates.
NCBE's Uniform Bar Exam, in contrast, is highly reliable. NCBE and jurisdictions use statistical methods to constrain variations both in the exam's difficulty over time and in the scores that examiners assign to essay and performance test answers. Critics, however, point to shortcomings in the exam's validity and fairness.
Choosing an assessment method requires making trade-offs among validity, reliability, fairness, and feasibility. Which criteria are most important in your jurisdiction? What is the optimal balance among these requirements? One option, which a few states are now exploring, is to offer candidates a choice of assessment methods. Each method may have different strengths and weaknesses, and candidates can choose the method that best aligns with their own preferences.