Specifications Grading and Equity
A More Complicated Story Than We Hoped

Today’s guest post comes to you from a research team based at the University of Virginia. Brandon J. Yik is an Assistant Professor in the Department of Chemistry at the University of Georgia and was a postdoctoral scholar at the University of Virginia when this study took place. Lisa Morkowchuk is an Associate Professor, General Faculty in the Department of Chemistry at the University of Virginia, where she leads the general chemistry laboratory program. Marilyne Stains is a Professor in the Department of Chemistry at the University of Virginia, where her research focuses on the implementation and evaluation of evidence-based teaching practices in STEM courses.
When we switched our large-enrollment general chemistry laboratory courses from traditional grading to specifications grading, we believed we were doing something good, both pedagogically and ethically. Like many instructors drawn to alternative grading, we were motivated in part by the promise that clearer expectations, mastery-based feedback, and flexible assessment structures would create a more equitable environment for all students, especially those who have been historically underserved by higher education.
Our data told a more complicated story.
We are a research team at the University of Virginia (UVA), a large public research university in the mid-Atlantic region of the United States, studying grading practices in chemistry. Over the past several years, we have been analyzing the final course grades of over 9,700 students enrolled in year-long general chemistry laboratory courses, comparing outcomes before and after our transition to specifications grading. We recently published our findings in JACS Au, and we wanted to share what we found with this community because we think the results matter for anyone implementing, or thinking about implementing, alternative grading with equity as a goal.
What We Did
For context: our general chemistry laboratory courses are one-credit courses taken by most first- and second-year students at UVA, typically alongside the corresponding lecture course. The grades students receive in these lab courses are separate from the grades they receive in the lecture course. The laboratory courses enroll roughly 1,600 to 1,700 students in the fall (general chemistry laboratory course I) and 800 to 900 in the spring (general chemistry laboratory course II).
We compared two years of traditional grading (2017–2019) with two years of specifications grading (2021–2023). Our specifications grading implementation closely followed Nilson’s framework: individual assignments were scored pass/fail based on clearly defined expectations aligned with learning objectives, and final course grades were determined through a bundling system in which earning higher grades required demonstrating mastery of more objectives. The shift from traditional to specifications grading was more a reorganization of how assignments were structured and evaluated than a wholesale redesign of course content. Scientific communication skills, for example, became their own explicitly assessed learning objective rather than being folded into post-lab assignments.
We looked at student outcomes by four social identities: gender, first-generation status, underrepresented minority (URM) status, and transfer student status. We also applied an intersectional lens using a “systemic advantage index” (SAI) that combines these identities into a single measure. Rather than only asking, for example, “how did first-generation students do?” we could also ask questions about students’ cumulative systemic advantage, such as “how did students who are simultaneously first-generation, URM, female, and transfer students do?”
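For readers who want a concrete picture of what an index like this looks like, here is a minimal sketch. The identity labels and the simple additive scoring are illustrative assumptions on our part, not the exact operationalization from the study:

```python
# Hypothetical sketch of a "systemic advantage index" (SAI):
# count how many systemically advantaged identities a student holds.
# The four identities and the 0-4 additive scoring are illustrative,
# not the exact construction used in the published analysis.

def systemic_advantage_index(is_male: bool,
                             is_continuing_gen: bool,
                             is_non_urm: bool,
                             is_first_year_admit: bool) -> int:
    """Return a 0-4 count of systemically advantaged identities."""
    return sum([is_male, is_continuing_gen, is_non_urm, is_first_year_admit])

# A male, continuing-generation, non-URM, first-year admit student:
print(systemic_advantage_index(True, True, True, True))    # -> 4
# A female, first-generation, URM, transfer student:
print(systemic_advantage_index(False, False, False, False))  # -> 0
```

The advantage of collapsing the identities this way is that it lets an analysis compare students at the extremes of the index, rather than examining one identity at a time while holding the others invisible.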
What We Found
The good news: grades went up across the board
Under specifications grading, more students across every individual social identity group we examined earned higher final grades (specifically A-range) compared to the traditionally-graded courses. The increases were statistically significant. For example, URM students in the first-semester course saw A grades increase by over 26% after the switch to specifications grading. First-generation students saw A grades jump by 20% in that same course. Transfer students in the second-semester course went from 43% earning A grades to 76%. These are not trivial shifts and can lead to longer-term impacts. Earning a higher grade in a foundational science course increases the likelihood that a student continues in science, technology, engineering, and mathematics (STEM) majors. If specifications grading is helping more students pass with high grades, that is a meaningful contribution to retention, and thus equity.
The complicated news: opportunity gaps persisted
Despite those gains, the relative opportunity gaps between systemically advantaged (e.g., male, continuing-generation, non-URM, first-year admit) and disadvantaged students (e.g., female, first-generation, URM, transfer) did not meaningfully close. When we looked at the association between students’ social identities and their course grades, the effect sizes remained essentially unchanged between traditional and specifications grading. In other words: specifications grading lifted many boats, but it lifted them roughly equally, leaving the distance between them about the same.
When we modeled the probability of course success (i.e., earning an A or B grade), the disparities were stark under both grading systems. For example, under traditional grading, the odds of a transfer student succeeding in the first-semester course were about one-eleventh those of a first-year admit student. Under specifications grading, those odds increased, but transfer students still had substantially lower odds of success. For URM and first-generation students, the improvement in odds of success was minimal.
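For readers less familiar with odds language, the relationship between success probabilities, odds, and odds ratios can be sketched as follows. The probabilities in this example are illustrative placeholders, not figures from our study:

```python
# Converting between the probability of success and odds, and computing
# an odds ratio between two groups. The probabilities below are made-up
# placeholders chosen to illustrate an odds ratio near 1/11; they are
# not data from the study.

def odds(p: float) -> float:
    """Odds of an event with probability p."""
    return p / (1 - p)

def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio of a comparison group relative to a reference group."""
    return odds(p_group) / odds(p_reference)

# If 90% of reference-group students succeed, their odds are 9-to-1.
# If 45% of a comparison group succeeds, their odds are about 0.82-to-1.
# The resulting odds ratio is about 0.09, i.e., roughly one-eleventh.
print(round(odds_ratio(0.45, 0.90), 2))  # -> 0.09
```

Note that an odds ratio this far from 1 describes a large gap even when both groups have nontrivial success rates, which is part of why odds ratios can read as more dramatic than the underlying percentages.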
The intersectional analysis told an even more sobering story. Students with the most systemic disadvantages (i.e., those who are simultaneously first-generation, URM, female, and transfer students) showed no significant improvement in average grades under specifications grading. The grade gaps that separated the most and least systemically advantaged students in traditional grading were still present, and of similar magnitude, in the specifications-graded courses.
An important nuance: lower grades also increased for some groups
One pattern that caught our attention: while high grades went up under specifications grading, the proportion of students earning low grades (C, D, F, or withdrawal) also increased slightly for some groups, and the increase was larger for URM and first-generation students than for their more advantaged peers. This is a finding we are still working to understand, and it underscores why implementation decisions matter. The pass/fail structure of specifications grading may interact with students’ resources, time, and prior preparation in ways that are not yet well understood.
What This Means for Instructors
We want to be clear about what our findings do and do not say.
They do not say specifications grading is bad, or that you should abandon it. The grade increases we observed are real, and the possibility that more students are being retained in STEM because of those higher grades is genuinely meaningful.
They do say that specifications grading, at least as we implemented it, is not a sufficient solution to educational inequities on its own. If you are implementing alternative grading primarily because you want to close opportunity gaps, our findings suggest you should not assume that the grading reform alone will accomplish that, even when it is implemented carefully.
This is not entirely surprising. Grading systems exist within larger institutional structures, and those structures carry history. A student who is the first in their family to attend college, who may be working while enrolled, and who may have experienced gaps in their prior science preparation, faces challenges that a well-designed grading system cannot fully address. Specifications grading can remove some artificial barriers such as unclear expectations, penalty-heavy policies, and high-stakes exams, but it cannot remove all of them.
What might help? We do not have definitive answers, but the research points toward a few directions worth exploring:
Pair grading reform with support structures. Specifications grading creates an environment where revision and mastery are valued over performance on a single attempt. But students need time, access to meaningful feedback, and sometimes explicit guidance on how to take advantage of additional opportunities. In particular, first-generation students and transfer students may benefit from proactive outreach and structured support that helps them navigate the new system.
Study your own implementation. Our findings are specific to our course, our institution, and our particular implementation. Different design choices may lead to different outcomes. We strongly encourage instructors who implement specifications grading to treat it as a scholarly endeavor: collect data, disaggregate it by student identities, and look honestly at what you find. Centers for teaching and learning are often excellent partners for this kind of work.
Look beyond grades. Our study measured final course grades, which is an important but incomplete picture. Future research needs to examine students’ experiences with specifications grading: how it affects their sense of belonging, their confidence, and their relationship to failure and revision, and how those experiences differ across social identities. We cannot design for equity if we only measure grades.
Be honest with students and colleagues. One of the risks of the alternative grading movement is that enthusiasm for a practice can outpace the evidence for it. We believe specifications grading is a genuine improvement over many traditional grading approaches. We also believe that honesty about its limitations is essential to the field’s integrity.
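The “study your own implementation” suggestion above can start very simply. As a minimal sketch, disaggregating a grade outcome by one identity group can be done with the standard library alone; the field names and records here are entirely hypothetical:

```python
# A minimal sketch of disaggregating course outcomes by a student
# identity group, using only the standard library. The field names
# and example records are hypothetical, not real student data.
from collections import defaultdict

records = [
    {"group": "first_gen", "grade": "A"},
    {"group": "first_gen", "grade": "C"},
    {"group": "continuing_gen", "grade": "A"},
    {"group": "continuing_gen", "grade": "A"},
]

def a_rate_by_group(rows):
    """Proportion of A grades within each identity group."""
    totals = defaultdict(int)
    a_counts = defaultdict(int)
    for r in rows:
        totals[r["group"]] += 1
        a_counts[r["group"]] += (r["grade"] == "A")
    return {g: a_counts[g] / totals[g] for g in totals}

print(a_rate_by_group(records))  # -> {'first_gen': 0.5, 'continuing_gen': 1.0}
```

Even a tally this simple, repeated before and after a grading change, is enough to start the honest conversation we are advocating; your center for teaching and learning can help with the more careful statistical comparisons.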
Where We Go from Here
We are at an interesting and important moment. Specifications grading has spread rapidly through chemistry and other STEM disciplines, driven by instructors who are genuinely motivated by better pedagogy and greater equity. The practice is here to stay, and that is probably a good thing, but the research base is still catching up to the enthusiasm, and studies like ours are still rare.
We need more of them. We need studies at diverse institution types, with diverse student populations, using diverse implementations. We need qualitative work that illuminates why gaps persist even when overall grades improve. And we need the field to resist the temptation to treat any grading reform as a silver bullet.
Specifications grading gave more of our students a better chance to demonstrate what they know. It did not give all of our students an equal chance. Those two things can and should coexist in how we talk about this work, motivate further inquiry, and guide future implementation decisions.






