A yea/boo meter with eleven markings

Taking a look at "Grading Student Writing: Making It Simpler, Fairer, Clearer" by Peter Elbow

Sep 27, 2021

For the last few weeks, we’ve been discussing some big ideas. Today, I’ll focus in on one classic article:

Elbow, P. (1997). Grading student writing: Making it simpler, fairer, clearer. New Directions for Teaching and Learning, 1997(69), 127-140.

I’m a mathematician, and this book chapter only came to my attention recently. It immediately drew me in with its clear, concise, and powerful arguments in favor of what Elbow calls “minimal grading”: grading student work using a scale with very few levels.

For example, Specifications grading uses a 2-level scale for each assignment: Satisfactory/Not yet. Many variations on standards-based grading use a 4-level rubric, such as EMRF. All types of all-or-nothing grading are forms of minimal grading.

Elbow’s ideas apply far beyond writing, and his insightful descriptions gave me new ways of looking at some familiar ideas. So today, I’m going to highlight some of my favorite ideas from this paper.

Vertical vs. horizontal assessments and giving grades more meaning

One of Elbow’s central ideas in this paper is the concept of “vertical” vs. “horizontal” dimensions of grading.

The vertical dimension refers to the number of levels: how many different categories student work can be divided into. Since this is a paper about minimal grading, Elbow argues for “short” assessment systems with few levels: pass/fail, completion-only, or even zero-level grading [1].

The horizontal dimension represents the number of criteria being assessed, or what I would usually call standards or specifications. Elbow describes assessments with more clearly articulated standards as more horizontal.

A traditional grading system is very “tall”: all vertical, no horizontal. There are 100 levels in a percentage-based system, and 11 in a traditional letter-grade system that includes +/- grades. All those levels appear to impart a lot of information and let instructors make many fine distinctions, but the truth is that all that verticality comes at the expense of meaning:

[C]onventional grades tell nothing at all about what it is that the student did well or badly; the greater precision of conventional grades is utterly untrustworthy.

A standards-based grading system using a 2-level rubric is “wide”: few vertical levels, many criteria in the horizontal dimension. The resulting list of marks, one per standard, serves as a concise description of exactly what the student has demonstrated on the assignment. One of Elbow’s primary arguments is that clear criteria give grades more meaning, because they directly describe what student work has (or hasn’t) achieved:

When we spell out our criteria in public—in an announcement or on a handout—we are making our grades carry more information or meaning than they usually do, even if we give nothing but a minimal grade.

Note that Elbow’s conception of horizontal grades doesn’t require assigning a mark for each standard. Specifications-style grading, in which the instructor assigns a single mark based on satisfactorily meeting a list of criteria, still is horizontal because it clearly spells out the criteria used for the assessment. The resulting mark has a clear meaning in terms of the criteria that a student’s work has met.

This is something that an all-vertical system doesn’t do. But one thing a vertical system can do is compare students against each other, generating competition but not meaning. This leads to one of my favorite quotes from the entire article:

Conventional grades distinguish eleven levels of pure quality—quality that is entirely undefined and unarticulated: conventional grades constitute nothing but a vertical stack of levels—each one defined in no other way than “better than the one below, worse than the one above”; it’s all numbers, no words; a yea/boo meter with eleven markings.

I giggle a little every time I think about an eleven-level yea/boo meter. Then I remember that it’s fundamental to most grading systems — even alternatives that avoid grades until the very end of the semester.

To tie it all up, here is Elbow’s summary of the essential distinction between vertical and horizontal approaches to assessment:

With the vertical emphasis, we are making a single difficult, sophisticated, evaluative decision along a single scale with multiple levels—but no words or definitions are involved. With a horizontal emphasis, we are making multiple decisions on multiple criteria—which are named—and the decisions are simpler, easier, and more believable.

In other words, Elbow is arguing that clearly defined standards with marks that indicate progress lead to easier and fairer grading.

Stakes vs. Levels and motivation

Once we have the idea of vertical vs. horizontal dimensions, next we need to decouple the number of levels used in grading student work from the stakes represented by the work.

It’s easy to assume, Elbow says, that if you’re grading with a low number of levels (such as pass/fail), the assignment must be low-stakes. Not the case:

But there is no law that passing has to be easy—especially for high stakes writing. A higher threshold or demand can be natural and appropriate. Note that even a two-level scale can be very demanding if we raise the bar (as at M.I.T.).

Here he’s referring to the fact that first-semester students at MIT are graded only using “Pass” and “No Record” in each class, and second-semester students are graded using A/B/C/No Record. MIT is, I believe, somewhat known for its high standards.

This issue of stakes vs. levels is central to Specifications grading, where each assignment is graded on a 2-level rubric (Satisfactory/Not yet). Specs grading works especially well with large assignments, such as semester-long projects, portfolios, and major writing assignments. Often these must be completed satisfactorily in order to pass a class or earn a high grade — practically the definition of “high stakes”.

However, Specs-graded assignments are marked simply Satisfactory/Not yet, with Satisfactory requiring quite a high bar. Better yet, this can be done in an entirely fair and transparent way by giving clear specifications, helpful feedback, and opportunities for revision.

What about motivation? Doesn’t the opportunity to earn a higher mark — something that only exists when there are higher marks, such as an A/B/C/D/F or 100-point scale — motivate students? A common refrain in favor of grading everything is that grades provide motivation. If it’s not graded, it’s not done, and the opportunity to earn a higher grade is even more motivating. Elbow disagrees completely:

[W]hen students struggle for excellence only for the sake of a grade, what we see is not motivation but the atrophy of motivation: the gradual decline of the ability to work or think or wonder under one’s own steam.

This isn’t to say that it is easy to motivate students in minimally-graded classes. Students are so heavily steeped in grades-as-motivation that the absence of the extrinsic motivation of grades can lead to all sorts of difficulty.

Elbow ultimately seems to be in favor of a mix of low-stakes and high-stakes assignments. Low-stakes assignments can help develop trust and understanding between student and instructor and build a foundation of true motivation: “… they get small protected spaces for gradually developing small bits of intrinsic motivation.” Then students are better prepared for high-stakes assignments, where the stakes themselves provide some level of external motivation.

It is this last point, among everything in the paper, that I’m least sure about: To what extent are high-stakes assignments important? Do they actually help with learning, or is real learning better respected — at least in some cases — with not just minimal levels, but minimal stakes as well?

Some closing thoughts

I’ve long been a believer in what Elbow would call “horizontal” grading systems: Ones with few grading levels, but several clear criteria. Elbow’s language has helped me think about the trade-offs between vertical and horizontal systems. “Vertical” grading systems offer many perverse incentives to students and instructors, not least of which is the incentive to play games with numbers, to average and combine values that aren’t related in any way, and to care about arbitrary cut-offs. Grades grounded in meaningful criteria show how many of these games are (ahem) pointless.

There are many questions that I have after reading (and now re-reading) this paper. Here are a few:

How do we wean students off of grades-as-motivation? Elbow’s argument that extrinsic grade-based motivation is truly “the atrophy of motivation” struck me hard. Even in my “minimally graded” SBG classes, many students are still motivated by grades — just different grades than usual, and perhaps with some less toxic results. It’s progress, but within “the system”. We know that grades in any form can be toxic to intrinsic motivation, but they are baked so heavily into the world around us (not just academia, but everything) that even in the absence of grades, students become anxious because of that lack of grades. What can be done? Are systems like minimal grading a valuable intermediate step, or do they just replace many grades with fewer grades, but still grades? I’ll have more to say about that in a few weeks, when I reflect on my own ungrading experiment.
What would labor-based grading look like in… almost anything other than writing? Although this paper doesn’t focus on it, Elbow briefly mentions his own interest in labor-based contracts: Essentially 1-level grades in which everything is based on students simply completing work, without judgment on quality. The goal is for students to develop skills and habits of professional writers by doing them, without concern about external judgment. I’m a mathematician, and I’ve spent some time pondering what labor-based grading might look like in my field. Perhaps an undergraduate research project? Could it be done in an introductory class? What about other STEM fields, or, honestly, anything outside of writing?

When I first read them, Elbow’s descriptions really caught my interest. They provided a different viewpoint on some ideas that I’ve thought about a lot. But you don’t have to take my word for it… I strongly encourage you to set aside 30 minutes and enjoy reading this paper yourself.

[1] Peter Elbow was one of the early proponents of labor-based contracts, an essentially zero-level system that we described a few weeks ago.

Grading for Growth
