Finding common ground with grading systems

We've seen the differences. Now what are the similarities?

Aug 30, 2021

As David and I write and engage with others about grading, there’s definitely a sense that the time is coming, and maybe is already here, for a wholesale change in how we grade in higher education. David wrote last week about the profusion of alternative grading techniques that are out there, and I think the sheer variety signifies a deep and widespread desire to make this change. People are realizing that reforming assessment and grading can have outsized results in improving higher education as a whole. It’s one of those places where 20% of the effort will produce 80% of the results.

But the variety can also be overwhelming. Instructors might say, I want to change my grading practice, but should I go with specifications grading? Standards-based grading? Ungrading? Contract grading? Most real-life approaches to alternative grading don’t fit neatly into any of those boxes, and often none of these general categories will be a perfect fit for your students in your classes. And how are we supposed to keep up with all these terms? Do you have to be an expert even to get started?

It seems smarter to focus on the overall ideas that unify these different approaches. So this week, rather than introduce another kind of grading practice, we’re going to pull back to a higher altitude, try to distill what all these ideas have in common, and come up with a general framework for these practices. This is not a “definition” of anything (there are still too many idiosyncrasies and varied practices to hope for something that’s both precise and general) but instead a map, with room for interpretation, that stakes out some of the common ground that we seem to be walking together.

Common ground

Despite the differences in the ways that all these grading practices are worked out in real classrooms, what do they seem to have in common? Here’s what I see:

1. Student work is evaluated against clearly defined and context-appropriate standards for what constitutes “acceptable work”. In other words, the systems are rooted in students knowing what acceptable work looks like, using standards that are professionally appropriate but scaled to the level of the student. Standards-based grading and specifications grading are obviously built on this principle (just look at the names). Ungrading advocates might disagree (see Alfie Kohn’s famous essay “The Trouble with Rubrics”). But even when ungrading, although you might not use a concrete rubric, you are still making decisions about whether student work is “good enough” or not. Presumably those decisions aren’t just made by “gut feel” (which is one way of saying “personal bias”) but through standards that you, as a content expert, believe are appropriate for determining quality. In other words, we’re all using standards. Ethics and common decency would say we should externalize those and be up-front with students about it, and so that’s part of the system.
2. Student work, when evaluated, is given helpful, actionable feedback that the student can and should use to learn and improve their work. Feedback is the beating heart of all of these practices. Traditional grading looks at student work, assigns a number or a letter to it, and that’s all. It gives student work the silent treatment. In all these alternative practices, instead, the students’ work opens up a conversation and initiates a feedback loop.
3. Student work doesn’t have to receive a mark, but if it does, the mark is a progress indicator and not an arbitrary number. The alternative practices we’ve mentioned here all share the realization that marks, if given, are just at-a-glance summaries of what the feedback says, nothing more. They are there primarily for convenience and for entry into a gradebook. In particular, these grading practices do not pretend that numbers assigned to student work (75%, 8/10, etc.) are numerical data. They are not. They are categorical data disguised in numerical form, like zip codes, and the statistical contortions used by traditional grading to convert those numbers into letter grades are fundamentally irrelevant and merely give the illusion of objectivity. (“Objectivity theater” is how it’s been described.) It would probably be better to dispense with marks altogether, as ungrading typically does, given their tendency to distract and demotivate students. But if we must put marks in a gradebook, they should be informative categorical data rather than fake numerical data.
4. Students can revise, resubmit, or reattempt work without penalty, using the feedback they receive, until the standards are met or exceeded. All of these alternative frameworks are predicated on feedback loops. This seems to be their defining and essential ingredient. They don’t only have clear and appropriate standards and regular streams of feedback: They also allow students to combine their work, the standards, and the feedback and then try again. It’s in the trying again that grading turns into growth. And we don’t penalize this, because what kind of person penalizes growth?
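
For readers who think in code, here is a rough Python sketch of the contrast between averaging point scores across attempts and recording categorical progress marks with reattempts without penalty. Everything in it is an assumption made for illustration: the mark names (`NOT_YET`, `MEETS`, `EXCEEDS`), the sample scores, and the "best attempt counts" rule are hypothetical and are not a description of any particular grading system.

```python
from enum import IntEnum
from statistics import mean


class Mark(IntEnum):
    """Hypothetical progress marks. The integers only encode order,
    not a quantity that is meaningful to add or average."""
    NOT_YET = 0
    MEETS = 1
    EXCEEDS = 2


def averaged_percent(scores):
    """Points-style summary: average the numbers from every attempt,
    so early unsuccessful attempts keep dragging the result down."""
    return mean(scores)


def standard_met(marks):
    """Reattempts without penalty (assumed rule): only the best mark
    earned on the standard counts, however many tries it took."""
    return max(marks) >= Mark.MEETS


# Hypothetical record of one student making three attempts at one standard.
attempt_scores = [40, 70, 100]
attempt_marks = [Mark.NOT_YET, Mark.NOT_YET, Mark.MEETS]

print(averaged_percent(attempt_scores))  # average over all three attempts
print(standard_met(attempt_marks))       # did the student eventually meet it?
```

In this sketch the averaged number still reflects the first two attempts, while the categorical record simply reports that the standard was eventually met.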

Not a definition

There is a temptation at this point to look to the four observations I’ve just made and turn them into a definition of a general category of grading, with a special name, of which SBG, specifications grading, etc. are all instances. (David and I are mathematicians, after all — abstraction is what we do.) But I am going to resist that temptation, and I think you should too, for two reasons.

First, definitions are exclusionary by nature. When you define a thing, you draw a line between instances of that thing and non-instances of it, and the “canonical” instances tend to receive pride of place. This is OK in some situations (e.g. defining terms in mathematics so you can meaningfully prove theorems about them) but in other situations, especially education, it tends to be highly counterproductive because it locks people out unnecessarily. If you’re thinking of instituting a grading system that involves a lot of feedback and revision, but for whatever reason you still want to assign points to things, you shouldn’t feel left out of this conversation or pressured to do things a different way because a definition said so. If you’re an ungrader and feel that some of the observations above don’t quite fit what you’re trying to accomplish, you should still feel welcome at the table and able to have a real conversation about student success with someone who does specifications grading.

Second, definitions of educational ideas, in my experience, tend to derail people’s focus. I learned this when writing my flipped learning book. Flipped learning at the time needed an operational definition that made it possible for people to do research about it, and made it OK for instructors not to use video[1]. So I came up with one; but a lot of faculty stopped asking good questions about flipped learning (What’s the best way to use class time if I’m not lecturing?) and instead focused on whether what they were doing was “real” flipped learning or not. So rather than give a definition of “Proficiency Grading” or “Awesome Grading” or whatever you might want to call it[2], let’s just not, for now, and focus instead on how best to do whatever it is we are describing here.

Four Pillars (beta version)

So we are setting up a big tent with a lot of room underneath for anybody who wants to think about the sort of grading approaches being described here. Stealing shamelessly from our friends in the inquiry-based learning (IBL) community (specifically the “pillars of IBL teaching”), I’d like to close here by visualizing this “tent” as a building with four pillars.

(A graphic designer I am not.) As advertised, this is a beta version, not in any way guaranteed to be complete or even correct. In fact David has already informed me that I need to work on this some more. (I mean, are those pillars even touching the pediment? What kind of physics are we using here? — DC) But that’s what the comment section is for, and anyway I think it’s more useful than a definition of a term.

In fact, what I hope is that in the near future, what we’re describing here won’t need a special term: it will just be “grading”, and grading using these practices will be so normative that it’s the departures from these practices that will need special terminology[3].

More details

Since we first wrote this post, we have written follow-up posts diving into the details of each pillar. Check them out!

Pillar 1: How to write standards (and a follow-up: What does it mean to meet a standard?)
Pillar 2: The care and feeding of helpful feedback
Pillar 3: Giving marks that indicate progress (follow-up: More reasons to avoid using numbers for grades)
Pillar 4: The heart of the feedback loop: Reattempts without penalty


[1] Every definition of flipped learning up to that point had stated that students must watch videos prior to class. This was even used as one of the exclusion criteria in one of the most cited early research reviews on flipped learning at the time. It was a dumb criterion to have, so I fought back with my own definition. Fortunately I don’t think grading suffers from that kind of issue for now. But more precise definitions might be necessary in the future for research purposes; we’ll see.

[2] In fact, it has been called something before: Mastery grading, or sometimes “mastery-based grading”. There are several issues with this term, none of which I am going to discuss here and now. The point is to focus, for now, on the thing itself rather than what the thing is called.

[3] For a long time I’ve said the same thing about flipped classrooms (“Eventually we’ll just call it ‘the classroom’”) but Sharona Krinsky, our friend and the main driver of the annual Grading Conference, is the one who’s said this the most about grading.

Still lighting learning fires

Mar 14

Really enjoy following your thinking. It is similar to the journey I've been on since I started teaching. I realized early on that both grading and grades were at best false proxies. I've been lucky enough to be part of the world languages community with scores of dedicated, innovative colleagues. Our discipline was fortunate because we are small enough to try things without attracting too much blowback as we go. That allowed us to codify a set of standards (your first pillar), which led to adoption of an accepted proficiency scale, and we've had assessment leaders who have developed a variety of valid, reliable assessments (3rd pillar) that can measure an individual's proficiency on a scale from Novice low (beginning language learner of any age) to Distinguished (equivalent of a native speaker with an advanced degree). An individual's age, number of years of study, type of program, language being assessed, etc., are not factors -- it comes down to what they know and can do in/with the language. My work has led me to focus on 3 essentials that are similar to your four pillars. The first is clearly defined standards of proficiency (not to be confused with performance). Teachers and students both know what it means to function at the Intermediate Mid level of proficiency. No secrets about what you're trying to accomplish. Second is an external, validated assessment. This also means that a learner in Connecticut or Oregon will be held to the same standard as a learner in Mississippi or Nebraska or California. Someone assessed as Novice High in Delaware has the same use of the language as someone at Novice High in North Dakota. No teacher bias. Third is Proficiency-based credits. It's quite simple -- when your proficiency is assessed at the level agreed upon in your school/district, you get the credit. Not before and no need to wait until the end of a term to keep moving on. No sliding by with a B- or a D and getting further behind as you go.
It also puts control over the credit in the student's possession. Want to spend more time and progress more rapidly? It'll pay off. Spend time on your language learning during a vacation? It now has real value. Need to slow down while you focus on other things? No problem (but you won't get a credit just because the calendar says it's the end of the term). This also fundamentally changes the teacher/student relationship. The teacher is not playing "gotcha". In fact, mistakes (which the research shows are fundamentally essential in the learning process) aren't going to "cost you" as a student. The teacher's ONLY goal is to help the learner progress toward the proficiency level and at the rate that the student has set as a goal. Those 3 elements are transformative. 1. Clearly defined and accepted standards (beyond a single teacher or classroom), 2. A valid, reliable external assessment instrument (we have several to choose from) 3. Proficiency-based credits to give true value to student effort. Of course there are other essentials but it's exciting because we are seeing this actually work in classrooms. I love seeing the way this is developing in other disciplines like yours!


Josh Green

Apr 12, 2023

Thank you for sharing this informative article on grading for growth. It's important to prioritize student learning over grades, and finding common ground with grading can help facilitate this.

If you're interested in learning more about the Pareto principle of time management, I recommend checking out this article https://productive.fish/blog/pareto-principle/. It explains the concept in detail and offers practical tips on how to apply it to various aspects of life, including grading practices.
