Throwback: Rigor
Seriously, what does that even mean? Anything?

Today we’re bringing you an update of one of our oldest posts, originally published on September 13, 2021. Back then, we brought these important thoughts on “rigor” to all 359 of our readers. This topic is just as important today as it was back then, and with more years of experience (and about 20 times as many readers), we thought it was time to give this post a refresh. We hope you enjoy it.
If you’re involved in academia in any way, you’ve heard the term rigor. It’s a constant companion in any discussion of grades, assessments, teaching methods, curriculum and course design, and of course, AI.
Let’s flip this intro, though: Before you read any farther, what does the word “rigor” mean to you?
Seriously: Take a minute. Actually write down what “rigor” means to you in a classroom setting, what it implies, maybe some examples of what you would consider rigorous or not. Then keep reading once you’re done.
What does rigor mean?
So, what did you write down?
After some extensive asking around, here are the words we most often heard used to describe rigorous courses: “difficult”, “challenging”, “strict”, “high standards”, “C average”, “bell curve”, “gatekeeping”.
If you’re not a fan of anecdata, here are some better-cited examples:
In Specifications Grading, a book that inspired many to rethink assessments, Linda Nilson uses rigor to mean “high academic standards”.
A collection of sentiments that many will recognize, wrapped up nicely by EdGlossary: “instruction, schoolwork, learning experiences, and educational expectations that are academically, intellectually, and personally challenging.”
In “Academic Rigor: A Comprehensive Definition” (written for Quality Matters, a nonprofit focused on measuring and guaranteeing course quality), Andria Foote Schwegler defines academic rigor as: “intentionally crafted and sequenced learning activities and interactions that are supported by research and provide students the opportunity to create and demonstrate their own understanding or interpretation of information and support it with evidence”.
Dictionaries cover a lot of ground with rigor. Ignoring non-academic meanings[1], some relevant definitions include: “the quality of being extremely thorough, exhaustive, or accurate”; “strict precision”; “the quality of being unyielding or inflexible”; “scrupulous or inflexible accuracy or adherence”.
Robert even wrote about this way back in 2008: rigor is “thoroughness, carefulness, and right understanding of the material being learned”, and a rigorous course “examines details, insists on diligent and scrupulous study and performance, and doesn’t settle for a mild or informal contact with the key ideas”.
But by far the most common definition for rigor is: none. That is, most articles, books, and random internet conversations about rigor leave the term completely undefined. Rigor seems to be placed beyond definition, left up to the audience to interpret and recognize. You don’t need a definition for rigor — you know it when you see it, and it’s either there or it isn’t.
Or that’s what we’re told, at least. Authors and speakers often refer to the vaguely sinister specter of “lack of rigor” in modern education, a dog-whistle that covers all of the ground from “kids these days have it too easy” and “I suffered back in my day, so you must too” to “your course doesn’t cover my favorite pet topic”.
This post was originally inspired by an article from Inside Higher Ed, “Upholding Rigor at Pandemic U”, that made the rounds in 2021. If the word “rigor” weren’t in the title, you might not realize that this is what the article is supposedly about; the word only appears twice outside the title, both times in the context of “upholding standards of rigor” and without clearly indicating what those standards are, or what rigor itself is. We are apparently supposed to simply know what rigor is, intuitively.[2]
There are so many more examples. Last year, we saw Harvard’s “Update on Grading” and the follow-up proposal to cap the number of A’s in each course at 20%. The update itself refers to “rigor” a few times without definition; news reports leaned in to undefined “rigor” even more heavily (see Forbes and the Harvard Crimson’s editorial, which interestingly also mentions “mastery-based grading”). As in the Harvard kerfuffle, “rigor” often appears paired with the bogeyman of “grade inflation”, with a large number of high grades being seen as evidence of lack of rigor.[3] Rigor inevitably appears in any discussion of generative AI and how it affects grades and learning.
Here is, ultimately, the problem: Rigor is a wildly overloaded word. It means something different to each person, and even instructors with many shared educational values are likely to have different definitions. When two or more people are gathered to talk about rigor, there too shall be ambiguity. These discussions are inevitably surrounded by unexamined assumptions, biases, and cultural baggage.
What can we say?
So, what can we do about this overloaded word? Perhaps the simplest solution is best: Don’t use the word “rigor” at all. Too much is tied up in the word; it’s a red herring, a distraction, and a vehicle for our biases.
Instead, let’s pull apart some of the knotted threads that form “rigor” and see what we can say about them. In particular, what concrete things can we say about how alternative grading systems approach the issues that seem to be indirectly addressed by “rigor”?
In the rest of this article, we’ll take a look at what we can say about the academic standards in grading systems based on the Four Pillars.
Clearly defined standards with marks that indicate progress
When student work is evaluated against clearly defined standards, there’s something that doesn’t happen: Comparison to other students. Grading based on clearly defined standards is also called criterion-referenced assessment, which gives a clearer and more consistent meaning to grades. This meaning is linked to clear criteria and doesn’t vary depending on how other students perform (aka norm-referenced assessment).
In other words, these two pillars lead to grades that are more meaningful and directly reflect student learning.
Despite the words “rigor” and “standards” often showing up near each other (like in that phrase, “standards of rigor”), actual standards (criteria for what constitutes acceptable work on a task, clearly spelled out and accessible to the student) aren’t often a part of traditional grading. Instead, in that context “standards” is often a proxy for “grade frequencies”. Did your class have too many A grades? Not rigorous enough. Was the distribution bell shaped with the mean around C? That’s more like it. To make this happen, instructors judge students against each other by “curving” grades, limiting the number of A’s, or using other wildly inequitable procedures that muddle the meaning of those grades.
Expecting student grades to fit a bell curve is simply not based in reality,[4] and enforcing this through curving is the opposite of holding students to a high standard: it’s holding them to an arbitrary standard over which they have no control.
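To make the contrast between criterion-referenced and norm-referenced grading concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the student names, the scores, the 85-point threshold, the 20% cap, and the function names are all made up and don't describe any real course or policy.

```python
def criterion_referenced(scores, threshold=85):
    """Each student earns an A if their own work meets the stated criterion."""
    return {name: score >= threshold for name, score in scores.items()}


def norm_referenced(scores, cap=0.20):
    """Only the top `cap` fraction of students earn an A, whatever their scores."""
    allowed = int(len(scores) * cap)  # number of A's the cap permits
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:allowed])
    return {name: name in top for name in scores}


# Hypothetical class in which most students met the criterion.
scores = {"Ana": 95, "Ben": 92, "Cy": 90, "Dee": 88, "Eli": 70}

by_criterion = criterion_referenced(scores)
by_curve = norm_referenced(scores)

print(sum(by_criterion.values()))  # students earning an A under the criterion
print(sum(by_curve.values()))      # students earning an A under the cap
```

In this made-up class, Ben's work clears the threshold, yet under the cap his grade depends on how Ana performed rather than on what he demonstrated.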
A reasonable objection here is that we haven’t actually said what or how high those standards actually are. Clearly, if we hold students to fluffy and light standards, like a meringue but with less academic meaning, then grading based on those standards isn’t going to fit anybody’s idea of rigor. And indeed, we want to make sure our courses challenge students intellectually and that a high grade is based on authentic evidence of real learning. Choosing standards, and determining what should be involved in meeting a standard, is a place where instructors can have productive discussion.
Ironically, when grading with standards, instructors often overcorrect and expect perfection from students. This inevitably involves aspects of a student’s work that are not central to the idea being assessed, possibly including things like arithmetic, copy errors, or writing style.
If a standard is clear, it must specify what matters and, implicitly or explicitly, what doesn’t matter. And in most cases, what matters is not “everything”. When a student is writing a solution to a math problem, misspelled words or poor punctuation might matter (if communication quality is part of the standard) — but probably not. As long as the solution is understandable, we can judge whether it meets the standard separately from whether it is well communicated. And we probably should.
But wait! If we’re leaving out some parts of a student’s work — such as not “removing points” for spelling or arithmetic errors — doesn’t that mean we’re lowering standards, decreasing rigor? Only if those things are part of the standards being assessed. But then, they should also be part of what is taught in the class. Things that matter should be spelled out clearly in the standards or specifications.
In the end, the clearly defined standards must necessarily ignore some aspects of student work. Standards should be clear about which items matter, and which don’t, and these will depend on the course in question. This can be a difficult, but essential, part of creating standards for your own classes. What matters in a lower-level course — for example, attention to numerical detail — may not be as important in a later course with a different focus.
Helpful feedback and reattempts without penalty
Alternative grading systems also critically feature helpful feedback and the ability to revise, resubmit, or reattempt without penalty. At first glance, these two pillars can look like they reflect lower standards. If we give students lots of feedback, and then give them chances to reattempt their work without penalty, what’s to prevent every student from earning full credit on everything?
Nothing! And that’s a good thing![5] These two pillars are the core of the feedback loop that makes learning work. In the end, helpful feedback and unpenalized reattempts lead to greater learning, as opposed to one-and-done assessments that incentivize students to focus only on the grade and flush away content from their brains once they’ve been tested.
This is best illustrated through the classic example of Alice and Bob, which shows the dangers of one-and-done assessments. Students who have opportunities to continue learning through a feedback loop are pushed and challenged to grow in their understanding. Those in more traditional systems are encouraged to accept partial understanding (and partial credit) in place of real, deep learning. Surely, pushing for greater understanding is more rigorous, isn’t it?
The goal of these two pillars is for grades to represent a student’s ultimate level of understanding. This reduces confounding factors, like whether a student was feeling ill on the day of a test, whether they were experiencing a personal crisis, whether the room contained distractions, and so on. This approach acknowledges that different people learn at different paces and can grow in their understanding. If we really believe this — if we really care about treating students like human beings who can succeed in our classes — then “high academic standards” must have room for feedback loops.
There’s an interesting consequence here: Many people who use alternative grading notice that the number of A’s and B’s in their class increases. As we mentioned above, “grade inflation” is often cited as a consequence of slipping academic standards and decreasing rigor. But here, it’s actually the opposite: By holding students to high standards and allowing reattempts without penalty, we remove aspects of grades that aren’t related to learning. Having only one chance to demonstrate understanding penalizes students for not performing on the instructor’s schedule. Partial credit doesn’t make up for this: It still fundamentally represents a one-and-done approach.
When we insist on concrete evidence of learning, every ounce of those high grades can be traced back to an explicit piece of work that meets high standards.[6] When grading with feedback and reattempts, students are no longer permanently penalized when they fail an early test. Instead, if they work to improve their understanding, their grade fully reflects that they’ve achieved a high bar.
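For readers who like to see the bookkeeping, here is a rough Python sketch of how a one-and-done record differs from one that allows reattempts without penalty. The record, the standard names, and both function names are hypothetical; real systems track attempts in many different ways.

```python
def one_and_done(attempts):
    """Only the first attempt on each standard counts."""
    return sum(1 for tries in attempts.values() if tries and tries[0])


def with_reattempts(attempts):
    """A standard counts as met if any attempt met it; earlier misses carry no penalty."""
    return sum(1 for tries in attempts.values() if any(tries))


# Hypothetical record: each standard maps to a list of attempts (True = met the standard).
record = {
    "Standard 1": [True],
    "Standard 2": [False, True],
    "Standard 3": [False, False, True],
    "Standard 4": [False],
}

first_try_count = one_and_done(record)
eventual_count = with_reattempts(record)

print(first_try_count)  # standards met on the first attempt
print(eventual_count)   # standards eventually met
```

The bar for each standard is the same in both functions. The only difference is whether an early miss is permanent, and each standard counted by the second function still traces back to a specific attempt that met it.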
Instead of rigor…
We’re not saying that college courses shouldn’t be rigorous. We are saying that the word rigor itself has no inherent meaning and is therefore powerless to describe the kind of learning environment we want. In fact, we (David and Robert, and likely most others on “Team Alternative Grading”) probably want the same kind of learning environment that many on “Team Traditional Grading” want: an environment where students are pushed and challenged to grow, engage deeply with difficult ideas, and show us in clear terms that they’ve met the challenge. In order to reach this goal, we have to start by using real words.
Rigor is not one of those. It is essentially a buzzword that just happened to appear in academic discourse decades ahead of all the other ones we currently deal with. So let’s stop using it.
If you’d like to see what word we do recommend, here’s Robert’s followup post from 2021: Moving on from “rigor”.
[1] You don’t have to put your tongue too far into your cheek to see how some of the other definitions could apply: “A condition that makes life difficult, challenging, or uncomfortable”, “… often with copious sweating”.
[2] That article coined the eye-rolling phrase “grace and compassion police” referring to those “who insist faculty shouldn’t demand very much from students”. There’s a lot of gatekeeping going on in that article.
[3] While this post isn’t about grade inflation, it’s a closely related topic that we’ve written about at length before. See, for example, sections in these two posts: The heart of the loop: Reattempts without penalty and A media guide to ungrading.
[4] I wrote much more about this in Abundance and Scarcity.
[5] If you’re unsure about this, try saying “It’s a bad thing that all students have the chance to succeed in my class” out loud.
[6] An interesting aside about increasing average grades: For our book we interviewed some instructors who use alternative grading at institutions that have experienced grade compression (that is, most grades fall within a narrow band, typically A’s). They pointed out that instructors in this situation often feel pressure to compress grades for nonacademic reasons. But they also report that in their situation, alternative grading tends to “spread out” grades (decompression) and reduce average grades, and yet students are happier with those grades. Their interpretation is that because alternative grading directly connects grades to student work, those students are more willing to accept a lower grade as representative of their actual work.


