Grading for Equity with Grading for Growth, part 2
Is alternative grading more resistant to bias?
This week, I’ll continue our discussion of how alternative grading systems that are based on our four pillars can result in more equitable grading practices. In particular, I’ve been working to show how our four pillars align with Joe Feldman’s three elements of equitable grading as described in his book, Grading for Equity. If you haven’t read last week’s installment, I highly recommend beginning there.
Feldman’s three elements of equitable grading are Accuracy, Motivation, and Bias-resistance. Last week, I addressed the first two of these elements. Along the way, I pointed out a few places where even a well-intentioned grading system can be prone to inequitable results.
This week, I’ll focus on the third of Feldman’s elements of equitable grading: bias-resistance. I’ll also dig much deeper into places where alternative grading can fall prey to the same biases that haunt traditional grading.
Bias-resistance
Feldman’s third element of equitable grading is bias-resistance. A key requirement for bias-resistant grading practices is that grades should be based on evidence of learning. Thus, grades should not include items that reflect a student’s behavior or environment, or which can be affected by an instructor’s personal bias. There’s a lot here, so let’s take it one piece at a time.
Grades should be rooted in student achievement with course content, as our first pillar–clear standards–requires. Let’s take a look at some ways that behavior, environment, and instructor bias can sneak into a traditional grading system, and see how our four pillars stand up against them.
Curves. By a “curve,” I mean assigning grades so that they fit some expected distribution, usually a normal (“bell”) curve. Doing this removes any link between a grade and actual learning. Enforcing a grade distribution reflects a student’s environment, since grades now depend largely on how others in the class perform. What does that have to do with an individual’s learning?[1] Curves also encourage competition and distrust and reduce the cooperation that can lead to even greater learning. Our four pillars are strong here: By measuring student work against specific standards, alternative grading systems avoid comparing students against each other.
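To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the function names, the rank cutoffs (top 20% get an A, and so on), and the "meets expectations" label are assumptions, not anything prescribed by Feldman or by the four pillars. It shows the same student score landing on different letter grades under a curve, depending only on who else is in the room, while a standards-based mark ignores classmates entirely.

```python
def curved_grade(score, class_scores):
    """Rank-based curve with made-up cutoffs (an assumption for illustration).

    The grade depends on the fraction of classmates who scored strictly higher,
    not on what the student actually demonstrated.
    """
    higher = sum(s > score for s in class_scores)
    frac_above = higher / len(class_scores)
    if frac_above < 0.2:
        return "A"
    if frac_above < 0.5:
        return "B"
    if frac_above < 0.8:
        return "C"
    return "D"


def standards_grade(standards_met, standards_required):
    """Standards-based mark: compares the student's work to the standards only."""
    return "meets expectations" if standards_met >= standards_required else "not yet"


# The same score of 85 in two different (hypothetical) classes:
weaker_class = [60, 65, 70, 75, 85]
stronger_class = [85, 90, 92, 95, 98]

print(curved_grade(85, weaker_class))    # A
print(curved_grade(85, stronger_class))  # D

# The standards-based mark never looks at classmates:
print(standards_grade(8, 8))             # meets expectations
```

The point of the sketch is only that `curved_grade` needs `class_scores` as an input and `standards_grade` does not.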
Extra credit. This is one of Feldman’s big concerns, since extra credit is typically given for extra non-academic work. Thus, extra credit benefits students who have the time to complete that extra work: Students who don’t have to work in order to pay for school, who don’t have families to care for, or who otherwise enjoy extra time beyond the standard expectations of the course. In an alternatively graded class, extra credit is typically replaced by reassessments: Instead of earning points for non-academic work, students can improve their grade by demonstrating actual learning. Excellent, a win for the four pillars!
But… how reassessments are implemented can still be inequitable. To avoid disproportionately benefiting students who have extra time, alternative graders need to think carefully about workload. In particular, reassessments are part of a class’s workload, and must be included when thinking about how much work a student will be doing each week. (You do think about this — right? My usual advice is to estimate reassessments as taking half as much time as the original assessment.) This may require reducing other regular work! Simply adding reassessments on top of the normal set of assessments gives an undue advantage to the same students who have the time to do extra credit.
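As a rough illustration of that workload arithmetic, here is a small Python sketch. The function name, parameter names, and the example hours are all hypothetical; the only piece taken from the advice above is the rule of thumb that a reassessment takes about half as long as the original assessment.

```python
def estimated_weekly_hours(regular_hours, assessment_hours,
                           expected_reassessments, reassessment_factor=0.5):
    """Estimate a student's weekly workload, counting reassessments.

    reassessment_factor=0.5 encodes the rule of thumb that a reassessment
    takes about half the time of the original assessment. All other numbers
    are inputs the instructor would have to estimate.
    """
    reassessment_hours = expected_reassessments * assessment_hours * reassessment_factor
    return regular_hours + assessment_hours + reassessment_hours


# Hypothetical week: 4 hours of regular work, a 2-hour assessment,
# and one expected reassessment.
total = estimated_weekly_hours(regular_hours=4, assessment_hours=2,
                               expected_reassessments=1)
print(total)  # 7.0
```

If the intended workload for that week were 6 hours, this estimate suggests trimming about an hour of regular work rather than stacking the reassessment on top.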
Late work. So far, our four pillars have done a pretty good job of focusing on learning rather than a student’s environment or behavior. But late work penalties are a place where alternative grading can go awry. Late work policies can be a significant source of bias and can demotivate students, and our four pillars have nothing direct to say about this. I’ve seen many alternatively graded classes that require an excused absence to make up missed work (if late work is accepted at all). What counts as “excused”, and who decides it?
A student’s work might be late for any number of non-academic reasons: the need to work to pay for college, lack of reliable transportation, or the need to care for children or relatives, among many other possibilities. Penalizing a student’s grade due to not meeting an arbitrary deadline means (likely) penalizing them for those non-academic reasons. Each instructor needs to think carefully about how their late work policies align with the spirit and philosophy of alternative grading. In particular, think about our fourth pillar — reassessments without penalty — and how its purpose is to let students show what they’ve learned, whenever that may happen. How does that same philosophy apply to late work? This is why I often say “I’d rather see what you know than not get to see it at all” when a student asks if they can hand in work late.
Of course, this doesn’t mean that your class should be a deadlineless free-for-all (although some instructors do run their classes that way, and make it work!). Structure is helpful; rigidity is not. It is possible to handle deadlines in a flexible manner that doesn’t involve grade penalties. Tokens are one way to do this: They give students a limited number of “free passes” for turning in work late, without excuse or penalty. When in doubt, err on the side of letting a student demonstrate their understanding rather than penalizing non-compliance.
Participation and attendance. Much like late work, participation (including attendance) is a non-academic factor that often appears in grades. Participation directly reflects student behavior rather than learning, and it is also highly susceptible to bias. Participation is generally viewed through an instructor’s unstated or unconscious expectations about what “participating” looks like. This tends to benefit students whose modes of participating are deemed “acceptable” by their instructor, often in ways that are aligned with the dominant culture (here, you can see that we’re sliding slowly from avoiding behavior and environment into examining an instructor’s own bias).
This is another place where instructors need to be careful in alternative grading. While our pillars insist that standards are based on course content, instructors often wish to incentivize behavior by counting attendance or participation in one of their grade categories. Feldman’s solution is to not count either of these, and I generally agree with that.
If participation must be assessed, then it is also something that must be taught and practiced with feedback, just like any other skill. Clear specifications can be really helpful here. Implicitly assuming that students already know what you mean by “participation” — and know how to help you see it — simply means that you’re assuming students have already learned what you want to teach them.
In general, participation and attendance are best incentivized by making class worth participating in, and recognizing that students can participate in many ways. If you absolutely want to assess participation, one of the best ways is to give students multiple paths to earn credit for participation, and to value student voices in deciding what “participation” means. Speaking up first is not the only way to be a valuable class citizen.
I want to take a moment to pick up a thread from that last point: If we want to assess something, we need to teach it too. In general, “course content” is anything that is explicitly taught, practiced, and assessed in the class. An instructor may wish to assess, for example, teamwork or study skills. These are excellent things for students to learn, but they are not course content if an instructor simply assumes a student knows all about them and assesses them without instruction or practice.[2]
With a focus on equity, instructors must take time to help students learn the relevant skills, let students practice (with feedback), and then assess those skills in the same way as any other course content.
Let’s also take a moment to talk about instructor bias directly. Bias-resistance requires instructors to build in ways to prevent their own personal, and often unconscious, biases from affecting grades. It’s the unconscious part that’s tricky here: We all have unconscious biases. Everyone, no exceptions. But if you don’t know you have a bias, how can you avoid it? One way is to avoid putting yourself in situations where bias can play a large role, for example by hiding student names (or other identifying information) when grading (if you’re grading through an LMS, many already offer this feature). Implementing consistent but flexible policies is also important: Instead of putting yourself in a position where you must judge whether a student’s absence is excused (putting your own expectations and biases potentially in play), you can allow students a limited number of no-questions-asked automatic extensions via tokens, or automatically post class materials on your course’s learning management system.
What about summative assessments?
One of Feldman’s recommendations about bias-resistant grades might be surprising: he insists that grades should be based only on summative assessments, not formative ones. Wait, what? Isn’t alternative grading founded on the idea that students should engage in a feedback loop, repeatedly engaging with feedback and reassessments until they deeply understand ideas? Is Feldman saying I can’t do that?
This needs some clarification: The “formative” assessment Feldman is talking about is traditional homework, assigned strictly for practice (although pre-class preparation from a flipped class can also fall into this category). When any assignment is meant strictly for practice, we should expect that students are still learning. If we grade practice work for correctness and include it in grade calculations, that puts a penalty on the learning process.
In contrast, summative assessments are meant to evaluate a student’s level of understanding after they have had a chance to practice. However, Feldman agrees that reassessments–retakes, redos, and revisions–are a key part of summative assessments, because they allow students to continue to grow and demonstrate understanding. Thus reassessments effectively allow an assessment to remain formative – not counting towards a student’s grade – until the student and instructor are satisfied with their progress, at which point a grade is (summatively) recorded.
This may seem subtle, but the issue of purely formative vs. summative is important to think about. The take-away for us is that instructors should identify the purpose of each assignment. Is it purely for practice or preparation, as with traditional homework and pre-class readings? If so, it should not be graded, or if it is, it should at most be counted for completion (not correctness). Only assignments that are meant to assess a student’s level of learning should count in grades. To be accurate, those grades should only be recorded after reassessment.
Whose understanding are we assessing?
There is one last aspect of bias-resistance that I want to address, and it’s a big one: To avoid building in bias, grades should represent a student’s own understanding, not that of others. Otherwise, we’re primarily benefiting students who have stronger social connections, and risk assessing their friends’ knowledge rather than their own.
To some extent, this is uncontroversial. We want grades to represent what a student actually knows! The only place Feldman directly addresses this is in group grades for group projects: A “group grade” is a muddled representation of individual understanding. This is not unique to any particular assessment system. Group projects can have many benefits, so the answer isn’t to dispose of them. However, it’s essential to think carefully about how we are assessing individual student understanding when using group projects. Feldman recommends giving an individual assessment after a group project is completed. For example, you might assign group projects that are handed in for feedback, but only record grades after an individual revision and reflection on the project (this relates to the previous point about summative vs. formative assessments, too).
There is more to the issue of assessing individual understanding than just looking at group projects. In some ways, this is especially salient in higher education, where many summative assignments are completed outside of class. Essays, large-scale projects, portfolios, and even detailed calculations are better done when more time is available. Even for in-class quizzes or exams, reassessments may happen outside of class. In these cases, there’s the inevitable possibility that students get “help” from one another, and the amount of “help” available depends heavily on a student’s social network, background, and schedule. How does that relate to this element of equitable assessment?
First—I can’t emphasize this strongly enough—the solution is not to use only in-class timed assessments, or to use remote proctoring software. Timed assessments create huge amounts of stress. Remote proctoring in particular invades student privacy. Remote proctoring tools are highly bias-prone on their own: They produce inaccurate results, and some even have a hard time recognizing dark-skinned students, adding irrelevant and unnecessary barriers to even accessing assessments. Even beyond that, timed assessments are rarely the best way to assess student learning.
Robert addressed this issue of “help” head-on several months ago. We agree that instructors must keep an eye on where student grades are coming from. But we believe that, as far as the “help” issue is concerned in higher education, the most important issue is to ensure that all students have equitable access to helpful outside resources. We don’t want to remove or ban outside help, but rather to ensure that students have equitable access to help, in whatever form it may come.
For example: Study groups are often formed based on social connections. Leaving study groups to form organically advantages students with more social connections and fewer demands on their time. Students who live off-campus, work a full time job, or have family to care for won’t get the same advantages. But study group formation can be facilitated, for example by providing a time for students to form groups within class, encouraging online meetings, or working with a tutoring center.
Another aspect of “help” is access to instructor-provided help. For example, are your reassessments truly accessible to all students? If reassessments are limited to regular office hour meetings, then only students who can attend meetings at those times will be able to reassess. Other ways for instructors to provide equitable access to help include posting slides or notes after class; recording lectures and making them freely available; and providing answer keys (with helpful notes on common issues) for appropriate assessments.
Finally, instructors can level the “help” playing field by following Linda Nilson’s advice in her book Specifications Grading: provide models for work that does and doesn’t meet expectations. This helps ensure that students have a common understanding of what “satisfactory” work involves, and doesn’t rely on the implicit assumption that students can come into class already knowing what “good enough” means. This advice applies to standards as well as specifications.
Next steps
My purpose in this series has been twofold: First, to show how the four pillars of alternative grading can support vastly more equitable outcomes for students, compared to traditional grading. But second, and equally important, I wanted to emphasize that equity isn’t something that happens by magic, and that you can’t achieve this magical equity just by implementing a new grading system. Instructors must be intentional about implementing more equitable assessment methods, being thoughtful about their students’ needs and their own biases. Feldman’s advice can help make this happen. But in the end, instructors need to be both reflective about their own practices, and alert for inequitable outcomes even within an alternatively graded class.
One final thought, prompted by Katie Mattaini (who is incredibly thoughtful about equity in education, who has hosted the “Equitable Assessment” session at the Grading Conference for the past two years, and who wrote about building meaningful student-instructor relationships earlier in the summer): Even in an otherwise carefully implemented alternative grading system, what we choose to assess can be inequitable, and this can outweigh all of the other choices we made. This happens when instructors implicitly choose to assess standards or specifications that unnecessarily privilege one particular group, idea, or background.
Just one brief example: What kind of writing is considered “satisfactory” in student work? Perhaps you’re teaching a particular style of writing (e.g. in a writing-focused class, or a technical communication class), in which case the writing style itself becomes part of the course content. But if not, it’s likely that you’re expecting students to communicate in a way that you’re familiar with. That’s probably the style of academic English, a style that has been forged and solidified over the centuries by primarily white, male, and Western academics. Is it necessary to assess that writing style, which isn’t a content focus of your class?
This comes back to an earlier idea: If something matters enough to be assessed, it also matters enough to take time to teach it, and give students a chance to practice with it. Perhaps you’ve identified that one of your course goals is to help students learn how to write a professional quality article in academic style, or to study a particular aspect of Biology or Chemistry or Philosophy. But it’s easy to unconsciously allow other aspects to slip in, things that aren’t actually part of what matters.
While doing interviews for our book, we’ve heard over and over that one of the hardest parts of creating an alternatively graded class is to identify and clearly state what matters for the purpose of assessment. When doing so, be on the lookout for things slipping into your standards or specifications that aren’t necessary and that implicitly privilege one particular group or way of knowing. What we assess is just as important as how we assess it.
Thanks for reading this short series on equity in alternative grading. What thoughts do you have about it? What did I miss? Let me know in the comments!
[1] Hint: Nothing. I wrote about this previously, in the context of Benjamin Bloom’s wonderfully direct take-down of the entire idea of the normal curve in grading.
[2] There’s a folk story that I have heard in several forms over the years. It goes more or less like this: a teacher puts a picture of a skeleton on a bulletin board in their classroom, with every bone labeled. The teacher never mentions the skeleton in class (much less teaches the names or purposes of the bones), but it’s always there, visible to every student. At the end of the semester, the bulletin board is taken down and the teacher gives an exam that asks students to label the bones on a blank picture of a skeleton. The moral of the story is usually meant to be something like “you’re responsible for paying attention” or “just because I didn’t say it out loud, but only wrote it in the syllabus, doesn’t mean you’re not responsible for it” or some such thing. What students would likely take away from such a situation is that the teacher was a malicious trickster who was uninterested in helping students learn. Whatever the takeaway, this (hopefully apocryphal) teacher was most certainly not assessing actual course content and was definitely not thinking about how humans actually learn.
I appreciate the spirit of these past two articles, but the full realization of it seems impossible. Some examples:
(a) nowhere in my program or campus objectives do we require to communicate in English, but that is a tacit requirement in all of our courses
(b) in my digital design (logic circuit) class, it is difficult to imagine that a visually impaired student could be successful, primarily because the simulator software we use involves many small, multi-colored lines and symbols. Providing a student the alternative of, for example, drawing circuits larger by hand reduces one barrier but adds others such as increased time and the inability to run a simulation
(c) students with overbearing extracurricular commitments come to lab with more distractions and less mental energy than other students--they are at a disadvantage in accomplishing lab tasks in a 2-hour period (so offer them extra lab time, right? I do, but then we run into the issue of them not having the time to come in)
(d) "Structure is helpful; rigidity is not." But the end of the semester has a fixed date.
I really don't intend this as complaining. I intend it as "I want to improve my students' learning experiences, but need help getting there."
Do you consider this goal of equity as something that is achievable? Or more of an aspirational goal?
I love "Structure is helpful; rigidity is not." May I borrow it?
This is a very clear and helpful examination of how behaviors find their way into grades and how inequitable and inaccurate the resultant grades are.
Of course, part of the reason I approve is that what is written here is completely in line with the first six of my fifteen fixes for broken grades that have just been updated in a new edition of my "Repair Kit."