11 Comments
Apr 26, 2022 · Liked by Robert Talbert

"The problem is this: Although we treat points like numbers and do statistics on them like numbers, points are best understood not as numerical data but as ordered labels. And therefore the statistics we perform on them make no sense."

I love this.

Apr 26, 2022 · Liked by Robert Talbert

This is brilliant. I love it. Thank you for such a clear explanation of why we shouldn't use points (or %).


This is a great article and I am convinced by many of your arguments. Regarding arguments in favor of points-based grading, though, one thing that I think about a lot for my grading system is transparency: students should be able to compute their grade themselves at any point; other than very late-semester grades (e.g. final projects or, god forbid, exams) there should not be any suspense or surprise about their final grade. (I think this is important for giving students a greater sense of ownership over their learning and grades, and it hopefully decreases perceptions that grades are arbitrary (which admittedly they are to a certain extent) or capricious (which hopefully they are not).) Points-based grading makes this very easy: simply add up points according to a certain formula, which in math classes can even be reasonably complicated or made available via a public Excel sheet or similar. I know not everyone values this sort of transparency to the extent that I do, but do you have any suggestions about how to maintain transparency in the absence of points (or arguments that it's worth sacrificing)?
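
To make concrete what I mean by "compute their grade themselves at any point," here is a rough sketch in Python; the category weights and sample scores are invented for illustration, not taken from any actual syllabus:

```python
# Hypothetical points-based grade formula; weights and scores are made up
# for illustration, not drawn from any real course.
CATEGORY_WEIGHTS = {"homework": 0.20, "quizzes": 0.20, "exams": 0.60}

def current_grade(scores_by_category):
    """Weighted average of category percentages, re-normalized over the
    categories graded so far, so it can be computed at any point."""
    total, weight_used = 0.0, 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        scores = scores_by_category.get(category, [])
        if scores:
            total += weight * (sum(scores) / len(scores))
            weight_used += weight
    return total / weight_used if weight_used else 0.0

# Mid-semester check, before any exams have been taken:
print(round(current_grade({"homework": [90, 85, 100], "quizzes": [70, 80]}), 1))  # 83.3
```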

Author · May 18, 2022 · edited May 18, 2022

I consider the "transparency" of points to be a quasi-transparency. Suppose my grade comes from three tests, and I earned 78, 80, and 82. Sure, I can average those at any time and see that I have an average of 80 or B-. But this is not real transparency, because where did those points come from in the first place? What do they mean? What do I know, and what do I need to work on? Those point totals don't say. All of the really *important* information is hidden, and no amount of statistical computation will change that.

So first of all, for those concerned about transparency, remind them of this point. Second, refer to the "Helpful Feedback" and "Marks Indicate Progress" parts of the Four Pillars (for example https://gradingforgrowth.com/p/finding-common-ground-with-grading?s=w). By giving lots of helpful feedback, giving marks that indicate progress, and stating clearly in the syllabus what a person needs to do to earn a C, B, or A in the class (see for example https://gradingforgrowth.com/p/doing-alternative-grading-on-a-short?s=w), we get actual transparency because we are hiding nothing.

(Thanks for reading!)


The ordinal/scalar question is nuanced. See an analysis of "optimal" grade point weighting under certain assumptions here: http://highered.blogspot.com/2021/12/are-you-calculating-gpa-wrong.html


Robert, I appreciate you making this argument as strongly as you can. I agree partially. Regarding your statement "There is simply no argument for using them other than inertia," what you're calling inertia might include the fact that, for a lot of us, it makes intuitive sense to give final grades as a weighted average of individual assignments. I think you're saying that we shouldn't average things that shouldn't be averaged, but we can often make the math work OK if we include tricks like dropping the lowest grade (to deal with that one 0 dragging everything else down). If a student has 3 big tests and gets scores of 65 (D), 75 (C), and 85 (B), it seems reasonable that their final grade should be in the 75/C range. Contrast that with three big units of a standards-based grading course where a student met all the standards for one or two units, but not the third unit. Is that B work, or C work, or ... ? Reasonable decisions can certainly be made, and justified, but that can feel more arbitrary and less intuitive than the weighted-average approach. Thus, to make a completely successful argument against points-based grading, you may need to confront the intuitive appeal of weighted averages even more directly than you did in this post.
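
For instance, the drop-the-lowest trick amounts to something like the following sketch (scores invented for illustration):

```python
# Sketch of the "drop the lowest grade" adjustment; scores are made up.
def average_dropping_lowest(scores):
    """Mean of the scores after discarding the single lowest one."""
    kept = sorted(scores)[1:] if len(scores) > 1 else list(scores)
    return sum(kept) / len(kept)

scores = [0, 82, 88, 90]                          # one zero dragging everything down
print(sum(scores) / len(scores))                  # plain average: 65.0
print(round(average_dropping_lowest(scores), 1))  # after the drop: 86.7
```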

Author

Thanks Greg. I'll try to give a response that's as thoughtful as your comment. I'd like to look a little more closely at your examples.

In the "3 big tests" model, a student with a 65, 75, and 85 would average to a C. But so would a student with scores of 25, 100, and 100. Should the second student also get a C? In some ways it depends; if the tests are in any cumulative, the answer is a definite NO because despite the early flub, the student has clearly mastered the material by the end of the course; whereas the first one never really mastered *anything*.

If the tests are not cumulative, then I would question whether a single timed assessment provides accurate information about student mastery. There's no feedback loop in place; how do I know whether the second student above was working to overcome an early lack of prerequisite skill, or was sick, or was just having a bad day?

Either way, the "weighted average" approach seems incredibly *non*-intuitive to me. In fact it runs contrary to everything I have personally experienced about learning. On the other hand, allowing students repeated opportunities to learn from mistakes and reattempt things, and basing their grade on whether they eventually show sufficient evidence of mastery, seems perfectly natural. This is in fact how all of us learn.

I *do* think that the weighted average approach is a lot easier for the instructor. I suspect that's why people stick with it, and this might be what you're calling "intuitive appeal".

I hope that doesn't come across as horribly snarky. I just think that inertia goes by many different names.

Apr 27, 2022 · edited Apr 27, 2022 · Liked by Robert Talbert

Robert, thanks for this detailed response. It doesn't sound snarky to me! But let me try again on the intuitive appeal of averages. Before we grow up to be professors, a lot of us watch and/or participate in sports, and we are told that the best athletes are the ones with the highest batting averages, or the most points per game, or whatever. This is broadly analogous to the traditional points-based system. There are also situations analogous to a comprehensive final exam, like the Olympics or the World Series, where a performance at the big moment kind of supersedes everything else. Thus, I think there is a huge amount of subconscious conditioning from an early age that the way to judge quality or performance in general is to average performances over time and/or look to big high-stakes finales. Now, with all that said, we need to ask whether we should judge undergraduate learning in the way that we judge professional athletes, and you'd probably say no and I'd probably agree! But I think this previous, deeply embedded conditioning toward using averages or winner-take-all finals as indicators of quality is something to be conscious of and address directly -- as you have started to do.

Author

Also, we should keep in mind that even in professional athletics, one-and-done is not the norm in many championship-level sports. Winners of the NBA playoffs have to win a best-of-7 series; soccer/football tournaments aren't decided by single games but by aggregate scores across multiple matches; and Olympic finalists in many events are decided by performing multiple times and keeping the top result.


How does one evaluate and improve a course that uses a specifications grading system?

Author

Basically like any other course -- do a thorough evaluation of the learning objectives, the class activities used to enact them, and the assessments used to measure them, and look for misalignments and places where the whole stack didn't lead to the intended learning outcomes. Using specs doesn't fundamentally change the process, only the data used in the evaluation.
