"The problem is this: Although we treat points like numbers and do statistics on them like numbers, points are best understood not as numerical data but as ordered labels. And therefore the statistics we perform on them make no sense."
I absolutely agree with you and follow your logic. I use learning progressions with delayed grading myself, at the high school level. However, I would like to see what's been done (if anything) on a bigger scale. Would you mind sharing any empirical studies that back up your claims? Thanks!
This is a great article and I am convinced by many of your arguments. Re arguments in favor of points-based grading, though, one thing that I think about a lot for my grading system is transparency--students should be able to compute their grade themselves at any point; other than very late-semester grades (e.g. final projects or, god forbid, exams) there should not be any suspense or surprise about their final grade. (I think this is important for giving students a greater sense of ownership over their learning and grades, and hopefully decreases perceptions that grades are arbitrary (which admittedly to a certain extent they are) or capricious (which hopefully they are not). Points-based grading makes this very easy; simply add up points according to a certain formula (which, in math classes, can even be reasonably complicated, or available by a public excel sheet or similar). I know not everyone values this sort of transparency to the extent that I do, but do you have any suggestions about how to maintain transparency in the absence of points (or arguments that it's worth sacrificing it)?
I consider the "transparency" of points to be a quasi-transparency. Suppose my grade comes from three tests, and I earned 78, 80, and 82. Sure, I can average those at any time and see that I have an average of 80 or B-. But this is not real transparency, because where did those points come from in the first place? What do they mean? What do I know, and what do I need to work on? Those point total don't say. All of the really *important* information is hidden, and no amount of statistical computation will change that.
So first of all, for those concerned about transparency, remind them of this point. Second, refer to the "Helpful Feedback" and "Marks Indicate Progress" parts of the Four Pillars. (For example https://gradingforgrowth.com/p/finding-common-ground-with-grading?s=w) By giving lots of helpful feedback and giving marks that indicate progress -- and by having it clearly stated in the syllabus what a person needs to do to earn a C, B, or A in the class (see for example https://gradingforgrowth.com/p/doing-alternative-grading-on-a-short?s=w) then this is actual transparency because we are hiding nothing.
Robert, I appreciate you making this argument as strongly as you can. I agree partially. Regarding your statement "There is simply no argument for using them other than inertia," what you're calling inertia might include the fact that, for a lot of us, it makes intuitive sense to give final grades as a weighted average of individual assignments. I think you're saying that we shouldn't average things that shouldn't be averaged, but we can often make the math work OK if we include tricks like dropping the lowest grade (to deal with that one 0 dragging everything else down). If a student has 3 big tests and gets scores of 65 (D), 75 (C), and 85 (B), it seems reasonable that their final grade should be in the 75/C range. Contrast that with three big units of a standards-based grading course where a student met all the standards for one or two units, but not the third unit. Is that B work, or C work, or ... ? Reasonable decisions can certainly be made, and justified, but that can feel more arbitrary and less intuitive than the weighted-average approach. Thus, to make a completely successful argument against points-based grading, you may need to confront the intuitive appeal of weighted averages even more directly than you did in this post.
Thanks Greg. I'll try to give a response that's as thoughtful as your comment. I'd like to look a little more closely at your examples.
In the "3 big tests" model, a student with a 65, 75, and 85 would average to a C. But so would a student with scores of 25, 100, and 100. Should the second student also get a C? In some ways it depends; if the tests are in any cumulative, the answer is a definite NO because despite the early flub, the student has clearly mastered the material by the end of the course; whereas the first one never really mastered *anything*.
If the tests are not cumulative, then I would question whether a single timed assessment provides accurate information about student mastery. There's no feedback loop in place; how do I know if perhaps the second student above was working to overcome some early lack of prerequisite skill, or was sick or just having a bad day?
Either way the "weighted average" approach to me seems incredibly *non*-intuitive. In fact it seems counterfactual to everything that I personally experience about learning. On the other hand, allowing students repeated opportunities to learn from mistakes and reattempt things, and basing their grade on whether they eventually show sufficient evidence of mastery, seems perfectly natural. This is in fact how all of us learn.
I *do* think that the weighted average approach is a lot easier for the instructor. I suspect that's why people stick with it, and this might be what you're calling "intuitive appeal".
I hope that doesn't come across as horribly snarky. I just think that inertia goes by many different names.
Apr 27, 2022·edited Apr 27, 2022Liked by Robert Talbert
Robert, thanks for this detailed response. It doesn't sound snarky to me! But let me try again on the intuitive appeal of averages. Before we grow up to be professors, a lot of us watch and/or participate in sports, and we are told that the best athletes are the ones with the highest batting averages, or the most points per game, or whatever. This is broadly analogous to the traditional points-based system. There are also situations analogous to a comprehensive final exam, like the Olympics or the World Series, where a performance at the big moment kind of supersedes everything else. Thus, I think there is a huge amount of subconscious conditioning from an early age that the way to judge quality or performance in general is to average performances over time and/or look to big high-stakes finales. Now, with all that said, we need to ask whether we should judge undergraduate learning in the way that we judge professional athletes, and you'd probably say no and I'd probably agree! But I think this previous, deeply embedded conditioning toward using averages or winner-take-all finals as indicators of quality is something to be conscious of and address directly -- as you have started to do.
Also, we should keep in mind that even in professional athletics, one-and-done is not the norm for many championship-level sports. Winners in the NBA playoffs have to win a best-of-7 series; soccer/football tournaments aren't decided by single games but on point aggregates across multiple matches; Olympic finalists in many events are decided by performing multiple times and keeping the top result.
Basically like any other course -- do a thorough evaluation of the learning objectives, the class activities used to enact them, and the assessments used to assess them and look for misalignments and places where the whole stack didn't lead to the intended learning outcomes. Using specs doesn't fundamentally change the process, only the data used to evaluate.
"The problem is this: Although we treat points like numbers and do statistics on them like numbers, points are best understood not as numerical data but as ordered labels. And therefore the statistics we perform on them make no sense."
I love this.
This is brilliant. I love it. Thank you for such a clear explanation of why we shouldn't use points (or %).
I absolutely agree with you and follow your logic. I use learning progressions with delayed grading myself, at the high school level. However, I would like to see what's been done (if anything) on a bigger scale. Would you mind sharing any empirical studies that back up your claims? Thanks!
This is a great article and I am convinced by many of your arguments. Re arguments in favor of points-based grading, though, one thing that I think about a lot for my grading system is transparency -- students should be able to compute their grade themselves at any point; other than very late-semester grades (e.g. final projects or, god forbid, exams) there should not be any suspense or surprise about their final grade. (I think this is important for giving students a greater sense of ownership over their learning and grades, and hopefully decreases perceptions that grades are arbitrary (which admittedly to a certain extent they are) or capricious (which hopefully they are not).) Points-based grading makes this very easy: simply add up points according to a certain formula (which, in math classes, can even be reasonably complicated, or made available via a public Excel sheet or similar). I know not everyone values this sort of transparency to the extent that I do, but do you have any suggestions about how to maintain transparency in the absence of points (or arguments that it's worth sacrificing it)?
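For what it's worth, here is a minimal sketch of the kind of self-service calculator this comment describes -- the category names and weights are hypothetical, from no actual syllabus:

```python
# Hypothetical self-service grade calculator, in the spirit of the
# "public Excel sheet" idea. Categories and weights are invented.

WEIGHTS = {"homework": 0.25, "quizzes": 0.25, "exams": 0.50}

def current_grade(scores: dict[str, list[float]]) -> float:
    """Weighted average (0-100) over the categories graded so far,
    renormalizing weights so a mid-semester check still makes sense."""
    graded = {cat: s for cat, s in scores.items() if s}
    weight_total = sum(WEIGHTS[cat] for cat in graded)
    return sum(
        WEIGHTS[cat] * (sum(s) / len(s)) for cat, s in graded.items()
    ) / weight_total

# Mid-semester check: homework and quizzes recorded, no exams yet.
print(current_grade({"homework": [90, 85], "quizzes": [78]}))  # 82.75
```

The sketch's only point is that the formula is public and mechanical: a student can re-run it after every assignment.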
I consider the "transparency" of points to be a quasi-transparency. Suppose my grade comes from three tests, and I earned 78, 80, and 82. Sure, I can average those at any time and see that I have an average of 80 or B-. But this is not real transparency, because where did those points come from in the first place? What do they mean? What do I know, and what do I need to work on? Those point total don't say. All of the really *important* information is hidden, and no amount of statistical computation will change that.
So first of all, for those concerned about transparency, remind them of this point. Second, refer to the "Helpful Feedback" and "Marks Indicate Progress" parts of the Four Pillars (for example https://gradingforgrowth.com/p/finding-common-ground-with-grading?s=w). By giving lots of helpful feedback, giving marks that indicate progress, and stating clearly in the syllabus what a person needs to do to earn a C, B, or A in the class (see for example https://gradingforgrowth.com/p/doing-alternative-grading-on-a-short?s=w), we get actual transparency, because we are hiding nothing.
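To illustrate what "clearly stated in the syllabus" can look like without points, here is a hedged sketch of a specs-style grade rule; the evidence categories and thresholds below are invented for illustration, not taken from the linked posts:

```python
# Hypothetical specs-grading rule: the syllabus states in plain language
# which body of evidence earns each letter. All thresholds are invented.

def final_grade(standards_mastered: int, projects_completed: int) -> str:
    """Letter grade from standards mastered (of 12) and projects done (of 3)."""
    if standards_mastered >= 11 and projects_completed == 3:
        return "A"
    if standards_mastered >= 9 and projects_completed >= 2:
        return "B"
    if standards_mastered >= 7 and projects_completed >= 1:
        return "C"
    if standards_mastered >= 5:
        return "D"
    return "F"

print(final_grade(10, 2))  # "B" -- a student can check this at any time
```

A student tracking their own tallies gets the same at-a-glance answer the points formula promises, but the inputs now name what was actually learned.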
(Thanks for reading!)
The ordinal/scalar question is nuanced. See an analysis of "optimal" grade point weighting under certain assumptions here: http://highered.blogspot.com/2021/12/are-you-calculating-gpa-wrong.html
Robert, I appreciate you making this argument as strongly as you can. I agree partially. Regarding your statement "There is simply no argument for using them other than inertia," what you're calling inertia might include the fact that, for a lot of us, it makes intuitive sense to give final grades as a weighted average of individual assignments. I think you're saying that we shouldn't average things that shouldn't be averaged, but we can often make the math work OK if we include tricks like dropping the lowest grade (to deal with that one 0 dragging everything else down). If a student has 3 big tests and gets scores of 65 (D), 75 (C), and 85 (B), it seems reasonable that their final grade should be in the 75/C range. Contrast that with three big units of a standards-based grading course where a student met all the standards for one or two units, but not the third unit. Is that B work, or C work, or ... ? Reasonable decisions can certainly be made, and justified, but that can feel more arbitrary and less intuitive than the weighted-average approach. Thus, to make a completely successful argument against points-based grading, you may need to confront the intuitive appeal of weighted averages even more directly than you did in this post.
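To make the arithmetic in this comment concrete, here is a small illustrative sketch of averaging with a dropped low score (the function and the drop count are my own framing, not Greg's):

```python
def average_with_drops(scores: list[float], drops: int = 0) -> float:
    """Mean of the scores after discarding the `drops` lowest ones --
    the usual trick for keeping one 0 from dragging everything down."""
    kept = sorted(scores)[drops:]
    return sum(kept) / len(kept)

print(average_with_drops([65, 75, 85]))          # 75.0 -> the "75/C range" case
print(average_with_drops([0, 75, 85], drops=1))  # 80.0 -> the 0 no longer dominates
print(average_with_drops([25, 100, 100]))        # 75.0 -> same C, very different story
```

That last line is exactly the ambiguity the reply below takes up: two very different learning trajectories collapse to the same number.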
Thanks Greg. I'll try to give a response that's as thoughtful as your comment. I'd like to look a little more closely at your examples.
In the "3 big tests" model, a student with a 65, 75, and 85 would average to a C. But so would a student with scores of 25, 100, and 100. Should the second student also get a C? In some ways it depends; if the tests are in any cumulative, the answer is a definite NO because despite the early flub, the student has clearly mastered the material by the end of the course; whereas the first one never really mastered *anything*.
If the tests are not cumulative, then I would question whether a single timed assessment provides accurate information about student mastery. There's no feedback loop in place; how do I know if perhaps the second student above was working to overcome some early lack of prerequisite skill, or was sick or just having a bad day?
Either way, the "weighted average" approach seems incredibly *non*-intuitive to me. In fact it runs contrary to everything I personally experience about learning. On the other hand, allowing students repeated opportunities to learn from mistakes and reattempt things, and basing their grade on whether they eventually show sufficient evidence of mastery, seems perfectly natural. This is in fact how all of us learn.
I *do* think that the weighted average approach is a lot easier for the instructor. I suspect that's why people stick with it, and this might be what you're calling "intuitive appeal".
I hope that doesn't come across as horribly snarky. I just think that inertia goes by many different names.
Robert, thanks for this detailed response. It doesn't sound snarky to me! But let me try again on the intuitive appeal of averages. Before we grow up to be professors, a lot of us watch and/or participate in sports, and we are told that the best athletes are the ones with the highest batting averages, or the most points per game, or whatever. This is broadly analogous to the traditional points-based system. There are also situations analogous to a comprehensive final exam, like the Olympics or the World Series, where a performance at the big moment kind of supersedes everything else. Thus, I think there is a huge amount of subconscious conditioning from an early age that the way to judge quality or performance in general is to average performances over time and/or look to big high-stakes finales. Now, with all that said, we need to ask whether we should judge undergraduate learning in the way that we judge professional athletes, and you'd probably say no and I'd probably agree! But I think this previous, deeply embedded conditioning toward using averages or winner-take-all finals as indicators of quality is something to be conscious of and address directly -- as you have started to do.
Also, we should keep in mind that even in professional athletics, one-and-done is not the norm for many championship-level sports. Winners in the NBA playoffs have to win a best-of-7 series; soccer/football tournaments aren't decided by single games but by aggregate scores across multiple matches; Olympic finalists in many events are decided by performing multiple times and keeping the top result.
How does one evaluate and improve a course that uses a specifications grading system?
Basically like any other course -- do a thorough evaluation of the learning objectives, the class activities used to enact them, and the assessments used to measure them, and look for misalignments and places where the whole stack didn't lead to the intended learning outcomes. Using specs doesn't fundamentally change the process, only the data used in the evaluation.