Excellent post - but I'd offer one pushback on "familiarity": just because it's familiar and default, that doesn't mean that students have necessarily bought into the system. There are lots of students who haven't fully bought in, even if they accept it as their default mode of assessment. I think many struggles that some students face are precisely because the assessment systems they are working under don't really make sense to them - but because they are default, they don't feel like they can question them. One advantage of non-traditional grading systems is their very defamiliarity - because the systems need to be learned and understood, they promote more overt and conscious buy-in, rather than default acceptance.
Loved this post as a place for thinking. Note that you've predicated it on whether traditional grading is or isn't transparent based on four components (familiarity, objectivity, predictability, and auditability). Primarily, you've used traditional grading examples, giving credit for the ways traditional points-based grading can be those things while also noting a few drawbacks or "myths," if you will, but not giving quite the same attention to how alternative or ungrading methods can still meet these four components.
I found myself thinking about how ungrading, specifically, can be familiar - if we make it so; can be reasonably objective - as much as any other assessment (I am thinking of writing assessment here); and auditable - if we define the way progress will be monitored and then follow through. I left predictable for last because it's the one my students struggle the most to wrap their thinking around. They often voice concerns that without points to continually check, they don't know "how they are doing" or can't predict how they will do. However, once the buy-in happens, they realize ungrading actually provides THE MOST predictable outcomes. If students are working toward learning, hitting due dates and benchmarks/goals, and reflecting in ways that move them forward, they can reasonably predict success (in whatever way they define it). Because I use a grade pitch method, their predictions should hold true almost all of the time... and in the few situations where that doesn't seem to be the case, it can usually be sorted out with a candid conversation.
I guess I'm saying there are ways to be transparent in all four of these sub-categories even without the use of traditional points, scores, and averages. Thanks for driving the conversations on better assessment.
I think you're right that predictability seems to be one of the stickiest points about alternative grading for students. I've got some ideas on that which might make for a post later. I agree with you that ungrading ought to be at maximum predictability, but the methods of prediction can be hard to conceptualize.
I appreciate your thoughtful analysis, Robert!
Here's what I take issue with: "By that point, though, I think you're so close to using alternative grading, something predicated on engagement with a feedback loop, that it would be less work to just go the whole way: Drop points altogether, let students do reattempts without penalty, and spend your time giving helpful feedback instead of allocating points (and adjudicating the complaints that arise from them)."
It seems like a key component of your definition of "alternative grading" is that you don't use points. Why?
I engage students in feedback loops including reattempts without penalty and, within the time limits I have, I give constructive feedback oriented towards helping students with subsequent attempts. I also use points as a way to indicate progress toward a learning target because I think it is transparent and lowers the cognitive load for students. I see this as being no different than counting how many standards a student has met. In many SBG courses I've seen, student grades are determined by how many standards they've met, i.e. 11/12 = A. That seems... very much like points to me.
Or maybe (as I think and write about this) your issue with points is DOING MATH with points, which is to say, not just counting how many standards a student has met, but averaging across different components of a course, like homework is 15% of the grade and exams are 85% of the grade. Is that your issue with points? Because that's not so different from assigning one standard as "did you do a significant portion [even sometimes a percentage, like 75%] of the homework," which would equate to some small percentage of the total grade (i.e. 1/12 = 8%), which... isn't that different from traditional grading.
Or maybe your issue with points is the ONE paper that says when points and feedback are both provided, students only look at points. Is there more evidence than just one paper that points are so harmful to student learning? I'd really like to see how students respond to points + feedback in an environment where they can reattempt the assessment.
Curious to hear your thoughts!
Pretty much what David said. Don't take me literally when I say "drop points altogether". For example, my current class (https://rtalbert.org/building-a-specifications-grading-course-part-1/) uses points on some items, namely our online practice homework. It's auto-graded by the software platform, which grades each problem at 1 point. It's not possible to change this to verbal marks, and probably it would be better to leave it at points anyway, because in this case the point is just a "bit" (in computer terms): 1 = success and 0 = not-success. (Some multi-part problems have partial credit, which is annoying IMO, but again there's nothing I can do about it.)
As you say, the main issue is doing math/statistics with points, which is nonsensical IMO because the points are not numerical data, but rather ordinal data dressed up to look quantitative. I mentioned averaging ZIP codes in the post, for instance. I do think there are some stats that make sense with ordinal data, for example tallying up the number of successful online homework problems a student does makes sense if you are using 1 = success and 0 = not. That's how I handle it, but that's about all I'm willing to do.
I believe there have been some followups to the Butler and Nisan paper, but I am not familiar enough with those to really have an opinion. I'm mostly going on my own experience when I say that feedback and points don't coexist particularly well together. That experience has been confirmed over and over for many years, and I have come to hate the corrosive influence that points have on the brains of my students, to the point where I really don't care so much if there is research supporting it. But my experience is not universal, and you might be able to pull off engaged student learning with points on things, where I have not.
In any event, far be it from me to come out here and say that you CAN'T do things in such-and-such a way, or MUST do things in a certain way, etc. Only a Sith deals in absolutes.
Not sure how the comments work here, but see my comment below on David's response! Curious to hear both of your thoughts on the value of averaging for repeated testing of a skill.
I think David's got it covered below. I would just say that the use of *numbers* as marks is fine by me, as long as the number indicates progress (David brought up the "marks indicate progress" pillar in another response). For example, in the post I mentioned my kids had SBG in their early grades and had numerical markers to indicate where they were in a continuum of progress along different standards. I think that's fine because it's just shorthand, in some ways similar to a Likert-type scale on a questionnaire. The problem comes when we mistake the numerical markers for quantitative data, which they are definitely not.
If you really wanted to test a student on their understanding of some topic over time and "take an average" then there are some qualitative approximations to this, like the median or mode in addition to the stuff David mentioned. However to make those data really reliable you would need to assess a student quite a few times, and that might become onerous for everybody. OTOH if it's a skill (like "backing up a claim with evidence" like you mentioned) that you have baked in to lots of assignments, then you could check off whether this happens each time, and then the student's "average" on that item could be the mode, i.e. does the student successfully back up claims more often than not.
However, it seems to me that in that case it would be simpler just to make "backing up claims with evidence" part of the specifications for the assignment, i.e. if they don't do it successfully then the assignment doesn't meet the standards yet and they need to revise.
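If you did want to go the mode route, here's a minimal sketch of what that tally might look like. This is purely illustrative -- the function name and the data are invented, not from any real course:

```python
# Hypothetical sketch: treating the "average" of a repeated pass/fail
# skill check as the mode rather than a numerical mean.

def skill_mode(checks: list[bool]) -> str:
    """Return 'met' if successful demonstrations outnumber unsuccessful ones."""
    passes = sum(checks)            # True counts as 1
    fails = len(checks) - passes
    return "met" if passes > fails else "not yet"

# e.g. a student who backed up claims on 4 of 6 assignments:
print(skill_mode([True, False, True, True, False, True]))  # -> "met"
```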
Hi Jayme! I won't speak for Robert, but I have some thoughts on these items.
Our "four pillars" model doesn't *quite* say that points are right out, but as soon as you're evaluating work against standards or specifications, and focusing on a feedback loop, points make less sense. So we encourage people to use "marks" in the sense of progress indicators that help focus students on feedback, rather than doing calculations with points. (see e.g. https://gradingforgrowth.com/p/finding-common-ground-with-grading ). Some people definitely use points-like things with standards, e.g. the 0-4 scale that Robert mentioned for his daughter's school, but those are typically intended to be progress indicators rather than actual numbers. (I avoid using numbers as progress indicators because they are indeed too easy to confuse with points, even though their purposes are different.)
I especially agree that doing math with points -- especially averaging -- is a critical distinction between "using points" and doing what I think you're describing. A standard (or specifications) would refer to specific content or skills and the actions students can take to demonstrate them. Many systems do use a count or percentage of standards completed, although many others identify specific subsets of standards for different grades (e.g. "complete all CORE standards to earn an A..."). In any case, I have no objection to counting things, which is more mathematically valid than averaging or curving.
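To make the counting idea concrete, here's a minimal, purely hypothetical sketch of grade determination by counting standards met -- the thresholds and standard counts are invented, not from any actual course:

```python
# Hypothetical sketch: grades determined by counting standards met,
# not by averaging scores. All thresholds below are made up.

def course_grade(core_met: int, additional_met: int) -> str:
    """Count-based grade table, e.g. 'complete all CORE standards to earn an A'."""
    if core_met >= 6 and additional_met >= 5:   # all 6 CORE + most extras
        return "A"
    if core_met >= 6 and additional_met >= 3:
        return "B"
    if core_met >= 5:
        return "C"
    return "Not yet passing"

print(course_grade(core_met=6, additional_met=5))  # -> "A"
```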
I suppose the skill of "I keep up with homework, by completing at least 75% of it" *could* be a standard, but it's quite different from something like "I can identify and use an appropriate method for solving a quadratic equation".
In your last paragraph, I think you're referring to Butler & Nisan (1986), but I'm curious about calling it "ONE" paper. It's been replicated and extended a lot, in a lot of settings. For one higher-ed example, Lipnevich & Smith (2008): "Response to Assessment Feedback: The Effects of Grades, Praise, and Source of Information".
Thanks for your quick response, especially thanks for the Lipnevich & Smith reference. As for averaging... I really appreciated the averaging zip code analogy because of course averaging ordinal numbers doesn't make sense. But... one of the things Robert brings up in this post as an issue with SBG is the idea of checking off "student demonstrated mastery of X," which may apply for one moment in time, but may not be replicable if that student is tested again. In that sense, I'm thinking maybe testing students for the same standard repeatedly could be an appropriate use of averaging? One of the things I want my students to do *repeatedly* in my current Gen Bio course is use evidence to support a claim. If I ask them to do this on many assessment questions, wouldn't it be appropriate to average the score they got on these questions to get a sense of how consistently the student is able to do this skill?
(Although, now that I think about it, the danger here is averaging in poor performance earlier in the course before they learned the skill. But if there are no-penalty retest options, does that eliminate that concern? Do we want grading to be "ratchet-like" in the sense that once a standard is passed, it can't be "unpassed," or do we want the option for students to repeatedly demonstrate mastery of a skill in different contexts? Hmmm...)
I agree, the danger is averaging in early poor performance. There are other options that do exactly what you want! For example:
* Require students to master a skill a certain number of times (e.g. "three times this semester"). This is effectively a "keep the best attempts" approach like what you mention at the end -- it can't be "unpassed", but requiring a certain number of attempts helps ensure the student has kept the skill.
* Record a student's history on a skill, but keep only the 2 most recent attempts (and call the standard "met" if those are both satisfactory). This is a "show me you've eventually got it consistently" approach, but the possibility of losing credit due to one poor showing can make students skittish of even making attempts.
* Either of the above, combined with some sort of "recertification" late in the semester or on the final exam. The recertification can count towards meeting the standard, or maybe modify the final grade.
* Just take notes on a student's progress over time, and report something using your professional judgment, e.g. "novice", "progressing", "almost there", "consistently excellent".
Each of these amounts to "counting attempts" without averaging (a rough sketch of the first two options appears below). I usually use the first option -- permanent passing with multiple attempts -- sometimes with a recertification, or sometimes requiring one demonstration on each of several different types of assignments (e.g. quiz vs. project or homework). In my experience that is a good balance of simplicity, giving students chances to show what they know without penalty, and ensuring continued understanding. We also say a lot about options like these in our book. :)
(I have seen a lot of SBG systems that DO average scores on standards over time, many using a 4-point GPA-style scale. But I think what I described above achieves the same goal without some of the issues with averaging.)
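For anyone curious about the mechanics, here's a rough sketch of what the first two options might look like in code. The names, thresholds, and data are all hypothetical, not taken from any particular implementation:

```python
# Hypothetical sketch of two "counting attempts" policies for a standard.
# Attempts are recorded in order as True (successful) or False (not yet).

def met_by_count(attempts: list[bool], required: int = 3) -> bool:
    """Option 1: the standard is met once it has been demonstrated
    `required` times; it cannot be 'unpassed' afterward."""
    return sum(attempts) >= required

def met_by_recent(attempts: list[bool], window: int = 2) -> bool:
    """Option 2: the standard is met if the most recent `window`
    attempts are all successful."""
    return len(attempts) >= window and all(attempts[-window:])

history = [False, True, True, False, True, True]
print(met_by_count(history))    # True: three successes overall
print(met_by_recent(history))   # True: last two attempts successful
```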
This list of options is great. Can't wait to read your book!!