When alternative grading goes (almost) wrong
A story about different expectations and how to gently handle pushback.
Today’s guest article is from Giulia Toti. She is Assistant Professor of Teaching in the Department of Computer Science at the University of British Columbia in Vancouver. She was previously an Instructional Assistant Professor at the University of Houston, where she also got her PhD in Computer Science (specifically in Machine Learning) and developed an interest in alternative grading practices as a way to improve students’ preparedness and to increase fairness in grading. At UBC, she teaches introductory programming courses, as well as Machine Learning courses and courses with an emphasis on ethical CS and AI applications.
When I started looking into different ways to assess my students and give them feedback 4-5 years ago, I was not aware that alternative grading was “a thing”. I was simply trying to solve a problem in one of my introductory programming classes: the students were not performing at the level I deemed sufficient to comfortably approach the later (and harder) programming courses, and were at risk of leaving with large gaps in their knowledge, because they had no incentive to review material they did not understand well (why would they, if they had already received a grade on it and could do nothing to improve it?). I ended up changing the course completely, and only later found out that the specific strategy I had adopted had a name: Mastery Learning. After this first step, which you can read about in this paper, I was sold. Alternative grading strategies were, if not a solution to all problems, a key to shifting students’ attention toward what really mattered (content, not grades), encouraging them to reach higher standards rather than settle for “good enough to pass”, and being truly fair to all students, reducing the impact of external circumstances on their course performance.
After this first experiment, I kept looking for ways to adopt and promote alternative grading strategies in other courses, and, in 2023, I had a new opportunity in a course titled “Fairness, Accuracy, Transparency and Ethics (FATE) in Data Science”, which I created and delivered for its first offering. Despite some good planning, my experience with alternative grading in this course was different from anything I had experienced before: it came quite close to failing, and it made me consider backpedaling to a more traditional grading strategy. In this post, I will talk about how I walked myself back from the brink of a grading disaster and turned this course into another cherished success story.
A bit of context
The Computer Science department is one of the largest in the Faculty of Science at the University of British Columbia-Vancouver, enrolling almost 3000 undergraduate majors and employing almost 80 faculty members. It is also one of the highest-rated computer science programs in Canada, consistently tying for first place in Maclean's annual rankings.
The course was designed by another instructor and me, and I was to deliver it for the first time in Fall 2023 to a cohort of 36 students. I obviously thought that the content of the course was extremely interesting and relevant, touching on several issues related to the increased use of data science and AI that we are experiencing in our society (privacy and data collection, biased algorithms, societal and environmental impact, and so on). I created most of the lecture, assignment, and assessment material for this course, and I designed the syllabus and the grading scheme. I opted for a hands-on approach to class meetings, where students would spend less time listening to lectures and more time working on assignments. I also included in the grading scheme several elements of alternative grading, with particular attention to resubmission opportunities and a rich feedback loop.
I divided the course content into six modules, to be delivered bi-weekly. Each module included a short lecture, an assignment, and a quiz. Assignments were designed to be mostly formative in nature, with opportunities for the students to learn and experiment with some concepts on their own. Students were expected to submit their assignments by the end of the second week, after which the assignments were graded at three levels of proficiency: “Needs improvement”, “Good”, and “Excellent”. Graded assignments were returned about a week after submission with a good amount of written feedback on what needed to be improved. A resubmission was possible with each new assignment submission: in other words, for each module, students could submit the current assignment plus one (and only one) of the previous assignments they wished to have regraded. Quizzes were 50 minutes long and graded on a scale of 1 to 10. Some quiz resubmission opportunities were also available: on the last day of class, students were allowed to retake up to two quizzes, keeping the highest grade between the two attempts; this “make-up day” also covered missed quizzes.
A square peg in a round hole
Despite my initial efforts and enthusiasm, and setting aside some procedural challenges (e.g., how to organize the resubmissions), I quickly found myself dealing with limitations. For starters, the Learning Management System in use at my institution required me to report the final grade of an assignment as either a letter grade, a percentage, points, or complete/incomplete. While some customization was possible, none of these options matched exactly what I was trying to do with the assignments, where I would have liked students to receive one of the three proficiency labels.
Additionally, the grading system at my institution requires instructors to submit, at the end of the course, not just a letter but a percentage grade, while I would have very much liked to assign more broadly defined levels of proficiency (as in Specifications Grading). I considered assigning a fixed percentage grade for each letter range, but, as a junior faculty member with no knowledge of any other course in my department adopting such a strategy, I was afraid of raising some eyebrows if I turned in only a finite set of percentages at the end of the course.
Furthermore, assigning the right percentage value to each letter was not trivial. Consider, for example, the A+ range, which at UBC is anything above 90: should I assign every student at this proficiency level a percentage grade of 100? That would be unreasonably high. How about 95? Then students would complain that in this course it is impossible to get 100. Therefore, I needed to adopt a grading system that, in the end, would allow me to compute percentage grades. I opted for assigning a different number of points to the assignment proficiency levels: 0 for “Needs improvement”, 40 for “Good”, and 60 for “Excellent” (thus answering the question of what to enter as a grade in the LMS). I also scaled the quiz points by 4, so that the maximum number of points possible on a quiz was 40 (although quizzes were still reported on the 1-10 scale). These tweaks gave me a score out of 100 for each module, which I could average and report for each student at the end of the course.
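The arithmetic above can be sketched in a few lines of code. The label-to-point mapping (0/40/60) and the 4x quiz scaling come from the scheme just described; the function and variable names are my own invention, and this is only an illustration of the calculation, not the actual gradebook setup.

```python
# Points awarded for each assignment proficiency label (from the scheme above)
ASSIGNMENT_POINTS = {"Needs improvement": 0, "Good": 40, "Excellent": 60}
QUIZ_SCALE = 4  # quizzes are graded 1-10, then scaled so the maximum is 40

def module_score(assignment_label: str, quiz_score: int) -> int:
    """Combine one module's assignment and quiz into a score out of 100."""
    return ASSIGNMENT_POINTS[assignment_label] + QUIZ_SCALE * quiz_score

def final_percentage(modules):
    """Average the module scores into the percentage grade the registrar requires."""
    return sum(module_score(label, quiz) for label, quiz in modules) / len(modules)

# Hypothetical student: mixed assignment labels, strong quizzes, six modules
grades = [("Excellent", 9), ("Good", 8), ("Excellent", 10),
          ("Good", 9), ("Excellent", 8), ("Excellent", 9)]
print(round(final_percentage(grades), 1))  # → 88.7
```

Note how a single “Needs improvement” assignment contributes 0 of a module's 60 assignment points, which is exactly what made the running averages look so alarming to students mid-semester.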
The course took off. The lectures went on without a hitch. The students seemed to find the topics interesting and were very engaged in the conversation. The quizzes were also not a prevalent source of concern; most students received a score in the top half of the range, and were comforted by the idea that they could later improve two quizzes they had already taken. The assignments, however, created some friction, as more than half of the students saw their first attempt returned with a score of 0 for “Needs Improvement”. I was not worried by this; I intended for students to review their work and improve on some key concepts, and I saw “Needs Improvement” submissions simply as chances to fix mistakes and build mastery. What students saw, instead, was zeros accumulating on their grades portal, something they most likely had never experienced before. As resubmissions took time (two weeks to resubmit, another week to receive the new grade), these zeros hung over them for uncomfortably long. At the midpoint of the semester, most students had only a few “Good” submissions and no “Excellent”, and I could see them getting nervous. Typically, halfway through a course, their grade would be building up and settling around a value that helped them assess their performance. But here, they had very little to work with. Sensing their discomfort, I ran a midterm survey to gather their opinions and start a conversation. Here are some of the comments I got:
“The most difficult thing in this course is trying to understand the impossible rubric to stop losing marks”
“I have to aim for that "40", because otherwise my mark will be ended up with 0”
“I strongly implore the teaching team to seriously consider adopting a percentage-based grading rubric for assignments”
“I am afraid of getting a horrible grade in the course”
Clearly, the students and I were living very different experiences. I knew that they were doing just fine and that they only needed some improvement here and there to not only pass the course, but also get a good grade on their transcript. They, on the other hand, were starting to believe that the standards of proficiency I had defined were impossible to meet, and were asking to revert to a more fine-grained grading scheme that would allow them to bring home at least some points. They had never before reached the midpoint of a course having accumulated only a very small portion of the available points. They thought they were failing.
“Trust me”
After seeing the midterm survey results, I felt very sorry for the students. It was not my intention to scare them or stress them out about their performance - quite the opposite. I considered changing the grading scheme as they were asking. But I was confident they could succeed in these settings; they just needed a little push to get over the hurdle.
During the lecture following the survey, we had a heart-to-heart chat about the results. I told them I sympathized with the way they were feeling, and that it was not my intention to cause concern. I also told them that my assessment of their work went beyond the points on their portals, and that I knew they were on the right track. I reiterated that their current grades were only temporary, and I assured them that the highest levels of proficiency were well within their reach: just a little more work and they would be collecting their hard-earned rewards, which would mean all the more because they came through patience and effort.
I do not know if it was because of the earnest conversations we had already had in class about the topics covered, because I gave them opportunities to improve and they wanted to give one to me, or simply because they had no choice other than dropping, but they decided to see it through. To encourage them and reward their efforts, I created some additional resubmission opportunities. I believe this solidified the idea that I was working for them and not against them. As I expected, the resubmissions started picking up in quality. Concepts that were initially difficult to grasp started to click, and 60 (“Excellent”) became the most common score in the gradebook (a reminder that the assignments were meant to be formative; the final grade would be fine-tuned by the quizzes and final exam). In the final survey, major concerns about the grading system were gone, and this is what the students had to say:
“[The instructor] gave us multiple chances to show that we can improve our learning, correct our mistakes, and earn a better grade”
“She helped me gain a better understanding of concepts and how to apply them”
“Resubmissions were very supportive ways to help us improve”
“She believed that we learn through mistakes and that it should not be expected for students to know everything perfectly on the first try when they are learning. This was ultimately the best study approach I have ever seen”
Happy ending (?)
I consider the first delivery of “Fairness, Accuracy, Transparency and Ethics (FATE) in Data Science” a great success. The students opened their minds to a different course approach and ultimately benefited from the experience. I have confidence in what they achieved. Many of them have thanked me, and some later came back to tell me how the course was helping them during interviews.
In fairness, I could have done several things differently so that they would not have felt so discouraged in the first place, and I would not have had to ask them to put their faith in me and trust that I had their best interests at heart. I am currently delivering this course again, and here are some of the things I am changing:
There are now four assignment proficiency levels: “Incomplete/Missing” (0 points), “In Progress” (20 points), “Good” (40 points) and “Excellent” (60 points); “Incomplete/Missing” is reserved for students who turn in nothing or very little, so no student who has put in some effort has to look at a grade of 0 on their portal. This follows some students' recommendations, and although I think the final results will not change (most students will end up earning “Good” or “Excellent”), it will be easier for them to believe that they can improve a 20 than a 0.
Clear communication from the beginning - students have been warned that a progression from “In Progress” to higher levels is to be expected.
Rubrics were changed to make room for the new “In Progress” level and to move some requirements from “Good” to “Excellent”.
Some assignment questions were rewritten to provide more structure for the answer and reduce the room for interpretation in open-ended questions.
To other instructors experimenting with alternative grading strategies for the first time: you may find yourselves in a similar situation, with plans not unfolding quite as expected and students pushing back. You may notice a mismatch between your respective experiences. If this happens, open and honest conversations can help build mutual understanding and right the ship. If I had to grade myself, I would say that this course is currently at the “Needs Improvement” level (or “In Progress”, according to the new rubric). Yes, it was a successful first attempt, but there is much room for improvement. Here is to my learning journey.
Editorial note: The correspondence between percentage scores and letter grades is different in Canada than it is in the US. This Wikipedia article gives some details.
Thanks for a fantastic post! I had an almost identical experience when I began using a 3-point rubric: students were discouraged by their low grades and frustrated that “if I get one little thing wrong, I’m already at a 67%!” Despite my reassurances that the 3-point scale was more of a guideline than a rule (students named their own midterm and final grades along with an explanation), it took most of the semester to reduce student anxiety. If only I could have read this post in advance of my class. This semester, I am still using a three-level rubric, but I increased the points assigned to each level. This seems to have reduced student anxiety. I will know more when I see midterm survey results next week. I appreciate you sharing what didn’t work because I was able to empathize with so much of it.
This is very familiar, if not in the details! Phenomenally useful essay as I get ready to return to the classroom after three semesters away (for research and having babies).