Alternative grading in probability and statistics
How we used competency-based grading for 300 computer science students at a Dutch university
Today we bring you a guest post by Nelly Litvak and Noela Müller from the Department of Mathematics and Computer Science at Eindhoven University of Technology. This post is based on their longer article. While working with Nelly and Noela to edit this post, I learned a lot about the culture of higher education in the Netherlands, and how alternative grading fits into it. I hope you enjoy this as much as I did! —David
The course Probability and Statistics for Computer Science at Eindhoven University of Technology (TU/e) in the Netherlands enrolls about 300 first-year (freshman) students, a decent share of whom are international students. Noela gave lectures to all students together. We also gave tutorials (called ‘instructions’), for which the students were divided into four sections; each section had its own instructor (Nelly and three postdocs). The average student in this course has no particular difficulty with mathematics, but also no strong intrinsic motivation for the subject per se.
At the TU/e, the academic year consists of four quarters. A quarter consists of 8 weeks of classes followed by two exam weeks. Our course was taught in the third quarter, in February-April 2024. For this run of the course, we thoroughly revised its material and organization. Inspired by the book ‘Grading for Growth’, we integrated standards-based grading (with a small element of specifications grading). We will refer to our implementation as ‘competency-based’ grading because our learning goals are formulated as ‘competencies’ rather than ‘standards’.
Competency quizzes
We formulated the learning goals in terms of 16 competencies (C1-C16); the full list is in Table 1 of this article. We stated the competencies in an actionable form, such as:
C14. I can compute and interpret basic confidence intervals.
As advised by ‘Grading for Growth’, this formulation unambiguously specifies what each student must do to demonstrate their knowledge of the topic.
We assessed each competency with a small quiz in TU/e’s assessment software, ANS Delft.
We believe it is essential to assess students’ ability to write down a probabilistic argument. However, doing this for every competency didn’t make sense: it would result in excessive grading, and some competencies are mostly procedural. In the end, we chose three of the 16 competencies to be assessed with one open question each, with manually graded written solutions. These are competencies that genuinely require derivations and proofs using probability theory, and that are also relevant for computer scientists, such as:
C7: I can derive the expected running time of simple algorithms using linear functions of random variables.
The remaining 13 competencies were assessed with one multiple-choice and one numerical question each, graded automatically. Multiple-choice questions work very well for assessing understanding of the theory (see more on this in this article). We mostly used questions with multiple correct answers and required students to choose all of the correct options. For example, we often asked “Which of the following four statements are true?” We required answers to be 100% correct, to set a high standard and to account for the chance of success by a random guess. Students often found these questions difficult and had to think hard. Our numerical questions were akin to usual exam problems. We required only numerical answers (without intermediate computations), we allowed some margin for rounding errors, and we used random number generators to create a different question for each student.
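To make the last point concrete, here is a minimal Python sketch of how a numerical question can be parameterized per student. Everything here (the function names, the example question, the tolerance) is a hypothetical illustration of the idea, not the actual ANS Delft mechanism:

```python
import random

def make_question(seed):
    """Generate a question text and its expected answer from a per-student seed."""
    rng = random.Random(seed)            # e.g., seeded with a student ID
    n = rng.randint(10, 20)              # number of independent calls
    p = rng.choice([0.1, 0.2, 0.25])     # failure probability per call
    text = (f"A program calls an unreliable service {n} times; each call fails "
            f"independently with probability {p}. What is the expected number "
            f"of failures?")
    return text, n * p                   # expected value of a Binomial(n, p)

def is_correct(submitted, answer, tolerance=0.01):
    """Accept answers within a small margin, to allow for rounding."""
    return abs(submitted - answer) <= tolerance
```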
For each quiz, we gave one of three marks:
S = Success,
G = Getting there,
N = Not yet.
In an automatically graded quiz, we gave S if both answers (numerical and multiple choice) were correct, G if one of the answers was correct, and N otherwise.
For the manually graded open questions, we used 3 criteria: 1) The solution is correct (including the derivation and the final answer). 2) All variables are defined. 3) All steps are explained. We gave S when all three criteria were met, G when two were met, and N otherwise. For S, we didn’t require perfection: for instance, we would ignore a typo or a small computational error. Looking back, the binary assessment of each criterion wasn’t very convenient, because often a student met all three criteria partially. In these cases, we struggled to communicate why we gave G or N. Next time we will use the same or similar criteria, but we will give the mark for the entire solution and address all criteria in the feedback. We hope that this way it will be easier to grade and to explain the marks to the students.
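In code, the two marking schemes amount to the following (a minimal Python sketch; the function and parameter names are our own, not part of the grading software):

```python
def quiz_mark(mc_correct: bool, num_correct: bool) -> str:
    """Automatically graded quiz: S if both parts correct, G if exactly one, N otherwise."""
    return {2: "S", 1: "G", 0: "N"}[int(mc_correct) + int(num_correct)]

def open_question_mark(solution_correct: bool, variables_defined: bool,
                       steps_explained: bool) -> str:
    """Open question: S when all three criteria are met, G for two, N otherwise."""
    met = sum([solution_correct, variables_defined, steps_explained])
    return "S" if met == 3 else ("G" if met == 2 else "N")
```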
We always gave detailed feedback. Since there was only one question per competency, and since not all students took the quiz at each opportunity, the grading went quickly and the workload stayed manageable.
Final grade
TU/e’s exam and grade system might be unfamiliar to some readers, so we describe it in more detail here. Each course must have two exam attempts: a final exam (in the exam weeks of the current quarter) and a resit (in the exam weeks of the next quarter). The final exam and the resit are equivalent, and the same rules apply to both. Students take a resit if they didn’t attend the final exam (e.g. due to illness) or if they failed it. Grades run from 1 (lowest) to 10 (highest); the passing grade is 6. In the traditional point-counting grading system, the minimal passing score is 5.5, which is rounded up to 6. For a basic math course, a passing rate of 80% is seen as high, and a passing rate of 50% is seen as low. Students are assessed exclusively on their individual performance; curving is never applied in the Netherlands.
In our course, the final grade depended on the number of S’s and G’s, with some additional rules.
Rule 1. Accounting for G’s. We had the conversion rule that an S in 1 competency can be replaced by G’s in 2 competencies; for example, 7 S’s plus 2 G’s count as 8 S’s. We found it fair to count the effort of getting a G, because our requirements for G were already quite high (see above). It also proved to be a good decision because the students mostly didn’t feel that our system was ‘all-or-nothing’.
Rule 2. Exam requirement. We required at least four S’s (or an equivalent combination of S’s and G’s) either at the final exam or at the resit. This is because at the final exam and the resit we used technology to block students’ browsers, but during the course this technology was unavailable. Importantly, for the exam requirement, students could redo competencies for which they already had an S or a G. Doing so wouldn’t improve their grade, but it gave them a way to meet the exam requirement more securely.
Initially, we implemented the exam requirement because the Examination Board (the body that guards the integrity of degrees) had two concerns: 1) that students could pass the course without the final exam, and 2) that there was insufficient invigilation (proctoring) at the intermediate quizzes. The exam requirement resolved both concerns. In the end, we were happy with it for another reason: without it, some students might have stopped working on the course once they had scored enough S’s. We could have avoided this with other extra grade requirements (e.g. two S’s for some competencies, or spreading the S’s over the course), but this would have complicated the system a lot. The exam requirement solved many problems while keeping the system simple.
We defined the final grades as follows:
Grade 6 out of 10: S in 8 competencies in total; S in at least 1 open-question competency; exam requirement met.
Grade 7 out of 10: S in 10 competencies in total; S in at least 1 open-question competency; exam requirement met.
Grade 8 out of 10: S in 12 competencies in total; S in at least 1 and G in at least 1 open-question competency; exam requirement met.
Grade 9 out of 10: S in 13 competencies and G in 1 competency in total; S in at least 2 open-question competencies; exam requirement met.
Grade 10 out of 10: S in 15 competencies in total; S in at least 2 and G in at least 1 open-question competency; exam requirement met.
Fail: students who didn’t meet the requirements for a 6 didn’t pass the course.
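To show how the rules combine, here is a minimal Python sketch of the grade computation. The function name and the open-question labels other than C7 are our own assumptions, and two corner cases the post leaves open are resolved as flagged in the comments; this is an illustration, not the course’s actual implementation:

```python
OPEN_QUESTION = {"C4", "C7", "C12"}  # hypothetical labels: the post names only C7

def final_grade(marks, exam_requirement_met):
    """marks: dict like {'C1': 'S', ..., 'C16': 'N'}; returns 6..10 or 'fail'."""
    if not exam_requirement_met:
        return "fail"
    s = sum(m == "S" for m in marks.values())
    g = sum(m == "G" for m in marks.values())
    s_total = s + g // 2  # Rule 1: every 2 G's count as 1 S
    open_s = sum(marks.get(c) == "S" for c in OPEN_QUESTION)
    open_sg = sum(marks.get(c) in ("S", "G") for c in OPEN_QUESTION)
    # Tuples: (grade, S's needed in total, open-question S's needed,
    # open-question S-or-G needed). Assumptions: an S also satisfies a
    # required G, and grade 9's extra "G in 1 competency" is folded into
    # the Rule 1 conversion.
    for grade, need_s, need_os, need_osg in [(10, 15, 2, 3), (9, 13, 2, 2),
                                             (8, 12, 1, 2), (7, 10, 1, 1),
                                             (6, 8, 1, 1)]:
        if s_total >= need_s and open_s >= need_os and open_sg >= need_osg:
            return grade
    return "fail"
```

For example, a student with 11 S’s (one of them on an open question) and 2 G’s on closed-question competencies reaches 12 effective S’s, but without a G on an open question the sketch returns a 7, not an 8.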
Implementation of competency-based grading
To provide enough practice and retry opportunities for the students, we created 6 versions of each competency quiz: 2 for practice, 2 for tests during the course, 1 for the final exam, and 1 for the resit. The task of designing the quizzes was divided equally among the four instructors. This was a lot of work, but each instructor managed to finish their assigned quizzes within one to two months. We will gradually extend the question bank in the coming years.
In class, we covered 2 competencies per week. In each week, 4 quizzes were then open, covering the 4 competencies from the previous two weeks. For example, C7 was covered in week 4, so the quiz for C7 was open in weeks 5 and 6, at the final exam, and at the resit.
Students took the quizzes in class during the instruction sessions of the four sections, for 45 minutes at a time. During the final exam and the resit, all 16 quizzes were open for 3 hours. Usually, there wasn’t enough time to complete all of them, and we were happy to see that the students consciously chose which quizzes to attempt.
Students could also access the quizzes from home, but we took attendance and counted only quizzes taken in class.
In the quizzing software, we provided a bundle with summarized formulas and the lecture slides. We found this adequate because these materials are similar to what students could find online if they wanted to look something up.
Our experiences
Is 16 competencies too many? Sixteen competencies is a fairly common number for a semester course. Our course was shorter, but we believe it worked well nonetheless. We liked that our competencies were very specific and could be assessed with small, targeted quizzes. Next time we will reduce the number to 14, but for a different reason: in the current setup, C15 and C16 had only two attempts (the final exam and the resit), which was unbalanced compared with the other competencies. Next time, we will cover 14 competencies in 7 weeks, with quizzes on the last two competencies in week 8. Students also found that some competencies were much easier than others; when revising the competencies, we will try to make them more equal in difficulty.
Explaining the system to the students. While the students were mostly very positive about the new system, many of them complained that it was hard to understand. At some point, even an academic advisor wrote to us asking for explanations! Here is how we will improve next time: 1) Always use the same word for the same thing (e.g. use the term ‘quiz’ or ‘test’, but not both). 2) Communicate the requirements for each grade very clearly (we had beautiful colorful figures, but these proved very confusing; perhaps a short text like the one above would work better). 3) Include a FAQ section, with questions like ‘Will I get partial points if I made only one error in a multiple-choice question?’ (No) and ‘For the exam requirement, can I retake quizzes where I already have an S?’ (Yes).
Students in control. We liked that in the alternative grading system, the students could navigate toward their desired grade. Students mostly appreciated this and did make their own choices. Some said that our approach discouraged them from learning the harder competencies. However, in the traditional system, too, students often choose to focus on simpler material, perhaps unconsciously. We believe that no system can eliminate such slacking completely; the advantage of alternative grading is that the slacking is a transparent and deliberate choice.
Reaching out to struggling students. While the course was running, David Clark published a blog post, ‘How alternative grading helps me support struggling students’. Inspired by these ideas, each instructor identified students who had started out well but were getting all N’s lately (fewer than 10% of the students). We emailed these students with the subject ‘Reaching out’, so that our intention to help was immediately clear. To those who responded, we offered more practice and feedback. Students appreciated this, and it helped at least some of them to pass the course.
The four pillars of alternative grading. We found that clear standards (our 16 competencies) worked very well for our course. The students also said that it was very clear what was expected of them. As an important by-product, we believe it is valuable for follow-up courses that the students know exactly which competencies they mastered and which they were lacking (see more on the prior knowledge problem in this article). Revision without penalty was a truly huge advantage of alternative grading! Students very much liked that not everything came down to a single exam, and said that multiple attempts made the course less stressful. In turn, we could maintain high standards because an N was not final. After this experience, we believe that revision without penalty is natural and simply fair. As for helpful feedback and marks indicating progress, the students mostly perceived S, G, and N as grades rather than indications of progress, and they often saw the feedback mostly as justification for a lower mark, not as direction for growth. We believe that some more work is needed before the students internalize the idea that feedback is there to help them grow, and that an N doesn’t mean failure in our course. For example, an inspiring new book, ‘10 to 25’ by David Yeager, suggests crafting and frequently repeating transparent statements about why we give feedback and what the grades mean. We will keep working on this.
Better learning. We believe that students learned better simply because 100% of them were present every week for the quizzes. Students themselves said that the weekly quizzes helped them to actively learn the material. In the end, the passing rate and the average grade were higher than under the previous traditional system. We cannot conclude much from this, but we believe that our grades adequately match the abilities and the interests of these students.
High standards - high support
Traditional grading forces upon us, teachers, a painful and draining dilemma: should I lower my standards, or should I let the student fail? Alternative grading felt great because we could set high standards (a good write-up; 100% correct answers) and yet be more relaxed and supportive toward the students. We were happy to learn that this is exactly the right mentor mindset: high standards - high support.
We will improve and revise the system, but we won’t go back to traditional grades!
About the authors
Nelly Litvak is a full professor in the Department of Mathematics and Computer Science at Eindhoven University of Technology. Her research is on complex networks and random graphs. For many years she has been innovating her courses and writing about education. An extended version of this article was published in her column series ‘Better than Blackboard’ in Nieuw Archief voor Wiskunde, the journal of the Royal Dutch Mathematical Society.
Noela Müller is an assistant professor in the Department of Mathematics and Computer Science at Eindhoven University of Technology. She works on combinatorial probability with a focus on random combinatorial optimization. Noela is the lecturer (in North American terms, the instructor) responsible for the course we describe in this blog post.