When Alternative Grading Meets Coordination
Lessons in scaling alternative grading to a multi-section, coordinated Calculus I course
Rebecca (Becky) Swanson is a University Distinguished Teaching Professor at the Colorado School of Mines, where she teaches a variety of courses and engages in scholarship of teaching and learning projects. She is a co-founder and co-advisor of the Mines Society for Women in Mathematics, a local chapter of the Association for Women in Mathematics. Becky enjoys reading, hiking, doing crossword puzzles, baking, crafting, and participating in bar trivia - when she isn’t busy with her daughters, Ella and Anna, whom she shares with her fellow mathematician and partner, Steve Pankavich.
Lucas E Quintero F is a teaching postdoctoral fellow in the Applied Maths & Stats department at the Colorado School of Mines. New to teaching pedagogies, Lucas has found a deep interest in alternative grading and novel methods of engaging with students. Lucas spends his time reading and enjoying the mountains with his wife, two kids, and dog Milo.
After implementing standards-based grading in Linear Algebra at the Colorado School of Mines (which we refer to as “Mines”), we wanted to know how well it would scale to a multi-section, coordinated Calculus course. Mines is a public engineering school with an undergraduate student body of about 6,000. Every student must take the Calculus sequence, which means we offer a lot of it! In the fall of 2024, Becky became the Calculus I coordinator and thought the time was ripe to implement a similar grading system on a larger scale. That fall, around 600 students were enrolled in the course in sections of at most 40, taught by five different instructors. It was a challenge to find examples of large-scale alternative grading that didn’t rely on computerized testing, so we forged our own path, and we want to share some of the lessons we’ve learned.
From (uncoordinated) Linear Algebra to (coordinated) Calculus I
Becky worked with a previous teaching postdoc, Aram Bingham, to develop a mastery-based testing model for Linear Algebra. They developed a list of learning outcomes, and students had multiple opportunities to demonstrate mastery or proficiency on these by completing problems on biweekly exams. Three possible marks were given: P for Proficient, MR for Minor Revision (completed online outside of class), and N for Not Yet, indicating that the student needed to attempt a problem on an upcoming assessment. If a student was given a Minor Revision mark, a successful revision would move that outcome’s mark to Proficient. The exam portion of the course grade, which counted for 70% of the grade, was determined by the proportion of outcomes for which a student attained a mastery mark at some point in the semester. [Footnote: The remaining portion of the grade was determined by engagement, worth 10%, and homework, worth 20%.] We liked the simplicity of the system and the ease with which it could replace the exam portion of a grade, leaving other components of the grading structure intact. The model was successful, and we created a webpage resource and published a pedagogical paper.
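In formula form, the Linear Algebra grade worked out to the following (our own summary of the description above, not notation taken from the course materials):

```latex
\text{course grade} \;=\; 0.70 \cdot \frac{\#\{\text{outcomes marked Proficient at some point}\}}{\#\{\text{outcomes}\}}
\;+\; 0.20 \cdot \text{homework} \;+\; 0.10 \cdot \text{engagement}
```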
Unlike Linear Algebra, however, Calculus I is centrally coordinated and has been for a long time. Coordinated courses assign common homework and exams and often share a Canvas page [Footnote: Some coordinators provide an LMS shell that instructors duplicate, while other coordinators use a single Canvas page, including a shared gradebook. In Calculus I, we have been using a single Canvas page.]. The coordinator is responsible for creating materials, the schedule, and the Canvas page, among other duties. Coordinated courses traditionally have 2-3 midterms and a final exam consisting of short-answer problems graded using partial credit. While undergraduate teaching assistants grade homework, exams are graded collaboratively by the instructors, with each instructor assigned to grade a portion of each exam for all students. Generally, individual class activities are determined by the faculty member, although the coordinator may provide resources such as lecture notes or group activities. Our coordinators are members of the teaching faculty, while most of the instructors are adjuncts, graduate teaching fellows, or postdocs. The coordinator is expected to hold regular meetings with the instructional team.
As Fall 2024 approached, Becky wanted to implement the Linear Algebra model in Calculus I, a 4-credit course required of all students. We referred to the system as “Proficiency-Based Grading” (PBG) to avoid using the term mastery, but it was similar to the Linear Algebra model. Becky reached out to the instructional team, including Lucas, to see if they were interested in piloting PBG in Calculus I. Luckily, everyone was! Becky identified 21 learning outcomes, which were assessed on eight exams held biweekly throughout the semester. Students were able to attempt outcomes between 3 and 5 times. An outcome would appear on three consecutive assessments before “falling off” until the end of the semester. Just as in Linear Algebra, the “exam score” was computed as the proportion of outcomes for which a student demonstrated proficiency at some point in the semester. One difference between the courses was that in Calculus I, we added a final exam over 5 designated core outcomes that was graded traditionally. The course grade was based upon the PBG exam score, the final exam score, homework, and engagement. [Footnote: Engagement, worth 15%, consisted of online homework, attendance, reflections, and a variety of other options. There were 610 engagement points available, but we counted the engagement score out of 520, allowing students some choice regarding how they engaged in the course. The homework was graded traditionally by the undergraduate TAs and was worth 17.4%. The proficiency assessment score made up 54.6% of the grade, and the final was worth 13%.]
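To make the weighting concrete, here is a minimal sketch of the grade arithmetic described in the footnote. The function name and the sample numbers are our own illustration, not the actual gradebook setup:

```python
# A minimal sketch (our illustration, not the course's Canvas configuration)
# of how a Fall 2024 course grade could be assembled from the components above.
# All scores are on a 0-100 scale except the raw counts noted in the comments.

def course_grade(outcomes_proficient: int,   # outcomes marked Proficient at some point (out of 21)
                 final_exam: float,          # traditionally graded final over the 5 core outcomes
                 homework: float,            # TA-graded written homework
                 engagement_points: float    # raw engagement points earned (610 available)
                 ) -> float:
    pbg = 100 * outcomes_proficient / 21                    # proficiency assessment score
    engagement = 100 * min(engagement_points / 520, 1.0)    # counted out of 520, capped at 100%
    return 0.546 * pbg + 0.13 * final_exam + 0.174 * homework + 0.15 * engagement

# Hypothetical student: 18 of 21 outcomes Proficient, 80 on the final,
# 90 on homework, 500 engagement points.
print(round(course_grade(18, 80, 90, 500), 1))  # 87.3
```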
We made a few changes to the Linear Algebra model due to the coordinated setting:
Schedule Adjustment: To make space for 8 assessments, we used the 3 midterm days and 1 final day, plus their corresponding review days, from the previous year’s schedule. This left less in-class review time, so we trimmed a couple of topics to fit some review back in.
Grading Assignments: Instead of having each instructor grade a portion of each exam, each instructor was assigned a subset of outcomes. Anytime one of those outcomes appeared on an assessment, that instructor was responsible for grading the corresponding problem for all students, as well as grading the revisions.
Organization: We needed a tight schedule to fit exams, make-up exams, and revisions. This meant that if an exam occurred on a Wednesday in Week 5, make-ups [Footnote: With hundreds of students, someone is going to be sick for every exam, so there were always make-up exams!] would be held Friday of Week 5 and Monday of Week 6. We agreed to complete the grading by Monday of Week 6, so that students would have time to complete revisions [Footnote: Revisions were turned in online to Gradescope. Students had to answer a question about what they did wrong and how the revision improves their answer and then submit a scan of the revised work.] by Thursday of Week 6. That gave the instructors time to complete the grading of revisions by Monday of Week 7, allowing us to release solutions in time for students to study for the next assessment on Wednesday of Week 7. Additionally, as coordinator, Becky needed to make sure that all assessments were created and printed, grading was done on time, and the Canvas gradebook was updated. This also meant carefully scheduling the grading assignments so that no one was responsible for two outcomes making their first appearance on the same exam, since a first appearance required the most grading work.
Communication: The instructional team met weekly to discuss assessment problems, decide what would constitute a Minor Revision versus a Not Yet designation, and share common mistakes appearing on the assessments.
Final exam: As mentioned, the final exam consisted of 5 core outcomes and counted for 13% of the final grade. These 5 problems were graded traditionally, with partial credit, using a rubric from previous semesters. Students could also work on any other learning outcomes during the final exam period. [Footnote: We may or may not continue to use a final exam, but we wanted some way to measure performance against previous semesters.]
So…how did it go?
We are currently analyzing lots of data and will be submitting our results and analysis for publication. In addition to pre- and post-course surveys, we are able to analyze grades in Calculus I as well as performance in subsequent courses [Footnote: All students must take Calculus II and Physics I, and both courses list Calculus I as a prerequisite.] for both the Fall 2024 cohort (pilot) and a Fall 2023 cohort (“traditionally” graded). The data also let us distinguish students who have seen Calculus before from those who haven’t. Here we can summarize some of the preliminary findings, along with lessons from our own experience!
Positive - Grades: Preliminary data indicate that the overall Calculus I GPA of students in the Fall 2024 group is higher than that of the Fall 2023 group, with the bigger gains occurring among students who hadn’t seen Calculus before. Additionally, students who took Calculus I in Fall 2024 are generally performing at least as well in Calculus II and Physics I as students who took Calculus I before Fall 2024, again with the bigger gains occurring among students who hadn’t seen Calculus before.
Positive - Student Benefits: Students from Fall 2024 report reduced stress, say they benefited from clear expectations, and say the system felt fair and helped them learn. Additionally, many recognized that PBG allowed them to learn from mistakes.
Positive - Instructor Workload: With support from our teaching and learning center, we held an end-of-term focus group with the instructional team to gather feedback on the experience. The group unanimously reported that while grading occurred more often, the workload did not increase because the grading itself was easier.
Challenge - Rubric: Although we had instructor meetings to discuss the difference between a Minor Revision and a Not Yet designation, the implementation of those discussions was not always consistent. Some of the discrepancies in grading from one outcome to another were frustrating to students. [Footnote: For instance, on one assessment, a student could get a Minor Revision mark on one outcome and a Not Yet mark on another outcome for a comparable mistake, because the two outcomes were graded by instructors who had different ideas of a Minor Revision.]
Challenge - Feedback: Also related to grading, we never clearly discussed what kind of feedback to give, and the feedback different instructors gave varied quite a bit. Some feedback was vague (e.g. “you have an error in part a or b”) or incomplete. This was a challenge for Becky as coordinator, since she didn’t want to be too prescriptive about what faculty should do, but we all struggled with answering some student questions about outcomes we didn’t grade.
Challenge - Revisions: Possibly due to the feedback given, students would sometimes submit revisions that were partially complete but didn’t address all of the issues. We would try to give students a chance to fix these (Minor Revisions on Minor Revisions), but this required an even tighter timeline, causing some stress for faculty.
Overall, faculty had a positive experience, as did many students. Our goal in the current semester is to address some of these challenges and make improvements.
What are we doing differently this semester?
Our second iteration is currently running (Fall 2025). We have 21 sections, about 700 students, and 8 instructors. These are some of the changes we made:
We need clear communication about the difference between a Minor Revision and a Not Yet - both for students and faculty. We added two items to the course to support faculty and students: (1) for each outcome, we now provide on our LMS a detailed description of the outcome and a list of common reasons for a Minor Revision and for a Not Yet mark, and (2) we created Assessment Prep Quizzes for engagement points. These online quizzes consist of 5-6 solutions to a problem, and students analyze each one and mark it as Proficient, Minor Revision, or Not Yet. The quizzes introduce common errors [Footnote: We used common mistakes from our Fall 2024 pilot to create these.], and students can retake them until they get the right answers. Before making them available to students, faculty discuss the responses and agree on how to mark borderline cases.
We need to be (more) organized! Robert wrote about the sweet spot of biweekly assessment, which we had independently discovered in Linear Algebra! But in a large course with many students, there isn’t much flexibility in the schedule. We added a policy for excused absences that extend past the Monday following an assessment: in particular, affected students have the option of extra time on a later assessment. [Footnote: No student (out of 700!) has needed this so far.] Additionally, we no longer allow Minor Revisions on Minor Revisions - the timeline makes it too challenging.
If you want to see how we described the system and outcomes, as well as a copy of our syllabus, check out this folder.
Closing Thoughts
At the time of this writing, we are halfway through the Fall 2025 semester and are seeing improvements over our first term while still facing some challenges. For instance, the Assessment Prep Quizzes and the discussions around them have helped us be more consistent about the differences between marks, but inconsistencies still occur. Relatedly, while much of the feedback to students appears to be more detailed than last year, there are still times when students aren’t sure what they did wrong.
Communicating the concept of alternative grading to students remains challenging. Becky provides a Day 1 slide template to help other instructors have this conversation, but it isn’t clear how each instructor talks about the system. It didn’t seem to be a problem in Fall 2024, but one current instructor said they spent a quarter of a class fielding student complaints about the lack of partial credit, which makes us wonder how we can support better communication with students. We are considering making a video about the system, paired with a short related assignment.
While there is still work ahead, we’re heartened by what we’ve learned and accomplished in scaling alternative grading across a large, coordinated course. We hope our experience offers insight for others navigating the complexities of coordination, consistency, and meaningful assessment change.

I'm always interested to see the way colleagues use terms, and obviously we use them in ways that seem accurate and appropriate for our purposes. In this case I'm thinking about the term "proficiency". Your work here brought me back to an analogy I often use when thinking about the difference between grading products and assessing for proficiency. Learning to drive makes the contrast easy to see.
The road test is a performance on a route chosen by the examiner. If you make a mistake, you can take it again—just like in your class model. Passing the written exam and the road test earns you a driver’s license, which is basically the “passing grade.”
But all of us have met drivers out in the wild who clearly passed the test and, at the same time, don’t exactly radiate driving proficiency. That’s because the road test measures how you perform on that specific sequence of tasks on that specific day. It doesn’t capture how you actually drive across real contexts, real conditions, and real time.
True proficiency shows up in a broader, more varied record of behavior—how someone merges, anticipates, adapts, or manages unpredictable situations. Some of the new in-car AI tools that track long-term patterns probably give a better picture of driving proficiency than the official test.
That distinction feels relevant in assessment, too. When evidence is limited to products the instructor defines—exams, problem sets, a specific format—we’re really assessing performance on those products. Useful, important, but limited. When learners can show what they can do in multiple ways—and sometimes have the option to propose how to demonstrate it—we move closer to the kind of capability we might honestly call proficiency.
Maybe it’s time to rethink not just classroom assessments but driver’s licenses, too. A little proficiency assessment on the Kennedy Expressway here in Chicago at rush hour might produce some very different results.