# Lessons about Alternative Grading from George McNulty

### A tribute to a pioneer in alternative grading

*Today’s guest post is by Kate Owens from the College of Charleston mathematics department. Kate is a leader in alternative grading, a thoughtful collaborator, and also a previous guest poster on this blog.*

*Robert adds: Prof. McNulty plays a large role in my own adoption of alternative grading. In my origin story, I highlighted the influence of Linda Nilson’s book on specifications grading at a critical moment in my teaching career. But it was George McNulty’s Math 142 syllabus, which I found on one of Kate’s blog posts around the same time and to which Kate links below, that convinced me that something like specs grading could work in practice. I owe a debt of gratitude to Prof. McNulty for his innovation and dedication to student growth, and to Kate for writing this post.*

I was devastated in June when I learned of the death of my dissertation advisor, George F. McNulty.[1] George was a professor of mathematics at the University of South Carolina from 1975 to 2015, and he was named a Fellow of the American Mathematical Society in 2011. He had a profound impact on my life (both mathematically and otherwise), and I am grateful for the many years I had the opportunity to learn from him. In this post, I would like to focus on how his grading philosophy aligns with the four pillars of alternative grading, and how it compares with my own.

In many of his mathematics courses, both at the undergraduate and graduate level, George employed systems that fall under the alternative grading umbrella.

George's systems inspired his students to rethink their assumptions about grading, and gradually his ideas spread through word of mouth and professional development to a much larger audience. George-inspired approaches to grading are now used by math faculty in many institutions, under the name “Mastery-Based Testing” or “Standards-Based Testing”.[2]

The details of one such system can be found on George’s “Math 142 Syllabus” from a second-semester Calculus course taught at the University of South Carolina in Fall 2013. I’ll outline some of the features of George’s system that we find on this syllabus and how they fit nicely into the four pillars of alternative grading: 1️⃣ marks indicate progress; 2️⃣ clear standards; 3️⃣ reassessments without penalty; and 4️⃣ helpful feedback.

**Marks Indicate Progress.** When looking at a solution to a problem -- whether from a quiz or an exam -- George would assign letters corresponding to three possible levels. In his system, George used “**M**” for “master level”, “**J**” for “journeyman level”, and “**A**” for “apprentice level,” with “**M**” being the highest. After years of my own adventures in standards-based grading, I have also arrived at a three-tier proficiency scale, although I use different nomenclature. As Robert noted in an earlier post, many in our community are moving away from vocabulary involving “master” or “mastery.” In my own courses, the highest level is “**S**,” meaning the solution was “Successful” or “Satisfactory”; the middle level I call “**G**” for “Growing”; and the lowest level is “**N**,” meaning “No evidence shown” or “Not Yet.”

Over the years, I’ve found that when I use numbers, points, percentages -- or even letters like “A” or “F” that are common in traditionally graded courses -- my students revert to thinking about grades in the usual way. When I used a GPA-style proficiency scale, with “4.0 = A”, “3.0 = B”, “2.0 = C”, and so on, it was hard to keep my students from averaging grades together across assignments, or from thinking that a grade of “3.0 out of 4.0” was equivalent to earning 75% credit.[3] Eventually, I moved to an “EMRF” rubric, before realizing the emotional weight students feel on seeing an “F” on their quiz, in written feedback, or in the gradebook. I wonder about George’s reversal -- using “A” for the *lowest* level of achievement -- and whether any of his students enjoyed a happy confusion, believing that earning all “A”s meant they were doing great work in his course.

**Clear Standards.** George’s course syllabus itself contains a “Sample Final Examination” for the course. He included sixteen problem types on the sample final exam, which essentially correspond to content standards. (Like every enumerated list George ever wrote, the list begins counting with zero.) The problems fell into two categories: “Core” problems and those lying outside the core. George used performance on these problems to arrive at a course grade, using heuristics like “The grade B can be earned by displaying mastery of all the core problems and mastery of about half of the rest of the problems and projects.”
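The quoted heuristic can be encoded as a short function. This is only an illustrative sketch: the syllabus states just the rule for a B, so the other cutoffs, the function name, and the data layout here are my own assumptions, not George's actual scheme.

```python
# Hypothetical encoding of George's grade heuristic. Only the B rule
# comes from the syllabus; every other cutoff is an assumption.

def course_grade(marks, core, noncore):
    """marks: dict mapping problem number -> best mark ('M', 'J', or 'A').
    core / noncore: lists of problem numbers in each category."""
    mastered = {p for p, m in marks.items() if m == "M"}
    core_done = sum(1 for p in core if p in mastered)
    extra_done = sum(1 for p in noncore if p in mastered)

    if core_done == len(core):
        if extra_done >= len(noncore):      # all the rest, too -> A (assumed)
            return "A"
        if extra_done >= len(noncore) / 2:  # "about half of the rest" -> B
            return "B"
        return "C"                          # core only -> C (assumed)
    return "D or F"                         # incomplete core (assumed)
```

For example, with eight core and eight non-core problem types, a student who masters all the core plus four of the rest lands exactly on the quoted B rule.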

There are three important ways George’s set of problems and his system for grade computation are different from my own.

First, George sorted his problem types into two levels of importance, with some being more important than others. In my own courses, all problem types (I call these “course standards”) have the same relative importance. I haven’t found the need to split them into “essential” or “nonessential.” To earn a grade in my course, students must provide sufficient evidence of understanding on about two-thirds of the entire list of standards (but it isn’t too important to me which ones they skip).

Second, George tracked only the *highest* performance level a student produced for each problem type. I remember asking him how he arrived at this policy, in the context of a Business Calculus course he was teaching. “Wouldn’t this mean that students might forget, say, the Quotient Rule, after they’ve completed that exam?” George nodded and replied that he fully expected *all* of his students to forget the Quotient Rule at some point; he was only concerned with whether they had ever learned it to begin with. He reasoned that, if needed at a later time, it would be easy for a forgetful student to re-learn the Quotient Rule technique, and he saw no reason to punish a student for forgetting it in March instead of July of a particular year. I’ve thought deeply about this approach for many years and I still haven’t adopted it. Instead, I keep track of my students’ *most recent* performance level. My thinking is that there are some things we won’t need to use again on later problems, and if my students don’t see those techniques again, their grade won’t be subject to change; on the other hand, there are concepts and calculations we are learning about now that we *will* need to make use of later on. The knowledge that some material could appear again on a future test, and that their grade might move in either direction, I hope motivates my students to continue practicing those skills. Whenever possible, I try to let my students know which skills we will (or won’t!) see again before the end of the semester.

Third, George lists sixteen example problems. I don’t know how he arrived at this number -- whether it was a function of the level of the course itself, the length of time it takes to solve each of the problems, or the number of assessments he was able to produce. As an instructor, I find the simplicity of “four new things on our next test” alluring! The University of South Carolina runs on a sixteen-week semester calendar, so it could also be related to a one-problem-per-week ratio. In my own courses, I’ve found that using around 25 course standards over sixteen weeks is what fits most comfortably. By having more standards, I get a more detailed view of what my students have understood so far (or where we still need more instruction or practice).
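The difference between the two record-keeping policies is small enough to state in a couple of lines of code. This is a sketch under assumed names and an assumed mark ordering, not anything from George's actual gradebook:

```python
# Contrast of the two record-keeping policies: highest-ever vs. most recent.
# The mark names and their ordering are illustrative assumptions.

ORDER = {"A": 0, "J": 1, "M": 2}  # apprentice < journeyman < master

def highest_ever(attempts):
    """George's policy: keep the best mark a student ever earned
    on a problem type, so a grade can never move backward."""
    return max(attempts, key=lambda mark: ORDER[mark])

def most_recent(attempts):
    """Kate's policy: keep only the latest mark, so a grade can
    move in either direction as skills are (or aren't) maintained."""
    return attempts[-1]
```

A student whose attempts on one problem type were A, then M, then J would hold an M under the first policy but a J under the second.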

**Reassessments without penalty.** George’s system allowed for reassessments without penalty (although I don’t recall him using this terminology). Typically, his courses would have several mid-semester exams, each growing in length. The initial test would have four problems, similar to Problems 0 through 3 on his syllabus. The subsequent test would have new iterations of those problems -- striking at the same underlying concepts, but with new actual problems -- along with four new problems, for a total of eight. The third test would have twelve problems altogether: new iterations of Problems 0 through 7, plus students’ first encounter with Problems 8 through 11. The Final Exam would contain new versions of each problem type, but students were only required to complete the problems they hadn’t yet succeeded on. This means that some students never had to complete *any* portion of the final exam, and any two students might have final exams of different lengths.

In practice, George had some kind of script that would fetch data from his gradebook and build a customized PDF test file for *every* student. By including only the problems a student hadn’t yet passed, the system gave students several opportunities to pass each one, all within the scheduled assessments held during class time.
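The core of such a script is simple to sketch. The following is my own minimal reconstruction of the idea, not George's actual code; the data layout and the pass mark "M" are assumptions.

```python
# Sketch of building a per-student final exam from only the problem
# types the student hasn't yet passed. Data layout is an assumption.

PROBLEM_TYPES = list(range(16))  # the syllabus numbers problems from 0

def final_exam_for(record):
    """record: dict mapping problem number -> best mark earned so far.
    Returns the list of problem numbers this student must still attempt."""
    return [p for p in PROBLEM_TYPES if record.get(p) != "M"]
```

A student who has already passed every problem type gets an empty list -- matching the observation that some students never had to sit any portion of the final exam.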

In my own courses, students can request reassessment quizzes whenever they choose, and those occur outside of class time. The major benefit I see is the ability to have a conversation with my students about what they’ve learned since their last attempt (and potentially offer help or additional resources when it is clear that full understanding has not yet happened). The obvious drawback is that I have to meet with students at times that are convenient for both their schedules and my own, and sometimes this can be tricky. One way I combat this drawback is by allotting one day toward the end of each semester just for optional reassessments on any standards chosen ahead of time. I don’t produce a customized test for every student -- although if given a magical coding unicorn, that might be something I would request! -- but instead I make enough copies of each individual quiz based on an online form completed ahead of time.

**Helpful Feedback**. George’s example syllabus doesn’t make clear the kind, quantity, or quality of feedback he would provide students. Much of the feedback he supplied to undergraduate students came through written comments on their exams, and while I don’t have any first-hand knowledge of those, I was lucky to receive years of helpful feedback from George as a graduate student. Much of it came during our weekly one-on-one meetings: we would meet for an hour or two each week throughout my graduate career as I progressed through my qualifying exams, comprehensive exams, masters thesis, and doctoral dissertation. Conversations that include “This is what you’ve done well” and “this is where you can improve” can be just as valuable as a grade. As Robert noted in his post titled “The care and feeding of Helpful Feedback”, the purpose of feedback is iteration, and the goal is to help the learner grow. In my case, George offered countless cycles of editing suggestions as I worked toward becoming a better mathematical writer. In the case of his undergraduate calculus students, the letter grades and written comments were aligned to specific skills that students could revisit on subsequent exams. Both styles of feedback allow the learner to reflect on their performance and consider pathways for further growth.

When preparing remarks for George’s memorial service, I tried to guesstimate how many hours I spent in his office discussing mathematics, and I am confident it’s a few thousand. During those hours, George was always supportive of my ideas, even when my mathematical performance wasn’t where I wanted it to be.[4] The universe feels different to me knowing that George is gone, because he always believed in my abilities, even during moments when I doubted myself. “I believe in your ability to be successful in this endeavor” is a message I hope to share with each of my own students, and alternative grading is one avenue that allows me to demonstrate that belief to them.

*Postscript*. The University of South Carolina has recently created the George McNulty Endowed Fellowship Fund. The purpose of the fund is to support graduate students, with a preference for those pursuing a Ph.D. in mathematics. If you were impacted by George’s grading philosophy, friendship, mathematics, life -- or *all* of the above -- please consider making a contribution. You can make a donation by visiting https://donate.sc.edu/direct-your-gift. The easiest way to find the fund is to type in the fund number -- **b12561**.

[1] A **full obituary** can be found on Temple-Shalloran Funeral Home's website.

[2] Here’s the canonical article on Mastery-Based Testing: J. B. Collins, A. Harsy Ramsay, J. Hart, K. A. Haymaker, A. M. Hoofnagle, M. K. Janssen, J. S. Kelly, A. T. Mohr, & J. OShaughnessy (2019), “Mastery-Based Testing in Undergraduate Mathematics Courses,” PRIMUS, 29:5, 441-460, https://doi.org/10.1080/10511970.2018.1488317

[3] Don’t get me started on how difficult it was to convince my Learning Management System to stop displaying percentages or averages or sums of these scores!

[4] I didn’t earn a “PhD-level pass” the first time through my Qualifying Exams and had to go through all of them again. Thankfully, the second time was a charm!
