Case study: Mastery testing at scale in University of Michigan’s Calculus 1
One way to use alternative grading in a large coordinated course
One of the biggest questions we hear about alternative grading is: “How do you actually do it?” The heart of our book is a collection of case studies that answer that question, in many different contexts. Today, we bring you a brand new case study that shows a way to use alternative grading in a large multi-section coordinated class.
The University of Michigan is Michigan’s flagship university. Its math department is well known for its use of “gateway exams” and active learning in introductory math courses. Today, we’ll take a look at the next iteration of this program, which uses an assessment system that has much in common with standards-based testing.
Hanna Bennett and Beth Wolf are non-tenure-track and administrative faculty in the University of Michigan’s mathematics department, where they have taught since 2015 and 2016, respectively. They are coordinators for the department’s introductory math program, where they have been instrumental in implementing a new assessment system that offers some of the benefits of alternative grading while fitting within an otherwise traditionally graded setting. These “mastery assessments” are streamlined to work in the context of large coordinated classes such as Calculus 1, Calculus 2, and a standalone course called “Data, Functions, and Graphs”.
While the ideas described here have been implemented (or are being developed) throughout the university’s introductory math program, here we’ll focus on a representative example: Math 115, Calculus 1. The course itself is huge – nearly 2000 students in a typical fall semester – but individual class sections are small (18-24 students). This means that there are around 100 sections, taught by between 75 and 90 different instructors who are typically graduate students, postdocs, and teaching faculty. The course is tightly coordinated, with each section having a consistent daily schedule and syllabus. The coordinators and co-coordinators, who in recent years have included Bennett and Wolf, also provide common assessments and rubrics that are used across sections.
Calculus 1 covers a typical range of introductory topics and is focused on conceptual and computational understanding (rather than proofs). The class is also flipped: Students either read the textbook or watch short videos to have an initial encounter with new ideas before class, and complete a “prepwork” assignment before class as well. In-class time is focused on group activities.
Final grades in Math 115 are based on weighted averages. There are three key portions of the final grade: The alternatively graded “mastery assessments” (35% total), traditional written exams (50% total), and a collection of items together called the “learning component” (15%).¹
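To make the arithmetic concrete, here’s a minimal sketch of how these weights combine into a final grade. The weights are the representative ones above (they have shifted over time; see footnote 1), and the component scores are hypothetical:

```python
# Illustrative only: representative weights from the article; the actual
# weights have changed over the years (see footnote 1).
WEIGHTS = {"masteries": 0.35, "written_exams": 0.50, "learning_component": 0.15}

def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Hypothetical student: strong on masteries, weaker on written exams.
print(final_grade({"masteries": 95.0,
                   "written_exams": 78.0,
                   "learning_component": 90.0}))  # -> 85.75
```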
We’ll take a quick look at the traditionally graded portions of the class first. There are two written midterm exams and one written final exam, each of which is traditionally graded. These exams focus on synthesis and higher-level skills. The “learning component” of the final grade covers a variety of smaller assignments, including prepwork, autograded online homework, written “team homework” challenge problems², and short quizzes that can either act as practice for the written exams or encourage certain behaviors (such as completing a practice mastery assessment with a high score).
Mastery assessments
The alternatively graded portion of the class comes in the form of “mastery assessments” (or just “masteries”). These are three short exam-like assignments, each covering a major topic in the course: Function Concepts, Derivative Procedures, and Integral Concepts, with a fourth “Derivative Concepts” mastery currently being designed.
Mastery assessments focus on fairly direct applications of individual skills and understanding of individual concepts. As Bennett and Wolf write, the masteries “help [students] practice, assess, reassess and improve; this also has the benefit of giving students more transparency and control over their course grades.”
Before each mastery assessment, students are given a detailed description of what will be on it, including learning objectives. For example, here is one of the objectives for the Function Concepts mastery:
Interpret the meaning of a function or its inverse in context, including determining an appropriate domain; interpret the meaning of a composition of functions; and use functions and/or their inverses and/or compositions to write a mathematical expression or equation representing a given statement. (1.3)
The “1.3” indicates the section of the textbook where this topic was covered. As another example, the Derivative Procedures mastery has objectives for many of the standard derivative computation rules, as well as this one:
Given a function or formula, identify the correct rule(s) to use. (3.1-3.6)
Masteries are completed outside of class time using WeBWorK, a free online homework system. Each mastery consists of seven problems, some with multiple parts. The questions are drawn randomly from a large question bank³, and many include randomized numbers or other inputs, so that each student receives a unique combination of questions.
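WeBWorK problems are written in its own Perl-based macro language, so the snippet below is only a Python sketch of the general idea – drawing problem templates from a bank and filling in randomized inputs – rather than WeBWorK’s actual mechanism. All names here are hypothetical:

```python
import random

# Hypothetical bank entry: a template takes an RNG and returns a concrete
# problem statement with randomized inputs.
def linear_rate_problem(rng: random.Random) -> str:
    m, b = rng.randint(2, 9), rng.randint(10, 50)
    return (f"A tank starts with {b} liters of water and fills at "
            f"{m} liters per minute. Write a formula for V(t).")

QUESTION_BANK = [linear_rate_problem]  # a real bank holds many templates

def build_mastery(student_id: str, attempt: int, n_problems: int = 7) -> list[str]:
    """Draw n_problems templates and instantiate each with random inputs,
    seeded so that every student/attempt pair gets its own combination."""
    rng = random.Random(f"{student_id}-{attempt}")
    templates = rng.choices(QUESTION_BANK, k=n_problems)
    return [template(rng) for template in templates]
```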
Each mastery is available for about two weeks, and students can make two graded attempts per day. These attempts must be completed in one of several designated campus computer labs, which are proctored by undergraduate tutors.
Below is an example of a problem from the Function Concepts mastery, covering part of the objective described above (“Interpret the meaning of a function or its inverse in context…”):
Note that the problem is “worth” 1 point – WeBWorK requires the use of points – but point values only indicate whether the student completed the problem fully correctly or not. There is no partial credit within a problem.
If a student correctly completes all seven problems on a mastery, they receive full credit towards their final grade. This varies between 7% and 10% of the final grade, depending on the mastery assessment. If they correctly complete six problems, they earn about three-quarters of the full credit amount (for example, 6% for a mastery whose maximum value is 8%) and can continue to make new attempts if they wish. Below that, students do not earn any credit and must make another attempt later. Only a student’s best attempt is counted, regardless of the number of tries. These new attempts are a form of reassessment, counted without penalty toward a student’s grade.
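In code form, the credit rule is short. This is an illustrative sketch, not the course’s actual implementation: it uses the 8% example from the paragraph above, and the three-quarters figure is approximate:

```python
def mastery_credit(num_correct: int, max_weight: float = 8.0) -> float:
    """Credit (in final-grade percentage points) for one graded attempt.
    7/7 earns full weight, 6/7 roughly three-quarters, below that nothing."""
    if num_correct == 7:
        return max_weight          # e.g., the full 8% of the final grade
    if num_correct == 6:
        return 0.75 * max_weight   # e.g., 6% for an 8% mastery
    return 0.0

def best_credit(attempts: list[int]) -> float:
    """Only the best attempt counts, regardless of how many were made."""
    return max(mastery_credit(c) for c in attempts)

print(best_credit([4, 6, 7]))  # -> 8.0: a later perfect attempt replaces earlier ones
```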
Students receive automated feedback and solutions through the online WeBWorK system immediately after completing a mastery. As Bennett and Wolf write, “Students can see their score, correct answers, and full written solutions at that time, and can revisit all their attempts with answers and solutions through WeBWorK at any later time as well.” Students can also get feedback by visiting the “Math Lab”, a tutoring center where calculus instructors and other tutors hold office hours.
Students can take practice masteries before making a graded attempt. Questions on these practice masteries are drawn from the same question bank as the graded masteries, and are provided in the same way (via WeBWorK). Students can make practice attempts from home or anywhere else, and there is no limit on the number of practice attempts. Bennett and Wolf emphasize the importance of a large bank of high-quality problems, both to ensure that students see a wide variety of problems and to prevent them from simply retaking practice tests until they’ve seen every possible combination.⁴
There is also a “reopen token” system that adds even more flexibility. Tokens are imaginary currency that students can spend to reopen a mastery for 24 hours after its two-week availability window has closed. Students begin the semester with three tokens.
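Here’s a minimal sketch of how such a token ledger might work. The 24-hour window and three-token starting balance come from the course; everything else (names, bookkeeping details) is assumed for illustration:

```python
from datetime import datetime, timedelta

class TokenLedger:
    """Minimal sketch: each student starts the semester with three tokens."""

    def __init__(self, tokens: int = 3):
        self.tokens = tokens
        self.reopened_until: dict[str, datetime] = {}

    def reopen(self, mastery: str, now: datetime) -> bool:
        """Spend one token to reopen a closed mastery for 24 hours."""
        if self.tokens == 0:
            return False
        self.tokens -= 1
        self.reopened_until[mastery] = now + timedelta(hours=24)
        return True
```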
Benefits and challenges
Bennett and Wolf mention two major benefits of these mastery assessments: First, students have more control over a significant portion of their grade, through the ability to reassess without penalty. Second, the system makes student grades more transparent: The learning objectives and testing procedures make it clear exactly what matters, how it counts in a student’s grade, and what students can do in order to improve that grade.
There are a number of additional benefits as well. The system provides a great deal of flexibility for students, both in terms of making new attempts, and in not having to worry about attending a single pre-scheduled exam time. This makes things easier on instructors as well, who don’t need to worry as much about excused absences or proctoring retakes.
Bennett and Wolf also noticed an unexpected benefit: The first mastery assessment acts as a form of low-stakes practice for the first written exam. Although the topics are not the same, “many students used to be surprised by the difficulty of the first exam, but now those students are instead surprised by the difficulty of the first mastery, which can be retaken with no penalty to their grade.” This helps students calibrate their expectations for the higher-stakes written exams.
The mastery assessments also encourage students to study foundational material earlier. Between mastery assessments and written exams, there are more (and more frequent) assessments throughout the semester, compared to the course’s previous structure with only three written exams. This provides structure and waypoints for students to follow in their studying. Because masteries are interleaved between written exams, masteries help students develop skills that act as a foundation for the more advanced learning that is tested on written exams. For example: “in the past, many students went into the final not having a solid understanding of the Fundamental Theorem of Calculus. Now, since understanding of this theorem is assessed on the Integral Concepts Mastery [which occurs several weeks before the final exam], students studying for the final very often breeze through parts of problems dealing with the FTC and can instead focus on solidifying other topics.”
The coordinators have also seen a benefit in how topics are distributed between assessments, which leads to shorter and more focused assessments. Previously, the few written exams were much longer and tried to cover a huge range of topics, including foundational skills that are now assessed on masteries. Bennett and Wolf say that “our eventual goal is to more fully rebalance exam material in a way that would allow us to write shorter exams but still give students a longer time to complete them, with the hope that relieving time pressure would help relieve some testing anxiety for students.”
Finally, course grades have increased since the introduction of mastery assessments. While there could be many reasons for this, Bennett and Wolf see some direct benefit from the mastery structure: “students’ grades [are] much less susceptible to the ‘one low exam grade’ effect” that is often seen in traditional one-and-done grading.
All of this isn’t to say that the addition of mastery assessments has been easy and trouble-free. Bennett and Wolf have made a number of important changes over the past three years.
One significant change was the format of graded attempts. At-home graded attempts were the only option when mastery assessments were first implemented in Fall 2020. After the return to in-person classes the following year, graded masteries soon also shifted to an in-person format. This change was partly due to academic dishonesty concerns. But the bigger reason had to do with how students treated the different forms of assessment: “having to go to a lab to take a mastery for credit made a big and positive difference in habits for many students; they tended to prepare more for the proctored attempts, since they had to get to a lab to take one, rather than simply click a different button on their own device”.
But in turn, this requirement led to trouble with lab capacity, with students waiting in long lines to take an in-person mastery. This has been improved by applying the resources available to a large university (keeping labs open longer, hiring more student tutors) and by adding incentives, such as giving students extra credit for earning a high score on a practice mastery before making a graded attempt. This last option helps reduce the number of in-person attempts students need in order to succeed.
You might notice that the final exam for the course is a traditional written exam. Bennett and Wolf originally changed the final exam into one last mastery assessment, but they have since reverted to a written final exam. This change happened for several reasons. Logistically, a “final mastery” required a much tighter timeline than earlier masteries and swamped the campus computer labs. In addition, some instructors noticed that “students did not seem to synthesize or retain as much of the course material, particularly from the end of the course, as they had in the past when we’d had a written final.” This may have been because the final mastery focused only on foundational skills that could be tested in WeBWorK. The change back to a written final seems to have encouraged students to focus on synthesis, with the mid-semester masteries covering foundational skills.
Supporting instructors
Bennett and Wolf, as course coordinators, spend a great deal of time thinking about how to support and encourage the instructors of the classes they coordinate. The coordinators create the overall course schedule and assessment structure, construct individual assessments and rubrics, and manage the logistics of both mastery assessments and exams. In addition, to support the instructors, “our program faculty run a week-long training program prior to each fall semester, hold weekly course meetings for new instructors, conduct class visits, and provide suggested instructional materials and other teaching support throughout each semester.” Through these trainings and weekly meetings, the coordinators intentionally introduce instructors to “the ideas and importance of a growth mindset, equity-focused teaching, and a climate of inclusion.”
As an example of how weekly course meetings work, as the first mastery assessment nears, “we devote almost an entire meeting on the mastery assessments in our courses, starting from the basics of what they are and other logistics, grading details, and also a significant discussion of how mastery assessments align with the principles of equity-focused teaching.” Instructors take practice mastery assessments themselves, including navigating the online interface, to get a better sense of what students need to do.
Instructor training and support also emphasize concrete ways to support struggling students, who can often be identified by multiple unsuccessful graded attempts at masteries. The university’s technology support staff have created various resources to help with this, such as a color-coded page showing each student’s attempts and grades on each mastery assessment.
Bennett and Wolf have had significant institutional support in building mastery assessments for three full courses. This support includes access to instructional consultants, course releases, and summer funding to help build the assessments, as well as support from staff who can create helpful materials such as the color-coded reference page described above. Bennett and Wolf say that “We don’t suggest undertaking something like this at this scale without significant departmental and institutional support!”
Some final thoughts
What do students think about this approach? Survey data from an early semester of implementation in Calculus 1 indicated that students overwhelmingly (88%) saw mastery assessments as beneficial to their learning – a proportion similar to the share who found in-class work helpful, and twice the share who found homework helpful.
Here is a student response from a survey about masteries: “I thought they were very beneficial. For each chapter that we had a mastery, I felt like I learned the material very well! ... The masteries taught me more about a specific chapter than I have ever learned in any math class!” Likewise, students in another introductory class that uses a similar assessment setup said that “I feel less worried about the grade and more focused on figuring out the problem” and “You need to fail a little bit, and knowing that you can fail and it's not the end of the world... it makes me feel a lot better about learning.”
Bennett and Wolf’s work illustrates one way to implement alternative grading within a large, coordinated multi-section class. Many more approaches can be found in Chapter 8 of the Grading for Growth book.
1. These weights have changed over the years, with a general trend of the mastery assessments increasing in weight as the written exams decrease. The numbers here represent a typical case.
2. The team homeworks also use some elements of alternative grading. They are graded using a two-part rubric focusing on mathematical correctness and communication quality. The levels of the rubric are fairly coarse and describe the work’s holistic progress towards meeting specifications, rather than adding or subtracting points for specific items.
3. Some questions are drawn from WeBWorK’s “Open Problem Library”, and others are written by course staff or faculty. Many are based on problems from the course textbook.
4. They also note that students “can be impressively creative at finding unintended patterns that allow them to enter a correct answer without understanding the underlying concept.” Bennett and Wolf have worked to increase the variety of problems available, which also encourages students to focus on understanding over pattern-seeking.