Alternative grading in a test-forward environment
Timed in-class testing is making a comeback. Can it coexist with alternative grading?
Timed in-class testing seems to be making a comeback, substantially fueled by concerns over generative artificial intelligence and academic dishonesty, with many instructors pivoting away from take-home assignments and toward a “test-forward” model of assessment. I am one of those instructors, and I have written about my own experience shifting my Discrete Structures course toward an assessment model based almost entirely on timed in-class exams, and how it turned out.
That model of assessment is likely to become more common as we continue trying to figure out how to teach with AI in the picture. While there are plenty of reasons to be wary of such an approach, my own experiences suggest that alternative grading can still be used well in this kind of environment; in fact, it can help mitigate some of the concerns that an emphasis on timed tests presents. But it will take careful consideration of the issues involved, and creative teaching, to make it work. Today, I wanted to open a conversation about handling the issues that come with basing your alternatively-graded class substantially on timed testing, and share what’s worked for me so far.
Mitigating the snowball effect
One of the Four Pillars of alternative grading is reattempts without penalty: students should, when it makes sense, have the chance to reattempt assessments, based on feedback they receive. Because some may need multiple reattempts on an assessment before they master the concepts, and because the class may need to move forward even though a student might not yet have mastered those concepts, we get what we call the snowball effect: the student is still reattempting an earlier topic while new topics, and new assessments, continue to accumulate, making it progressively harder for the student to dig themselves out.
The snowball effect can, and does, happen in any course where reattempts without penalty are allowed, whether or not we use a substantial number of timed tests. But when there are a lot of timed tests, the effect can be even worse than usual, because those reattempts happen at set times, and for a set amount of time. There’s much less flexibility and more pressure.
I’ve come to believe that we cannot prevent this snowballing from happening; we can only try to make it less likely, and try to mitigate its effects. Some suggestions I have for this are:
Keep it simple. This is the Prime Directive of all alternative grading, and if you lean heavily on tests it becomes even more important. When students run afoul of timed tests, it’s often the test itself: it’s too long, the items assess the same concept multiple times while leaving out others, the wording of the questions is convoluted, and so on. In the course I wrote about, it really helped to have my standards or learning outcomes very clear and focused. Each timed exam then consisted of one problem per standard we had discussed in class so far, and students could opt out of standards they’d already mastered or didn’t feel ready to attempt, essentially unbundling the exam into components that students could do or not do as they saw fit (there’s a small sketch of this bookkeeping after these suggestions). Because each standard was very simple, the problem assessing it could easily be done in 5 minutes or less by a well-prepared student. Here is an example of one such exam, with the solutions. This didn’t prevent the snowball effect for everyone, but it made recovery easier, since students could target individual parts of the workload rather than all of it at once.
Teach deliberate practice. But how do you help a student become a “well-prepared” one? I am convinced it’s through the explicit teaching of deliberate practice. Deliberate practice means “practicing with a clear awareness of the specific components of a skill we’re aiming to improve and exactly how to improve them” (source). I’m starting to think that this concept is the missing link for students who end up getting snowballed. Often, it seems, when students get stuck on an early concept, it’s not a failure of intellect but a failure of practice (or a lack of it). The exams my students took in the class I wrote about were often focused on basic tasks to execute: perfect candidates for practice. Many students caught in the snowball effect became fully un-snowballed once I showed them how to practice those tasks. Right now in my asynchronous online class, I am making weekly videos of worked examples, but I’m framing them as “practice walkthroughs” where I do the worked examples exactly as if I were a student practicing them, so students get not only an example but also a look at how an expert would practice.
Use your calendar. I’ve said before that the best idea I’ve ever had as an instructor is the 12-week plan, where I design a 15-week course assuming I only have 12 weeks in which to teach it, and simplify, cut, and consolidate the course until it fits. Not every course can be set up this way. But if you can do it, this leaves 3 weeks you can use however you wish, and a good use of that time is to insert times for reassessment into the calendar. I like to stop content coverage two weeks before the end of class and spend those weeks doing nothing but reassessment; but you could also sprinkle days, or even entire weeks, into your semester for the sole purpose of getting everyone unstuck on assessments. In my class last semester, I kept 1.5 weeks at the end for reassessment and scheduled makeup days every 2-3 weeks for those who needed either to make up a missed test or retake one. It’s a little like going for a walk or a run with a group where some people just don’t move as fast as others: every so often, it makes sense for the group to pause and let everyone catch up.
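To make the “unbundling” idea from the first suggestion concrete, here is a minimal sketch of the bookkeeping behind it. To be clear, this is not from my actual course materials: the standard names, the data, and the problems_for function are all invented for illustration. The point is just that each exam offers one problem per not-yet-mastered standard, and opting out is always available.

```python
# Hypothetical sketch of "unbundled" exam bookkeeping; all names invented.

# Standards discussed in class so far, in order.
standards = ["Logic.1", "Logic.2", "Sets.1", "Induction.1"]

# Standards each student has already mastered on earlier attempts.
mastered = {
    "alice": {"Logic.1", "Logic.2"},
    "bob": {"Logic.1"},
}

def problems_for(student: str) -> list[str]:
    """Standards a student could still attempt on the next exam.

    A student may also opt out of any of these with no penalty;
    opting out simply defers the attempt to a later exam.
    """
    done = mastered.get(student, set())
    return [s for s in standards if s not in done]

for student in sorted(mastered):
    print(student, "->", problems_for(student))
# alice -> ['Sets.1', 'Induction.1']
# bob -> ['Logic.2', 'Sets.1', 'Induction.1']
```

A nice side effect of keeping the bookkeeping this simple is that the exam itself stays simple: the set of problems a student faces only ever shrinks as they master standards.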
Dealing with test anxiety
When you have timed tests, you will also have test anxiety. When you build a course primarily on timed tests, you’re going to have a lot of test anxiety among students. David wrote a great post a few years ago summarizing some research on alternative grading (standards-based grading specifically) and test anxiety. The short version is that students in SBG classes reported significantly lower test anxiety than students in non-SBG classes1.
Which is great, but in my experience an important problem remains: Just because students report lower test anxiety in alternatively graded courses doesn’t mean that there isn’t test anxiety in those courses, and if you dial up the number of tests (to a frequency level you might never see in a traditionally graded class, because of reattempts) it’s possible that your alternatively-graded class could foster just as much test anxiety as a traditional one.
In my class, students understood the grading system and the reattempt policy, but still showed profound test anxiety. And compared to earlier courses with a more balanced mix of in-class and out-of-class assignments, I noted a lot more test anxiety, both in the sense that more students were anxious and that each student was more anxious. What are we to do about that?2
Double-check your communication. Test anxiety can often be a communication problem. My students and I do a deep dive on the grading system in week 1, but by week N > 1 that might all be forgotten. So it’s helpful to remind students: hey, it’s OK if this test doesn’t work out, because you can just retake it next week, and the week after that, and so on. It can even be helpful to lie about it. In a past class, when it was time for our first Learning Target assessment, I told students that it was just a practice test. Then when they were done, I said: actually, I lied; this was a real test, and we will keep any results that met the standard and retest next week any of the ones that didn’t. They freaked out for a moment until the “retest” idea sank in. And actually it’s not a lie, because with reattempts without penalty, every test becomes a practice test where we keep the good results and retry the not-as-good ones. This has been a very helpful way to frame the situation.
Practice, practice, practice. I’ve already mentioned the central importance of deliberate practice. While I think test anxiety is a real condition that a person can have (and that maybe all of us “have”, to varying degrees), the best medicine for it is practice. There are plenty of actions that used to cause us anxiety — talking on the phone, driving a car, even (especially) teaching itself. But we do these now in most cases almost without thinking about it, because we’ve practiced them so much they are second nature.
Pretend you’re a professional musician. In the above I am thinking a lot about my hobby/second life as a gigging bass guitarist. I used to get tremendous stage fright, the musical analogue of test anxiety, but now being on stage is one of the only ways I can relax. Real professionals still get stage fright, and how they deal with it can be very instructive. Among their strategies is overpreparation: knowing the piece they are performing inside and out, so well that it’s as if they wrote it and are playing all the other instruments’ parts. Professional musicians not only practice the music, they also practice the performance itself, for example by recording themselves, playing the music for a small group, or practicing in the venue and on the stage where the performance will take place so as to fully simulate the performance environment3. And there are psychological and cognitive strategies such as breathing exercises, maintaining realistic expectations (“The music tonight won’t be perfect, and that’s OK”), and positive visualization. Here is a chat I had with Gemini on this subject, with a lot of useful information and links; just about all of it transfers to students with test anxiety.
Ensuring practice and not just mindless repetition
Did I mention that practice is important? It’s far more important than we realize, I think, and you can expect some future posts from me on this subject — going deep on the “interior” of the feedback loop. I tweeted recently:
“I’m convinced that the best strategy for students who are struggling, stuck, or behind is deliberate practice, more so than flexible deadlines, alternative testing arrangements, and the like, even though these have value.” And I’m especially convinced that deliberate practice is the key in an environment focused on timed tests. But deliberate practice only works if:
Our assessments reward practice. Even a person who knows how to practice well will get discouraged and fall behind if there is no direct connection between their practice and their performance. This is the essence of the part of Self-Determination Theory called “competence”: the belief that your actions are causing you to progress toward mastery. If we have unclear standards, or if the standards are clear but assessments aren’t aligned with them, practice will lead nowhere.
We place an unavoidable emphasis on it in the classroom. Teaching and modeling good deliberate practice cannot just be bolted on to an already unwieldy course (as higher ed folk are predisposed to doing). It has to be at the core, discussed frequently and modeled continuously. We should shift our language to no longer refer to the “importance” of practice but to the near-impossibility of growth or success without it. Given the choice between covering Topic X and focusing on deliberate practice, Topic X should be thrown out. In fact, a good way to plan classes is to adopt a flipped classroom model where the entirety of a class meeting can be used for supervised deliberate practice on a standard or learning target, with any content coverage and basic skills farmed out to be done prior to the session. We could even just start calling them “practice sessions”, or better yet “rehearsals”, instead of “class meetings”.
And as mentioned, we can’t assume that students will become expert practicers by osmosis: good deliberate practice within the context of our disciplines has to be explicitly modeled and taught. I’m willing to throw out a few syllabus topics if I can, to make sure that happens.
Timed testing has its share of issues, and even after shifting my assessment to a test-forward model last semester and not hating the results, I remain a little skeptical. But it also has its uses as an assessment strategy, and in any event it is a useful stopgap until someone figures out how AI and higher education can coexist4. That approach can still work quite well with alternative grading, as long as we are aware of the issues and, as always, start with facilitating students’ growth and work from there.
There’s more to it than this – read David’s whole article.
The research David summarized keeps the level of testing constant and varies the grading method. I have not yet seen research that does the opposite: keep the grading approach the same but vary the level or quantity of testing, and compare the differences in reported anxiety across different kinds of grading. For example, if we took a traditionally graded class and a specs-graded class and doubled the number of tests in each, or increased how much tests contribute to the course grade, I would expect to see an increase in reported anxiety in both groups; but would the jump be the same size? Would the difference in jump size be statistically significant? This seems quite relevant to the issues I’m describing here.
I’ve often suggested to test-anxious students that they should study for tests in the classroom where the test will take place. Those who took that advice said it was helpful.
😅😂
I ran a cycle of in-class quizzes for the first time this past semester, substantially inspired by ideas you’ve presented. It went well! Thank you for all your advice.
The course was a mid-level math course, Intro. to Analysis, with around 70 students. I split the material into seven chapters and offered three quizzes on each, marked as “not yet” (notated by a 0 in Gradescope), “progressing” (a 1), or “proficient” (a 2), with only the strongest performance counting toward the course grade.
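In case the scoring rule is easier to see in code, here is a minimal sketch of the “strongest performance counts” computation; the chapter names and scores below are invented for illustration.

```python
# Sketch of "only the strongest performance counts"; data invented.
# Quiz marks: 0 = not yet, 1 = progressing, 2 = proficient.
marks = {
    "Chapter 1": [0, 1, 2],  # reached proficient on the third quiz
    "Chapter 2": [1, 1],     # one quiz attempt still to come
}

# The chapter mark is the best of the (up to three) quiz attempts.
final_marks = {chapter: max(scores) for chapter, scores in marks.items()}
print(final_marks)  # {'Chapter 1': 2, 'Chapter 2': 1}
```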
I didn’t see test-anxiety problems. Students reported that the approach was less stressful than the traditional prelim-prelim-final. I think it helped a lot that the quizzes followed a closely consistent style, so, while the questions varied, the students quickly learnt what sort of challenge to prepare for. It was immensely heartening to see them improve over the three attempts, often dramatically.
The two biggest problems I ran into were cheating and the logistics of mandated accommodations for some students. The cheating issue was disappointing to see, especially given that most students worked extremely hard and with absolute integrity. It was exacerbated by students being close together in class and by some of the questions being multiple-choice, where answers are visible at a glance. To combat that, I sometimes circulated two versions of a quiz. As for the accommodations, I kept the quizzes short and gave all the students as much time as they needed (within the bounds of practicalities), which substantially worked. Both of these issues added substantially to the burden of running the course.
There were students who struggled, but I didn’t perceive a big “snowball” problem. The first attempt on each chapter was not long after that material featured in class, and I think it helped them to have to grapple with it at least once in a timely manner. I think that with a “traditional” grading scheme, the same snowball problem can be present but it can be worse, because the gaps between tests mean it isn’t exposed so frequently.
Incidentally, I saw an opposite sort of effect. I mostly gave quizzes two at a time: one on the most recent chapter and one on the chapter before that. A couple of students would deliberately bomb the most recent chapter, just using it to get a sighter on that material, but they would nail the prior chapter. I could only admire this strategy.
Hacking Gradescope to mark a 0, 1, or 2 on each quiz was cumbersome. There were points in the background of each quiz, which students could see, and I felt that saying “not yet” (0) again and again to a handful of students (who were perhaps attempting the course prematurely) was rubbing it in too much. So I may suppress the 0, 1, and 2 next time and replace them with the point ranges they, in effect, represented.
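One way to make that replacement (a sketch only: the cutoffs and point ranges below are invented, and Gradescope’s own settings may offer a cleaner route) is to keep the 0/1/2 scale for grade computation but show students the point range it corresponds to:

```python
# Hypothetical post-processing: keep 0/1/2 internally, display a point range.
# Cutoffs and ranges are invented for illustration.
BANDS = [
    (9, 2, "9-10 points (proficient)"),
    (5, 1, "5-8 points (progressing)"),
    (0, 0, "0-4 points (not yet)"),
]

def grade(points: int) -> tuple[int, str]:
    """Return the internal 0/1/2 mark and the student-facing range."""
    for cutoff, mark, label in BANDS:
        if points >= cutoff:
            return mark, label
    raise ValueError("points must be nonnegative")

print(grade(7))  # (1, '5-8 points (progressing)')
```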
Anyway, it was a good experience and I think it had a positive impact. I look forward to running it again next semester and honing it further.