What three research articles on deliberate practice say about grading
The connection between deliberate practice and alternative grading is closer than you might think.
In my last couple of posts (here and here), I’ve been exploring the idea of deliberate practice and how it relates to alternative grading. Deliberate practice is a structured, purposeful approach to skill improvement. It’s distinguished from ordinary practice by its focus on specific goals, targeted feedback, and continual refinement of technique. Unlike what is called “naive practice”, which is really just directionless repetition, deliberate practice involves breaking skills down into smaller parts, practicing them repeatedly, seeking feedback to identify weaknesses, and adapting one’s methods, based on that feedback, to address those weaknesses. Deliberate practice is highly demanding and not much fun. But there seems to be no shortcut around it if you truly want to learn something deeply.
Earlier, I said that deliberate practice is what productive engagement with a feedback loop looks like. This engagement with a feedback loop is at the core of all learning, and a hallmark of well-constructed grading and assessment systems in a course. In that article, I laid out what I think are the connections between deliberate practice and alternative grading. In this article I want to go a little deeper and look at three specific research studies that make this connection.
Using low-stakes assessment to drive deliberate practice
Chapman, K. E., Davidson, M. E., Azuka, N., & Liberatore, M. W. (2023). Quantifying deliberate practice using auto-graded questions: Analyzing multiple metrics in a chemical engineering course. Computer Applications in Engineering Education, 31(4), 916–929. https://doi.org/10.1002/cae.22614
In “Quantifying deliberate practice using auto-graded questions: Analyzing multiple metrics in a chemical engineering course”, the researchers analyzed over 84,000 attempts from more than 250 students on online homework problems in an interactive textbook. By monitoring fraction correct (a measure of performance) and attempts before correct (a measure of perseverance), the researchers gained objective insight into problem difficulty. These and other metrics were combined into a composite “deliberate practice score” that was then studied alongside the individual metrics. For instance, multiple-choice questions turned out to be the easiest, with the highest median fraction correct (91%) and the fewest median attempts before correct (1.6).
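To make those two basic metrics concrete, here is a minimal sketch of how they might be computed from a raw attempt log. The toy data, the field names, and especially the equal-weight way of combining the metrics into a single “deliberate practice” number at the end are my own illustration, not the formulas from the paper; see the original study for how its composite score is actually constructed.

```python
from collections import defaultdict

# Each attempt: (student_id, question_id, answered_correctly), in chronological order.
# The toy log below and the composite at the end are illustrative only.
attempts = [
    ("s1", "q1", False), ("s1", "q1", True),
    ("s2", "q1", True),
    ("s1", "q2", False), ("s1", "q2", False), ("s1", "q2", True),
]

per_question = defaultdict(lambda: {"correct": 0, "total": 0, "tries_to_first_correct": []})
tries = defaultdict(int)   # attempts so far per (student, question)
solved = set()             # (student, question) pairs already answered correctly

for student, question, correct in attempts:
    stats = per_question[question]
    stats["total"] += 1
    if correct:
        stats["correct"] += 1
    key = (student, question)
    if key not in solved:
        tries[key] += 1
        if correct:
            solved.add(key)
            # Attempt number on which the student first got it right
            # (1 = correct on the first try); a stand-in for the paper's
            # "attempts before correct" metric.
            stats["tries_to_first_correct"].append(tries[key])

for question, stats in sorted(per_question.items()):
    fraction_correct = stats["correct"] / stats["total"]          # "performance"
    t = stats["tries_to_first_correct"]
    avg_tries = sum(t) / len(t) if t else float("nan")            # "perseverance"
    # Hypothetical composite: harder questions (lower fraction correct, more
    # tries before success) get a larger "deliberate practice" weight.
    composite = (1 - fraction_correct) + (avg_tries - 1)
    print(f"{question}: fraction_correct={fraction_correct:.2f}, "
          f"tries_to_first_correct={avg_tries:.2f}, composite={composite:.2f}")
```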
By having a quantitative sense of which kinds of questions were easiest or hardest, and which ones required the most or fewest attempts before a correct answer, the researchers could scaffold the questions in the online homework to maximize deliberate practice. By starting with multiple-choice questions and ending with multiple-numeric questions (where students have to submit more than one correct answer), the online homework could be engineered to support deliberate practice, as measured by the deliberate practice score.
The online homework questions supported deliberate practice in other key ways. The auto-graded nature of the problems provided immediate feedback on correctness. Students were also allowed an unlimited number of attempts per question without penalty, which encouraged practice. Finally, the system let students make additional, purely practice attempts even after correctly solving a problem; these extra attempts did not count toward the course grade. Students disproportionately chose to practice the most difficult questions (those with a higher deliberate practice score), demonstrating the kind of self-directed, focused practice that characterizes deliberate practice, independent of any grade incentive.[1]
If this is beginning to sound like the Four Pillars of Alternative Grading, then you’re on the right track. The use of low-stakes assessments in this study, and in many of our own classrooms, exemplifies Helpful Feedback and Reattempts Without Penalty. The feedback in this case is helpful largely because of its immediacy: students don’t have to wait for it -- they get it right away and can convert it into practice before taking another swing at a problem.[2] And as mentioned, these homework problems could be retried as often as students needed.
So in this study, we see that methods that promote deliberate practice line up cleanly with (two of) the Four Pillars. And conversely, if we are building our grading methods to align with the Four Pillars, we’re setting students up to work in an environment that promotes deliberate practice.
The view from clinical education
McGaghie, W. C., Issenberg, S. B., Cohen, E. R., Barsuk, J. H., & Wayne, D. B. (2011). Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Academic Medicine, 86(6), 706–711. https://doi.org/10.1097/ACM.0b013e318217e119
On a larger scale, the study “Does Simulation-Based Medical Education With Deliberate Practice Yield Better Results Than Traditional Clinical Education? A Meta-Analytic Comparative Review of the Evidence” is a quantitative meta-analysis spanning 20 years of research comparing the effectiveness of what’s called “Simulation-Based Medical Education with Deliberate Practice” (or “SBME with DP”) against traditional clinical education in medical training. Unlike the usual “see one, do one, teach one” approach of traditional clinical education, SBME with DP puts students through a highly structured and repetitive process aimed at achieving skill mastery. The learning objectives are well defined; students engage in focused, repetitive practice that is rigorously evaluated; and they use the resulting feedback to monitor performance, correct errors, and repeat the task being practiced. The process continues until they reach a mastery standard. Once that happens, they move on to the next task or unit.
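Stripped of the clinical details, that training cycle has a simple control-flow shape: practice, get assessed, use the feedback, and repeat until a mastery standard is met, with earlier attempts never averaged in. Here is a toy simulation of that loop; the unit names, the 0.90 mastery threshold, and the model of a learner whose skill nudges upward with each practice cycle are all invented for illustration, not taken from the paper.

```python
import random

# A toy simulation of the mastery-based progression loop described above --
# my own abstraction of the "SBME with DP" cycle, not code from the study.
# A simulated learner improves a little with each practice/feedback cycle and
# only advances to the next unit after meeting the mastery standard.

random.seed(1)
MASTERY_STANDARD = 0.90

def practice_and_assess(skill: float) -> tuple[float, float]:
    """One cycle: focused practice nudges the skill upward; assessment is noisy."""
    skill = min(1.0, skill + 0.1)
    score = max(0.0, min(1.0, skill + random.uniform(-0.05, 0.05)))
    return skill, score

for unit in ["suturing", "catheter placement", "valve repair"]:
    skill, score, cycles = 0.5, 0.0, 0
    while score < MASTERY_STANDARD:   # repeat until mastery; repeats carry no penalty
        skill, score = practice_and_assess(skill)
        cycles += 1
    print(f"{unit}: mastered after {cycles} practice/feedback cycles")
```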
Like I said, this is a meta-analysis, which means it looks at the results of a collection of papers on the subject and tries to distill the main findings in aggregate. Each of the 14 studies included here compared SBME with DP against traditional clinical instruction. Clinical students in SBME with DP consistently outperformed those in traditional settings in the acquisition of clinical skills, with an overall effect size (expressed as a correlation) of 0.71. The “effect size” is a statistical measure that quantifies the magnitude of the difference in skill acquisition between the two methods across the different studies. An effect size of 0.71 is considered “large”.
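For readers more used to effect sizes reported as Cohen’s d, the standard conversion from a correlation-type effect size (which assumes roughly equal group sizes) puts r = 0.71 far beyond Cohen’s conventional “large” benchmarks of r = 0.5 and d = 0.8. This back-of-the-envelope conversion is my own addition, not a figure reported in the paper:

```latex
% Converting a correlation-type effect size r to Cohen's d
% (assumes roughly equal group sizes; illustrative only)
d = \frac{2r}{\sqrt{1 - r^{2}}}
  = \frac{2(0.71)}{\sqrt{1 - 0.71^{2}}}
  \approx \frac{1.42}{0.704}
  \approx 2.0
```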
In David’s post on “knee-jerk reactions”, there was a lengthy footnote from me in response to the question “Do you want your heart surgery done by a doctor who always got a second chance?” This meta-analysis confirms exactly what I wrote in that footnote: when you look at how doctors are actually trained, if they are trained well, they get second chances all the time, along with feedback on those attempts and the opportunity to correct their mistakes. In fact, it is now a hallmark of best practice in clinical education to train doctors (and others like them) using deliberate practice methods that give them not just second chances, but as many chances as they need until they have mastered the technique.[3] I personally am extremely glad that the heart surgeon who worked on me was trained this way.
The parallels to the Four Pillars of alternative grading are clear. This form of training sets clear standards, gives students helpful feedback, and provides plenty of opportunities for reattempts without penalty. Even the “mark” is helpful in that it indicates progress -- students simply either move on or they don’t. Something like SBME with DP is probably close to the practice of anybody who is grading with the Four Pillars already. And it could be used as a model, not only for grading methods, but also for what we do with students in the classroom.
More on reattempts without penalty
Miller, K., Callaghan, K., McCarty, L. S., & Deslauriers, L. (2021). Increasing the effectiveness of active learning using deliberate practice: A homework transformation. Physical Review Physics Education Research, 17(1), Article 010129. https://doi.org/10.1103/PhysRevPhysEducRes.17.010129
As we’ve written here before, reattempts without penalty, the fourth pillar, is really the heart of alternative grading. Without reattempts on a task, there is no feedback loop. And if a reattempt on an academic task is penalized, then every turn through the feedback loop is dampened to the point where eventually the loop stops before the learning standard is met. And it’s fair to say that the process of mindfully iterating on a feedback loop is deliberate practice.
That point is really driven home by the study “Increasing the effectiveness of active learning using deliberate practice: A homework transformation”. It demonstrates how applying the principles of deliberate practice to out-of-class homework in an introductory physics class significantly enhances student learning, even in courses already using active learning in the classroom. This class already had homework as part of its assessment structure, and that homework used an interactive online component that allowed for reattempts. However, it lacked any targeted sub-skill practice or progressive scaffolding.
The researchers “transformed” the homework into a weekly assignment with two parts: an online portion consisting of 10 multi-step, exam-style problems administered through an online homework system, and an offline paper portion consisting of about five similarly complex problems that students wrote out and submitted. On the online portion, students could revise and resubmit their answers as many times as they wanted. They did lose 5% of the total question credit for each incorrect answer submitted, so this is not strictly “reattempts without penalty”; however, it is a fairly light penalty. Students were only told whether their answers were correct or incorrect and did not get additional hints or guidance on the online questions. The paper problems, on the other hand, received feedback from teaching assistants, delayed by about a week.
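To get a feel for how light that 5% penalty is, here is a minimal sketch of the scoring rule as I read it; the function name and the assumption of a floor at zero are mine, and the homework system’s exact implementation may differ. Under this reading, even ten wrong submissions before the correct answer still leave half of the question’s credit available.

```python
def remaining_credit(incorrect_submissions: int, penalty_per_wrong: float = 0.05) -> float:
    """Fraction of a question's credit still available after a number of
    incorrect submissions, under a plausible reading of the 5% rule above
    (illustrative only -- the actual system's rule may differ)."""
    return max(0.0, 1.0 - penalty_per_wrong * incorrect_submissions)

# Wrong tries before the eventual correct answer -> credit still earnable
for wrong in (0, 2, 10, 20):
    print(wrong, remaining_credit(wrong))   # 1.0, 0.9, 0.5, 0.0
```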
The “transformed homework” added 25-30 sub-skill questions that students were instructed to complete before attempting the more complex problems. If stuck, students could resubmit incorrect answers, use targeted hints, or view the solution for a zero score. This repetition with targeted feedback is essential for deliberate practice. Each sub-skill question was designed to focus on a narrow concept and to provide hints addressing the most important misconceptions, allowing students to reach an appropriate level of mastery of the sub-skills through repetition and feedback before moving on to the complex tasks.
The transformed homework contained 3-4 times as many problems as the traditional homework. However, in a controlled experiment, students in the transformed homework groups scored 5-10% higher on a test of learning than students in the traditional groups, despite spending a similar amount of time on task. This suggests that the initial investment in learning the sub-skills (via reattempts and feedback) paid off in faster, more efficient completion of the later complex problems.
In essence, the multiple reattempts were not simply a chance to guess, but a mechanism to engage in mindful repetition with immediate, specific guidance on the foundational “sub-skills,” leading to significant learning gains. This “mindful repetition” is an economical way to describe deliberate practice.
In this study, as in the others, we see a congruence between deliberate practice and alternative grading. Classroom methods such as reattempts without penalty (or with only a minimal penalty, in this case), supplemented by immediate feedback, both promote deliberate practice and form a good basis for an alternative assessment and grading setup. In other words, deliberate practice and alternative grading are mutually supportive.
Conclusion: Synergy
I looked at several research papers about methods that promote deliberate practice in the classroom, not just these three, and the more I read, the more I felt I was actually reading about alternative grading instead. As far as I can tell, none of the courses in these papers, including the courses involved in the meta-analysis, used anything other than traditional grading. (Although I’m not exactly sure how the clinical practices referenced in the McGaghie paper were “graded”.) But they very easily could have used alternative grading, by building a grading approach on the Four Pillars and the assessment methods already in place.
The two ideas — deliberate practice and alternative grading — have a synergistic relationship. A well-constructed alternative grading system is focused on and supports deliberate practice, both in and outside of the classroom. And conversely, methods that promote deliberate practice are practically begging to be graded alternatively. Grading deliberate practice tasks with points and averages seems to short-circuit the very thing it’s trying to attain.
[1] The online homework questions themselves counted for 5% of students’ final grades, the same as a separate participation grade.
[2] The specific way some of the metrics were constructed in this study screened out students who simply retried a problem immediately without additional practice. I refer you to the original paper for details.
[3] I noted on David’s article that typically, beginning clinical students are not working on real live humans — they are mostly using cadavers, animal tissue, hyper-realistic mannequin-like dummies, or computer simulations. So the ethical issues with “reattempts without penalty” in this case are minimal, and there’s no downside to reattempting a procedure hundreds of times if necessary before they get it consistently right. Eventually, students “graduate” to doing actual procedures on live humans. I was one of those humans. My heart surgeon was a resident who was “awarded” the opportunity to replace my aortic valve because, as an otherwise-healthy adult, I was an easy case. The attending physician was in the room to offer feedback and, if needed, step in and finish. I think she (the resident) did OK.
Thanks for sharing!
I think that this type of practice is very useful. I use Canvas quizzes to do the same thing Chapman et al. did in their paper. My course doesn't have these types of virtual quizzes ready-made, so I have to make them. I make as many of them formula problems as possible, so the students mostly see different problems from each other. They are a beast to set up, but I should be able to reuse them from year to year, so it's worth the effort.
Like Chapman et al., I also allow unlimited attempts with no penalty, but there is a hard deadline. (Usually I give them 3 days to work through the problems.) They are worth 10% of the grade. My goal was ~1 per week, but I can't make them fast enough.
I call them "Practice Problems" rather than quizzes because students have an idea of what quizzes are. I have to tell them repeatedly at the start that these are practice, and so they are *strongly encouraged* to get help if they get stuck.
The big downside to this is that I have no way to give specific feedback on what they are doing wrong. But I do post the solutions after the assignment closes so they can check their work against the way I solved them.