There are a number of things I used to do in my alternatively graded classes that I don’t do any more. These are things that, well, they seemed like good ideas at the time. But since then, whether due to experience or a change in philosophy, I’ve stopped doing them. Today’s post is dedicated to three of these abandoned ideas: What are they? Why did I use them? And why don’t I use them any more?
The title of this post is, of course, a play on Robert’s favorite “stop/start/continue” format. So let’s forge onward with my own personal stop/stop/stop reflection.
Standard salad
When I first started using alternative grading, I really loved the idea that in the same piece of work, students might show evidence of understanding one standard (topic, skill, etc.) while still working on learning another standard. For example, a student in a Calculus class might choose (and justify) the right “rule” to apply in a problem, but then apply it incorrectly (perhaps due to algebra errors). Those are two separate skills: understanding how to pick the right approach and actually implementing it are not the same thing. So, I could give the student credit for knowing what to do, but no credit for how they executed it.
As far as it goes, that’s OK. But I went all out with this idea. Like, way too far. Here’s an example of an exam problem from my very first time using standards-based grading (this was a graduate-level class in my area of research expertise):
Those five codes at the very beginning are all different standards that I thought a student’s work could meet. This problem was a “standard salad”: an assessment that pulled together multiple ideas that covered a variety of different standards.
I quickly learned that it’s practically impossible to assess student work in this kind of situation. Sure, if a student wrote a thorough and fully correct response, then they likely earned credit for all five standards. But if there was an error, I had to somehow decide which standard it most closely involved. Was their mistake an error in recalling or stating a “fundamental” definition (NL-F)? Or in how they used it (NL-F+)? Or was it really an issue of how they applied it in the Noiseless Coding Theorem (NL-I+)? In practice, it was impossible to untangle an error and assign it to just one standard.
The result was that I had to make some arbitrary decisions, and it was hard to make those consistently. Students got annoyed with those inconsistencies. Sometimes a single error caused students to lose credit for a whole swath of standards, which annoyed them even more. It was hard for me to explain why I gave credit (or not) for any particular standard.
Nonetheless, I was really enamored of this “standard salad” idea, and used it in many subsequent classes. I usually tried to reduce the number of standards – maybe two per problem – but the same issues kept coming up.
What I do now: The main thing I’ve realized is that “standard salad” represents a fundamental mismatch between my goals and the kind of question I’m asking. Trying to assign too many standards to a single question means that I’m trying to assess too much in that question. So, depending on my goals, I’ve switched to one of two other approaches.
If I’m most interested in seeing students work with discrete skills, I assess those skills in separate questions. Most often I write a short page of questions that all address the same (single) standard. I assign a single mark for that one standard based on showing consistent understanding across all of the problems. This is much simpler to assess.
This also forces me to think carefully about what matters for demonstrating understanding – does an error show a fundamental misunderstanding of the standard, or is it irrelevant to the topic at hand? Part of what I was trying to do with the standard salad was to have a way to assign “blame” for any given error, no matter how small. Indeed, I often included a standard called “Attention to detail” that could take the blame for things like copying errors or minor arithmetic mistakes.1 Nowadays, I decide whether those kinds of errors are critical to showing understanding of the standard I’m assessing. If they are, students don’t earn credit for the standard. If not, and if I can see clear evidence of understanding of that standard, then they can earn credit despite lack of perfection.2
There’s another option: In problems like the one I showed above, what I was really trying to understand was whether a student could bring together a bunch of different ideas into a coherent solution. In that case, I’m not really assessing individual skills. Rather, I’m assessing the ability to synthesize and put together the ideas, demonstrating higher-order understanding. That calls for specifications grading, which looks holistically at an entire solution rather than individual skills. If I reworked that problem today, I would instead include a clear list of specifications, covering both content and writing quality. A student would need to holistically meet all of the specifications to earn one single overall mark. That mark would count towards earning a high grade in the class, because successful work would show a strong ability to synthesize and combine ideas.
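To make the contrast concrete, here’s a minimal sketch (in Python) of the logic behind a single specifications-based mark. The specs listed here are hypothetical stand-ins, not a list I’ve actually used:

```python
# A minimal sketch of an all-or-nothing specifications check.
# The specs below are hypothetical examples, not an actual list.
SPECS = [
    "All relevant definitions are stated correctly",
    "The key results are combined into a complete, correct argument",
    "The writing is clear, organized, and uses standard notation",
]

def overall_mark(specs_met: set[str]) -> str:
    """One holistic mark: successful only if every specification is met."""
    return "Success" if specs_met.issuperset(SPECS) else "Retry"
```

The point of the sketch is the all-or-nothing structure: there’s no partial credit to distribute across standards, so there’s nothing to untangle.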
Standards on online homework
Online homework is a pretty common part of introductory STEM classes. I’ve used it – usually a free system called WeBWorK – many times. Online homework can be quick and easy to use, it’s autograded, and if you allow multiple attempts it has reassessments built right in. But how does it actually fit into a student’s grade?
At first, I would create a weekly online problem set and decide which standards it covered. Then students could earn credit for various standards by “successfully” completing the problem set. For example, a student who successfully completed a problem set on derivative calculations might earn credit for several different standards, each related to a different type of derivative calculation involved in solving those problems. (Does that sound familiar? It’s getting dangerously close to a “standard salad”.)
I put “successfully” in quotes because I always had trouble deciding when students could earn credit for a standard. I had to balance online homework’s main benefits – speed and automation – against my desire to know what students were actually doing. For example, was a 90% on the whole problem set good enough to earn credit for each standard? Maybe 95%? Were the mistakes students made important ones, or just the classic formatting and data-entry problems that plague so many online systems? The percentage alone couldn’t tell me.
Once I decided on a cutoff, this was easy to implement: Download the list of scores on a problem set, find the students who had a high enough score, and enter credit for each standard in my gradebook.
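In practice, the whole process boiled down to something like this sketch. The file name, column names, and standard codes are all hypothetical, since WeBWorK’s actual export format varies:

```python
import csv

CUTOFF = 0.90  # my chosen "successful completion" threshold
STANDARDS = ["DC1", "DC2", "DC3"]  # standards this problem set covered

# "scores.csv" stands in for a hypothetical export of problem-set scores,
# with one row per student and their total score on the set.
with open("scores.csv", newline="") as f:
    for row in csv.DictReader(f):
        if float(row["total"]) >= CUTOFF:
            for standard in STANDARDS:
                # In practice, this credit then went into my gradebook.
                print(f"{row['name']}: credit for {standard}")
```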
But that was never really satisfactory, since I didn’t really know what students were doing (and not doing) successfully. So I started assigning standards to each problem within a problem set. Then I had to look into a student’s score on each problem individually, which slowed things down considerably. Plus, what should I do if (as was common) different problems addressed the same standard, and students succeeded on one, but not another? This didn’t really solve the data entry or formatting issue, either.
What I eventually realized was that no choice I could make would be good enough, because I was missing a key ingredient: Seeing what the student was thinking while solving the problem. For a while, I tried having students use a paper notebook to do scratch work for online homework, which I would collect and grade. But at that point, I was basically assigning traditional paper homework and having them enter it online – so why bother with the online system at all?
What I do now: The last few times I’ve taught Calculus, I’ve made online homework into purely optional practice. For each computational standard, I set up one online homework problem set focused only on that standard. I make it available as soon as we’ve covered that standard in class. Students can use it at any time to practice, before or after completing other assessments. This is where online homework really shines: It can provide many and varied practice problems that give immediate feedback with chances to try again right away, and students can pick and choose the practice they need.
The problems don’t count in the final grade at all. Instead, I use quizzes, exams, and other written assignments to assess standards.
But – and this is key – this online practice homework also serves to “unlock” reassessment attempts. If a student wants to reassess a written assignment – such as an exam problem covering a specific standard – they must first show evidence that they’ve completed relevant practice problems. Online homework is an ideal way to do this. I usually require a 90% score on the online practice problems. That’s a high enough bar to show good faith practice, and if students have trouble meeting that bar, then the problems provide great material for an office hour discussion.
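In code terms, the unlock rule is simple. Here’s a hedged sketch – the 90% threshold is the one I actually use, but the function and data structure are just illustration:

```python
PRACTICE_CUTOFF = 0.90  # high enough to show good-faith practice

def reassessment_unlocked(standard: str, practice: dict[str, float]) -> bool:
    """A reassessment on a standard is unlocked once the student scores
    at least 90% on that standard's optional online practice set."""
    return practice.get(standard, 0.0) >= PRACTICE_CUTOFF

# Hypothetical example: 95% on the chain rule practice set
# unlocks a retry of the chain rule exam problem.
print(reassessment_unlocked("chain rule", {"chain rule": 0.95}))  # True
```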
This has worked very well. It makes the “optional” online practice problems directly relevant to students and encourages them to use them, because they can see exactly why the practice matters. In turn, this practice (and the metacognition it prompts) makes reassessments much higher quality, on average. Overall, this approach incentivizes practice when it’s needed, without forcing busywork on students who don’t need it.
Mixing standards and specifications
For many years, I combined both standards and specifications within the same assignments. I used this hybrid system on written assignments, such as “challenge” homework problems and mathematical proofs. In these, I cared both about individual skills (assessed with standards) and overall writing and communication quality (assessed with a list of specifications). Students would earn individual marks on each standard, plus a single overall mark for meeting communication specifications. I thought that this was quite a nice way to emphasize the importance of both individual skills and communication quality.
After using this for a few semesters, I had a nagging feeling in the back of my head that things weren’t quite as great as I wanted them to be. On the one hand, this hybrid system was confusing for students. As I wrote last winter,
Students often found all of this confusing: Their “grades” for a proof involved both a list of marks on standards, plus a single mark for holistically meeting the specifications, even though the specifications themselves are also a list that sure looks a lot like the list of standards.
On the other hand, having both standards and specifications actually made it harder for me to grade – much like the “standard salad”:
The interaction between standards and specifications in the same piece of writing gets tricky: Is this error a communication issue (specifications)? Does it reveal a problem with mathematical content knowledge (standards)? If so, which standard? Proofs bring multiple ideas together, making it hard to pull apart exactly which standard is met or not. If part of a proof is confusing, is that a writing issue, does it reveal an underlying logic problem, or both?
What I do now: In my “new year’s resolutions” post from last winter, I resolved to separate out standards from specifications. I redesigned my assessments to use standards only when assessing individual, discrete skills, at a fairly basic level. This usually happened on short quizzes, which students could reassess via new attempts on future quizzes. When I wanted to see students put those skills together in higher-level ways while demonstrating clear communication, I used (only) specifications, with one of the specifications being “Have no important errors or omissions in mathematical reasoning or justification”. I implemented a revision cycle, so that students could show growth on the same piece of writing.
This worked extremely well. The system was a lot clearer and simpler. Students had much less confusion about what was being assessed, and where any trouble was located. My grading choices were much simpler, since I didn’t have to try to untangle skills from communication. And I was still able to give feedback in terms of the relevant skills involved on writing assignments, if that was where students were having trouble. Overall, I learned how important it is to identify the main purpose of an assignment – skills? synthesis? – and pick a grading method that fits that purpose.
Keep it simple
If there’s one general lesson to take away from all of these mistakes, it’s the importance of keeping things simple. We say that a lot on this blog, but as you can see, I have trouble putting it into practice.
Each of the problems I described above was basically an over-complicated approach to alternative grading. I was trying to find ways to fit square pegs into round holes, and trying to do too many things at once. It took me many years to recognize that some of these approaches weren’t working, and that a simpler approach would be better for me and for students.
Don’t feel bad if you too have tried alternative grading ideas and found that they don’t quite work for you. You’re in good company! My best advice is to constantly engage in your own feedback loop: Reflect on what you’ve tried, be honest with yourself, and aim to simplify whenever possible. And, of course, feel free to learn from my mistakes so that you don’t make them in the first place!
1. I’ve stopped using that standard, too. It was always one of the very hardest for students to demonstrate, and it wasn’t actually related to any course content. It was more of a way for me to say “this isn’t perfect, but the errors aren’t important,” which is just kind of disheartening without actually being useful.
2. I say a lot more about this in What does it mean to meet a standard?
I do this with the final exam. Since the questions are comprehensive, cumulative, and at higher levels of Bloom’s taxonomy, they represent several standards.
I assign part of the “grade” to the standards met. I have written a VBA program to incorporate all the nuances, and I gave students a separate Excel sheet with the VBA program as well, so they can track their own exact standing. Canvas tracks the minimum grade they could make if they did not take retakes or scored lower on retakes.
The final exam is also a standalone grading component, as it is a prerequisite to several other courses.
Only half of the difference in grades (if positive) can be recovered and added to the “standard” score earned so far; for example, improving from 60 to 80 on the final would add 10 points. Before this, students get one chance at a retake test on each standard, with the same recovery policy.
I use topics as standards. I have eight topics; if I used individual standards, there would be at least 30. Doing that for 80-120 students would become ridiculous and inequitable (a lot of my students work 20 hours per week).
Our paper is under review, and I will write a blog post about the application when it is accepted.