Small changes to handle AI
What I did this semester to address AI use in one of my classes

You probably know what issues I’m thinking about just from this post’s title: AI is unavoidable right now. Generative AI tools are freely available to anyone with an internet connection. In that context, what kinds of assessments can I use to understand student learning?
I have not found any obvious misuse of AI in my classes, but I’ve certainly had suspicions and I’m aware that it could easily be happening. As a result — and like nearly every instructor I talk to — I’ve been thinking about how to improve my assessments to better capture what my students actually know.
This semester, I made a few small but impactful changes in one of my classes, meant to address AI use. I don't claim to have all the answers about how to manage AI and assessments, but I do think these small changes helped. Let's look at what I did, and how it went.
About my class, and what I didn’t change
In this post I'm going to focus on just one class that I taught last semester: MTH 210, Communication in Mathematics, which is an introductory proof-writing course. I wrote about this course at length this past summer (part 1 and part 2). I've taught it a dozen or more times over the past decade. This class comes early in the math major and is typically taken by first- and second-year students. Class topics include both learning individual mathematical skills (tested on weekly quizzes) and synthesizing those skills in written form (assessed through a proof portfolio). Along the way, the class has a significant emphasis on learning about disciplinary culture, writing standards, and tools.
We constantly put collaboration into action during class time, with teams of students working together at tables or whiteboards to investigate problems and draft mathematical proofs. All of this is unchanged from previous semesters, so for more details, see those posts from the summer.
Since about Fall 2023, including last semester, I’ve included an AI statement in my MTH 210 syllabus. Here it is, unchanged this semester:1
Generative AI (e.g. ChatGPT): In this course, we’ll be developing skills that are important to practice on your own. Using generative AI can inhibit the development of those skills – even if you just use it for hints or suggestions. So, please refrain from employing AI tools in this course. Using such tools for any purposes, or attempting to represent AI-generated work as your own, will violate our academic integrity policy. (Side note: ChatGPT and similar LLMs are good at writing nice-sounding but logically incorrect proofs that don’t follow our communication specifications at all. Don’t waste your time.)
In short: Don’t use it, particularly because it can hurt your learning. I also spent multiple short blocks throughout the semester talking with students about my own take on AI and why I included that syllabus statement. I’ll return to what students said about AI use a bit later.
What I changed
In MTH 210, I’ve always used short weekly quizzes that test skill-based standards. Students need to earn a Successful mark twice on each standard. Long ago, I gave these quizzes during 15-20 minutes of class, along with two or three longer in-class exams. During the Covid years, I changed to entirely out-of-class quizzes with long deadlines (e.g. 24 hours). This included the final exam, which was just one more chance to meet each standard. This asynchronous approach was very convenient: it opened up extra class time for learning, and also reduced time pressure on students.
Nonetheless, fully out-of-class quizzes were tempting places for students to misuse resources – including, but not limited to, AI – in ways that muddied the waters of the assessment. But I didn't want to move all of those quizzes in-class, both because of the test anxiety that would provoke and because it would eat up a lot of class time!
Instead, this semester, I moved just a few carefully selected quizzes in-class. Specifically, I moved three “big quizzes”, covering more than the usual number of standards, into class time. These were basically short exams, spaced throughout the semester. They were a chance to earn Successful on a bunch of standards, no different from other quizzes except for their length. I also moved the final exam back to an in-person format. I had previously resisted this since the final would have been the only timed in-class assessment, and that felt unfair. Now with other assessments already taking an in-class format, it was no problem to move the final in-class as well.
All other weekly quizzes remained asynchronous. They could be completed any time between the end of class and the following morning. These quizzes were very short, each about one page long and covering at most two standards.
More importantly, I rearranged the order of the quizzes. Now, each standard appeared only once on the smaller out-of-class assessments. All other attempts on each standard appeared on the larger in-class “big quizzes”. This (generally) ensured that students had to earn at least one of the two required Successfuls on an in-class assessment. If you want to see what this looks like in practice, here’s the Quiz Learning Target Plan that I initially made for myself, and then shared with students to help them plan.2
The final exam also included a required "recertification" of some "core" standards (this part isn't new). All students had to attempt these six most critical standards one more time on the final, whether or not they had already earned two Successfuls. The number of Successfuls on this recertification adjusted their final grade with a + or -, with a generous middle range leading to no grade change. This was one last chance, in an in-person environment, to check in on student understanding of the most important topics.
By only moving three regular quizzes in-class, I didn’t have to find as much extra class time as I might have otherwise done. In the end, I pinched and squeezed just a few parts of two multi-day topics, and removed one block of unstructured in-class work time. That’s all I needed to do – it would have been much harder to move all of the quizzes in-class.
Other assessments remained unchanged, including a proof portfolio. Proofs are long-form written mathematical arguments, and we spent a lot of time in class practicing with these. The actual work of drafting and revising these “portfolio problems” happened entirely outside of class.
While I know that writing is a magnet for LLMs, I wasn’t worried in the case of these portfolio problems. In part, this is because it is simply infeasible to require students to write detailed logical arguments in a high-stakes in-class timed testing environment. That would assess all of the wrong skills.
But there's more to it than that. I heavily emphasized a rough-draft thinking approach to proof writing that includes, among other things, a handwritten rough draft of each proof, graded only for completion. I gave detailed feedback on these drafts and sometimes asked for a resubmission if the logic needed work. As a result, a student moving on to formally writing a proof already had – and knew they had – a workable outline ready to be written up. When it came to formal writing, we used discipline-specific writing tools and discipline-specific guidelines about what "professional writing" should involve, and we practiced with both of those in class. These discipline-specific requirements added another layer of difficulty on the AI front that, in my experiments, made AI use much less practical. It also seems (see below) that students found significant value in this writing and revision process. For these reasons, I didn't change the proof portfolio.
What I noticed, and what students said
Overall, I was pretty happy with these changes.
You will not be surprised to hear that students found the in-class exams more stressful.3 I hate proctoring in-person exams, not least because of the feeling that I’m inflicting unnecessary stress on students. This experiment with in-class assessment reminded me forcibly of why I’d gotten rid of them in the first place.4
However, students did about as well on the in-person “big quizzes” as they did on asynchronous quizzes. From reading student work and talking with them, both types of quizzes seemed to give a solid representation of what they actually understood. Some of that undoubtedly came from the fact that the “big quizzes” were usually second attempts: By the time they came to a big quiz, students were farther along their learning trajectory than they were on the initial asynchronous quizzes. This was part of my intent in arranging the quizzes as I did. In an alternatively graded class, all assessments are essentially formative, and I encourage students to keep learning as part of the assessment process. It’s also possible that students saw that a quick Successful with AI help (out of class) wouldn’t help them when they had to attempt the standard a second time in class.
Overall, this seems like a clear win, especially given that, out of 15 total assessments, I only moved three quizzes (plus the final) in-person.
The out-of-class written portfolios were similar to past years as well, with all of the same difficulties and successes that I’ve come to expect in the context of this class.
Now let’s hear a bit more from students. At the end of the last two semesters, I’ve given students a totally anonymous AI survey. It asks a few simple questions, the key ones being: How often did you use AI in this class? What for? I had a nicely high response rate (91% across two semesters), and that plus the anonymity suggests that the surveys provided honest student feedback that is reasonably representative.
You might not be surprised to learn that students were using AI in my classes. But not nearly as many as you might expect, and perhaps not in the ways you expect. Across all of the semesters that I surveyed, less than half of my students reported using AI in any way, and of those who did, a large plurality chose “once or twice”.
What I found more interesting is what students actually said about how they used AI. Far and away the most common use case was as a fancy search engine. Many students said that they asked ChatGPT how to type a specific symbol in LaTeX.5 While I think that this is, at best, a mediocre use of an LLM (it’s truly no better than a standard web search), it’s not something that worries me when it comes to student learning.
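To give a concrete sense of how small these questions are, here's a minimal LaTeX example of my own invention (not an actual student prompt) using the kind of symbol commands this sort of lookup turns up, including the set-operation symbols mentioned in the footnote:

```latex
\documentclass{article}
\begin{document}
% \cup is the union symbol and \cap is intersection;
% \in and \notin are "element of" and "not an element of".
If $x \in A \cup B$ but $x \notin A \cap B$, then $x$ belongs to
exactly one of $A$ and $B$.
\end{document}
```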
A sizable number of students also reported asking for general overviews of a topic. Note that this wasn't "how do I solve this problem?" or "how do I write this proof?"; rather, it came from a desire to understand the topic more clearly. Again, this might not be a great use of AI, and I worry about subtle errors students might have internalized, but that's no worse than many of the YouTube videos out there.
The third interesting use case was creating practice questions for quizzes. Students reported asking ChatGPT to quiz them on a topic. I had emphasized the value of self-quizzing and even worked with students to create flash cards, so this wasn’t too surprising. However, I do wonder if this may have led to one issue that I noticed: I regularly asked students to state specific definitions on quizzes. Definitions are at the heart of learning new mathematical ideas, and they are something we practice with in class. More than in previous years, I noticed students giving imprecise or slightly “off” answers when asked to state definitions. While definitions are fairly standardized, they aren’t identical from source to source, and publicly accessible LLMs would certainly not know our specific version of each definition. Perhaps this aspect of studying got missed when students relied on an AI that didn’t know our definitions — or maybe I’m imagining trends where there are none.
Finally, many students drew a clear line between using AI for (as one phrased it) “learning” rather than “doing the work”. In other words, they distinguished between trying to understand a topic, and showing their understanding of that topic. I don’t think that distinction is nearly so clear as students might think, but I did find those students’ intentions to be interesting, and it has given me some ideas about how to talk about AI use in the future.
After all of that, here's the most important thing that I took away from these surveys: Students are not all of one mind about AI. If you read many of the articles written about college students and AI, you'd be forgiven for thinking that we live in a free-for-all of AI-based cheating. Another question on my survey asked students what they thought a reasonable AI policy would be for their class. Their answers were wide-ranging, but a substantial number gave ethical, moral, and environmental arguments against using AI at all. Others emphasized that the extra work involved in fact-checking an LLM's responses wasn't worth the ease of asking it questions. Still others sang the praises of AI's ability to create high-level overviews or unlimited practice questions, or to act like a natural-language search engine.
What I’m doing next
I’m definitely keeping this approach to quizzes, and applying it to my other classes. AI or no AI, I think this approach to quizzing struck a good balance of flexibility and practicality. These small changes still required careful planning and rearranging some topics, but they didn’t require blowing up my whole course plan.
To help make this approach sustainable for the long term, I'm simplifying my list of standards even further. I took careful notes throughout this semester and identified 5 (out of 18) standards that could be cut or merged into others. This will let me keep quizzes shorter, especially the in-class ones. It will also give me the wiggle room I need to ensure every standard has only one out-of-class attempt.
Looking farther ahead, I need to revamp my syllabus statement about AI use. As you might have noticed, among other things, it's essentially unenforceable. But I also strongly believe in the central idea that inspired it: that learning is a deeply human activity, that it takes time and effort, and that AI can dangerously short-circuit that kind of learning. I will likely create something that includes more specific "dos" and "don'ts", including examples of appropriate and inappropriate prompts or uses. I'm also going to focus more on providing good study materials and explicitly addressing productive study strategies, so that students encounter good ways to study that connect closely to our actual course content.
This definitely isn’t the end of the story. But this is the end of the post. So before I sign off, I’d like to leave you with two key thoughts:
First: It’s possible to make small changes that address AI use on assessments. You don’t have to blow up everything and start fresh.
Second: Our students are not monolithic in their views on AI. Their views are as complex as the rest of the world, and we need to keep that in mind when considering our own choices.
I adapted this language from a resource shared by our Teaching & Learning center a few years ago. Searching now, I find similar language all over the place, so I don't know who to cite as the original source.
You’ll also see that I wasn’t able to perfectly keep to the arrangement I described above – a few standards had multiple out-of-class attempts because that’s just how the schedule worked out. More on this at the end of the post.
Back in the spring, Robert was wrestling with some similar issues and wrote about them, with somewhat different conclusions. Check out his thoughts here: My AI-driven grading changes and Alternative grading in a test-forward environment.
I was also very generous in building in extra time for exams and being flexible about allowing alternative testing arrangements.
LaTeX is the standard computer language for creating professionally formatted math, and it’s a tool we learn about in this class. It’s also notorious for having a special command for every possible symbol, not all of which are obvious. I’m looking at you, \cup and \cap!

