Automated Grading

It’s not just for Scantrons any more.

First, a few recent news items along these lines:

The last link is a little different – it’s about predicting how well students will do in courses based on their past performance. Just consider that a little automated student advising.

I think that every English teacher in the world would like to have some sort of automated paper-grading machine, which takes in student papers and spits out corrections. Now that the world is getting closer and closer to that, we’re suddenly frightened of it.

I understand the concerns. First, as teachers and professors we want to believe that our feedback and input are important, which they are. If we let a machine take that job from us, it feels bad – we feel inadequate and easily replaced – but in my mind that’s just a misunderstanding of what’s happening. More on that in a second.

Second, as someone who gives questions that say things like “derive,” I suspect automated grading would be difficult to do well. As things become less formulaic, in fields like English and History, it feels like it would be harder and harder to get a computer to “correctly” grade papers. Certainly we can train TAs to grade derivation problems faster than we can train a computer to do it. As evidence, I present the past hundred years of education, where TAs graded derivation problems and computers did not. Developing computer grading systems is hard.

Testing such systems, however, is really easy. It works like this:

  1. Have a bunch of human graders mark up a set of papers, rating them on a set of criteria as one would with a rubric.
  2. Create the “inter-rater reliability” score: how well did human graders agree as to the quality of such papers on the chosen criteria?
  3. Have a couple of computer grading programs rate the same papers on the same criteria.
  4. Create the inter-rater reliability score for the set of computer and human graders together. If your reliability increased when you added in the computers, then your computers not only agree on what the scores should be, but they can find those scores more reliably than human beings.
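
For the curious, here is a rough sketch of that comparison in Python. It rests on several assumptions of mine: I never said which reliability statistic to use, so this picks the mean pairwise Cohen's kappa as one reasonable choice among several, and the rater names and scores are invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who scored the same papers."""
    n = len(a)
    # Fraction of papers on which the two raters gave the same score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's score frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave one identical score to every paper
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average kappa over every pair of raters in the dict."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rubric scores (1 to 4) on six papers; not real data.
humans = {
    "human_1": [3, 2, 4, 1, 3, 2],
    "human_2": [3, 2, 3, 1, 4, 2],
    "human_3": [2, 2, 4, 1, 3, 3],
}
machines = {
    "machine_1": [3, 2, 4, 1, 3, 2],
    "machine_2": [3, 2, 4, 1, 3, 2],
}

humans_only = mean_pairwise_kappa(humans)             # step 2
combined = mean_pairwise_kappa({**humans, **machines})  # step 4

print(f"humans only:       {humans_only:.2f}")
print(f"humans + machines: {combined:.2f}")
```

One caveat with this sketch: two programs that always agree with each other will pull the combined number up on their own, so it may be worth looking at the human-to-machine pairs separately as well.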

If you want to say that the computers might calculate numbers but can’t give nuanced and helpful feedback, you first need to convince me that your students read your feedback.

Here’s where the misunderstanding comes in: educators confuse writing comments with connecting to students. I’ll believe that, until real AI comes around, students will need a human being with whom they can connect. However, you’re not doing that with your red pen. (Or even a green one.) You do that in the classroom, in your office hours, in the lunchroom or the dorm, in the gym, in passing interactions and long sit-down talks. Teachers and even college professors can connect meaningfully with their students. I absolutely believe that it can be beneficial for both parties; formative, even. But that’s not what giving grades is for.

Most importantly, in order to reject computer grading, I would have to accept the importance of grading in general – which I don’t.

Putting numbers on papers was never our job. Educating other human beings is our job, has always been our job.

I’ll be glad to shuffle this massive stack of papers into the grade-o-matic and let it spit out pointless numbers, while I work on inspiring my students. At least the grade-o-matic will be more reliable than having me score student work when I’m low on sleep.

The day will come when we can get computers to reliably read in papers and spit out important and useful information about them. Information that lets us as teachers talk to the students one-on-one in an informed manner about how to improve their work. That day will be awesome. That day is not yet here, but the day of automated grading is apparently very close.

Edited to add: 

Just yesterday an article came out in the Times about Les Perelman of MIT, who is trying to find a good way to cheat auto-graders. Here’s a great exchange from the article:

“He wants to show why it doesn’t work,” said Peter Foltz, a Pearson vice president.

“Yes, I’m a skeptic,” Mr. Perelman said. “That’s exactly why I should be given access.”

As one would expect, the automated graders don’t understand things like context and factual accuracy yet. They do, however, understand grammar and sentence structure. Some of it comes down to whether you’re intentionally trying to fool them, which of course many students will.

It would be interesting to see a two-grader setup, with an expert checking for facts and sensibility, and then handing the paper off to a computer to assign numbers.


About Colin Fredericks

By day I help to create online courses at HarvardX. By night I write roleplaying games.

Posted on April 23, 2012, in Uncategorized.
