Grading Software Fooled By Nonsense Essay Generator
An anonymous reader writes "A former MIT instructor and students have come up with software that can write an entire essay in less than one second; just feed it up to three keywords. The essays, though grammatically correct and structurally sound, have no coherent meaning, yet automated essay-grading software has scored them highly. From The Chronicle of Higher Education article: 'Critics of automated essay scoring are a small but lively band, and Mr. Perelman is perhaps the most theatrical. He has claimed to be able to guess, from across a room, the scores awarded to SAT essays, judging solely on the basis of length. (It’s a skill he happily demonstrated to a New York Times reporter in 2005.) In presentations, he likes to show how the Gettysburg Address would have scored poorly on the SAT writing test. (That test is graded by human readers, but Mr. Perelman says the rubric is so rigid, and time so short, that they may as well be robots.)'"
most schools ignore sat essay (Score:4, Insightful)
Re: (Score:3, Funny)
Your post tells me that you didn't score all that well on the SAT. Bad grammar, incoherent thoughts.
Re: (Score:2)
Sounds like someone who thinks he's smart who got a low score...
Re: (Score:3)
basic math = rote memorization
Yup, it sure is, and sadly this is contentious. Basic numeracy is impossible without memorizing tables for addition, and multiplication. Seen a modern math textbook that shows what buttons to press on the calculator? Seen the recent "common core" controversy about quite crazy approaches to basic math that seem motivated by avoidance of memorization (it's the revenge of new math!). Sigh. But then, do they allow calculators on the SAT?
Re: (Score:3, Insightful)
Re: (Score:2)
Wow, they sure didn't allow calculators when I took it, not very long before that. Didn't slow me down. Couldn't use a calculator in high school physics either.
I don't know all of my multiplication tables,
Well, at least innumeracy isn't new? Eesh. I mean, I can understand not memorizing log tables with the rise of calculators, but can you really not tell from a glance at the unit price what 6 of something costs?
Re: (Score:2)
6 of something is pretty easy.
Back in grade school we had to learn multiplication tables up to 20.
17 of something is a lot harder.
Re: (Score:2)
Re: (Score:2)
Multiply 137 by 6 and place a decimal separator between the third and second least significant digits.
600+180+42=822, which means 8.22.
Probably faster than getting a calculator if you're not sitting next to one (admittedly, it doesn't happen often these days).
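For anyone who wants to see the partial-products trick above spelled out, here is a minimal Python sketch. The function name `partial_products` is made up for illustration, and the price is handled in cents (137 for 1.37) to avoid floating-point noise.

```python
def partial_products(n: int, multiplier: int) -> list[int]:
    """Split n into its decimal place values and multiply each by multiplier."""
    digits = str(n)
    parts = []
    for i, d in enumerate(digits):
        place = 10 ** (len(digits) - i - 1)   # 100, 10, 1 for a 3-digit number
        parts.append(int(d) * place * multiplier)
    return parts


parts = partial_products(137, 6)   # the 600, 180, 42 from the post
total_cents = sum(parts)           # 822
print(parts, total_cents / 100)    # shift the decimal point back: 8.22
```

Summing the parts and moving the decimal point two places left is the same step the post describes as placing the separator.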
Re: (Score:2)
1.37*10 = 13.7
13.7 / 2 = 7-
7- + 1.37 = 8+.
The key to fast multiplication is bit shift, then add or subtract.
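A rough Python sketch of the shift-then-add idea, done exactly rather than with the rounded 7- and 8+ estimates. It assumes the price is held in cents, and `times_six` is a hypothetical name, not something from the post.

```python
def times_six(cents: int) -> int:
    """Multiply by 6 as (x * 10) / 2 + x, using a decimal shift and one add."""
    shifted = cents * 10    # decimal "shift": 137 -> 1370
    half = shifted // 2     # halving gives x * 5: 1370 -> 685 (always exact)
    return half + cents     # add one more copy of x: 685 + 137 = 822


print(times_six(137) / 100)   # 8.22
```

The mental version rounds 6.85 to "7-" and the sum to "8+", which is good enough for a price check at a glance.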
Re: (Score:2)
Re: (Score:2)
I don't either, but I can start mentally singing the "Alphabet Song" at the letter J, then remember the next lyric.
Re: (Score:2)
Re: (Score:3)
This is, of course, rubbish. Memorising results for common operations saves time when performing them, that's all.
In Real Life, if you need numerical answers often, you use a calculator. It's faster and less error-prone. And if you don't need numerical an
Re: (Score:2)
You can use audiobooks as a crutch for functional illiteracy, but it's better to know how to read. You can use a calculator for simple arithmetic as a crutch for functional innumeracy, but it's better to have the skill of simple math in your head.
These are a couple of the most basic, simple skills in civilized society. They don't take much time and effort to learn, they're rewarding throughout life, and there's simply no excuse for not having these skills. It's just the worst kind of lazy.
Re: (Score:2)
Attacks like those don't do a damn thing. I fail to see why people are so obsessed with making random guesses about others, as if it's relevant.
So the views and opinions of the "other side" are irrelevant? That's a little heartless and understanding someone's viewpoint helps understand their views. That you don't understand this indicates you need more rote memorization to make you smarter.
Re: (Score:2)
To me, it's no different than randomly spewing forth a bunch of things you think are facts. Even if they are actually true, it's irrelevant. I don't need to understand what you think my SAT score was to understand your views about rote memorization.
What are my views on rote memorization? Do you know, or were you too busy attacking me to even bother to stop and understand?
Re: (Score:2)
The thing that is rote memorization is spelling, as the English language is not phonetic despite being taught that way
Not really. You can tell from the phonology of a word what language it comes from and once you know that the spelling is usually pretty regular. Almost every spelling irregularity in English is a result of the word's language of origin at the time it was borrowed.
Example: Words from French follow "i before e, except after c" (niece, achieve, ceiling, receipt, etc. are all from French). Words from Anglo-Saxon do not (weird, neighbor, freight, eight, etc. are all from Anglo-Saxon).
Re: (Score:2)
"Rote memorization != intelligence [...] Someone with terrible grammar can be far more intelligent than some worthless rote memorization monkey."
"...For to be possessed of a vigorous mind is not enough; the prime requisite is rightly to apply it."
René Descartes "A Discourse on Method"
"Rote memorization" is what fills your mind with useful items to play with, without them, you well may possess a "vigorous mind" but you won't be able to rightly apply it.
Re: (Score:3, Insightful)
Odd you choose math as an example, a subject where your grammar must be perfect or what you've written is nonsense.
Re: (Score:2)
A lot of math errors are easily recognized and eliminated by a careful reader too. Grammatically incorrect math or English is often readable. It just takes more effort, shows that the writer is either sloppy or unskilled in the use of the language, and, if the former, doesn't have much respect for the reader.
Of course, people who sit in class thinking "what a boring bitch" probably have a general lack of respect for their fellow human beings.
Irrelevant (Score:3, Insightful)
Re: (Score:3, Funny)
You can just stand down from all that meritocratic whinging right now, mister.
Re: (Score:2)
That's when I wish I had a SAM for such occasions...
Re:Irrelevant (Score:4, Interesting)
At least helicopter daddy and blackhawk mommy give a shit about the Precious. Or do you prefer the absent daddy and welfare mommy? People DO go overboard... but I feel like the pendulum is starting to swing entirely too far the other way.
Re: (Score:2)
Re: (Score:2)
Now that's an interesting choice of words...
Re: (Score:2)
What we have here is a superposition of extreme pendulum states.
(Actually, no: we simply have multiple pendulums at opposite extremes. However, calling it a "superposition" is more fun!)
Re: (Score:3)
Re: (Score:2)
Hiring by big companies for internships and recruiting for new college hires are usually filtered by GPA before any engineer or manager sees the stack of resumes. I never had a company care about my grades, but then I was never actively recruited at college age, and so my first dev job was remarkably exploitive, not one of the good ones.
Beyond one's first full-time job, I can't imagine grades ever coming up.
Re: (Score:2)
If you have enough applicants and only a few positions, you're better off taking the best performers who also got a 4.0. In the extremely competitive internships and fellowships, you can afford (stats-wise) to target only the best tail of the distribution and the outliers.
Re: (Score:2)
Stupid or not, it's how the world works. You have to filter a horde of students down to a pile of resumes small enough to go through by some criteria simple enough that an HR drone can manage it, while fulfilling a bewildering array of state and local requirements for fairness in hiring.
Heck, at the bigger shops today, managers and engineers don't even get involved in the process for interns/NCGs until the interviews start - no voice in the filtering process at all. Diversity suffers, otherwise.
Re: (Score:2)
Do you actually have hard evidence for this, or is this prejudice? I'd really hate to think about going through the hiring process for an employer who looked down on me because I did too well in school.
To generate the keywords takes knowledge (Score:2)
Since the essays are grading subject knowledge, and it takes subject knowledge to provide the keywords, it is fairly irrelevant if the essay happens to be structured in a manner that is nonsensical.
Demonstration of deeper understanding, if it needs to be tested, can be achieved via other types of questions.
Re: (Score:2)
The next generation of the software will have a keyword database attached for every subject possible to ensure that every student takes different keywords (chosen randomly from the stock).
Then your grades are pretty much dependent on whether the random number generator chooses keywords that the grading software likes.
I fail to see the difference to now, to be honest, it's just way less work on the student's side.
Re: (Score:2)
Did you happen to read TFA? In the TFA, it is said that the College Board does not take points off for factual errors. In fact, it says that it cares not for factual errors, because errors in fact seldom subtract from the quality of the essay being graded.
WTF, right?
Re:To generate the keywords takes knowledge (Score:5, Insightful)
How on earth did you guys let it get so ridiculous??
Re: (Score:2)
How on earth did you guys let it get so ridiculous??
Never underestimate the power of Intelligent Design.
Re: (Score:2)
Well, not that it's much of a defense, but absolutely no one that I know of took the SAT essay section seriously. I have not heard of any university that actually considered that section of the SAT when making admission decisions. So our education establishment wasn't completely stupid, I guess.
Re: (Score:2)
so the Republicans want to close all public schools
This right here is why the USA is fucked. People just can't speak coherently about politics any more. (And, of course, it's the teachers' unions who are forcing high-performing charter schools to close, but that's hardly part of the Dem platform, just a consequence of seeking the public sector employee vote).
Do you actually think either party has a goal other than the best schools? The disagreement is over how to achieve that, and I've never heard anyone arguing for no public schools, and more than for n
Re: (Score:2)
Corruption is rot that spreads throughout - it does not care to limit itself to one species of host.
I always find it amusing when people try to blame only one source for this sort of thing.
Re: (Score:2)
Re: (Score:3)
Do you actually think either party has a goal other than the best schools?
Yes. I honestly believe that the Republicans want to disband public education and have a merit-based entry to private schools (parent's merit, not children's), paid for with taxpayer dollars. It's "revenge" for having forced them to educate the poor for so many years.
I've never heard anyone arguing for no public schools,
I have. Charter-schools and for-profit private schools only, and they would be banned by law from having unions and could reject children from admission for arbitrary reasons (including race and religion).
I've gone to plenty of party meetin
Re: (Score:2)
Re: (Score:2)
Both parties fear educated voters.
Re: (Score:2)
So, not just the usual worker hating, democracy hating fascism, but now with some crazy BS pulled out of right field.
Re: (Score:2)
Try harder. [lmgtfy.com]
Re: (Score:2)
...And (perhaps) a disagreement over what "best" means. I say "perhaps" because, to at least some extent, both parties agree that "best" means something entirely different than what most parents would expect (i.e., "an obedient assembly-line worker with no critical thinking skills").
Re: (Score:2)
No one wants an obedient assembly-line worker with no critical thinking skills any more. For nearly 100 years those were the best jobs most people could hope to get. The schools were genuinely doing people a service by training good manufacturing workers. But that ended decades ago, and critical thinking is all the rage now. The problem is inertia, and the people on both sides who want to cram kids heads full of stuff that's not particularly compatible with critical thinking (but then, the people who wa
Re: (Score:3)
The right wingers here (and their ex-currency trader, cheesy smiled leader) have been trying desperately to beat on them but NZ has one of the best bang for buck education systems in the world. (i.e. Our teachers are not paid that high but the performance indicators are in the top grouping.)
Just wanted to mention that for the inevitable people who will read your comments and think "unions baaaad
Re: (Score:2)
In the US, "unions are bad" has been given as a reason to close all schools because it's better for the children
You don't need software (Score:4, Insightful)
... because Slashdot shows that humans already make evaluations about articles without reading them.
Quid pro quo (Score:4, Insightful)
When you're too lazy to read my essay to grade me and let software do it, I don't really see no moral problem with doing the same to write the essay.
Re: Quid pro quo (Score:2)
> I don't really see no moral problem
I guess someone should have graded your essays a little more closely instead of relying on a robot.
Re:Quid pro quo (Score:5, Insightful)
As someone who graded hundreds of essays while serving as a teaching assistant for a senior-level engineering ethics course, I have to say that I find your lack of integrity rather appalling. Your moral obligation to write the essay yourself is independent of the method they use for grading it. Just because someone else is doing a lousy job does not mean that you suddenly have a license to short-change them for what you're obligated to do.
I would guess that I graded around 300-400 essays during the three semesters I served as a TA, and that I probably averaged around 20 minutes per essay, since I was a strong believer in providing useful feedback over things the students could improve, even if they weren't necessarily incorrect. That said, other TAs spent as little as a minute or two per essay, and barely provided any feedback at all. Regardless of how much time the TAs did or didn't spend on the essays, however, the students had the same obligations, and rightfully so.
Let me put it in Engineering terms... (Score:2, Insightful)
If I've been hired to build a Potemkin village, then it would be unethical of me to spend time constructing interiors for the buildings.
The English department has some nice courses on compositional writing where I can get real feedback on my progress on those skills. As far as the machine-graded essays for any other Department -- either I understood the topic before writing the essay or I didn't and if I didn't then a no-feedback essay isn't going to fix the problem.
Re: (Score:2)
If I've been hired to build a Potemkin village, then it would be unethical of me to spend time constructing interiors for the buildings.
Not unethical, idiotic.
Comment removed (Score:4, Insightful)
Re: (Score:2)
Students pay big bucks and expect to have experts in the field teach them and grade their work.
What has one got to do with the other? I'd rather have a fantastic teacher from whom I learn and receive zero feedback and a token passing grade than a shithouse teacher who makes me study for the test which I pass with flying colours and was well read.
Not all effort put in by students and teachers is equally valuable.
Re: (Score:2)
So if I feel I'm being shortchanged, I'll just not do the work, ensuring that I don't get the education I'm paying for. That'll show 'em!
Re: (Score:2)
That's an interesting claim. I'd be curious to hear you make an argument to support it.
Re: (Score:2)
"I have to say that I find your lack of integrity rather appalling."
Unfortunately, engineering ethics is normally taught at the undergraduate level. With the onslaught of international graduate students and H1-B workers, engineering ethics becomes "luggage" that hampers domestic students and workers competing with people who treat plagiarism as an honorable activity. Any person with integrity will lose out to those who have no bottom line in achieving their goals.
Re: (Score:2)
As someone who graded hundreds of essays while serving as a teaching assistant for a senior-level engineering ethics course, I have to say that I find your lack of integrity rather appalling.
As someone who served on the IEEE ethics committee I find your appeal to argumentum ab auctoritate [wikipedia.org] rather appalling. You should know the distinction between ethics and morals. One could make the Utilitarianist [wikipedia.org] case, in which (arguably) the behavior cited is morally OK. One could also make the Kantian [wikipedia.org] argument that (arguably) comes closer to what you were condoning.
Regardless of how much time the TAs did or didn't spend on the essays, however, the students had the same obligations, and rightfully so.
As an assignment for your ethics class: please elaborate, under which ethical systems, the above statement holds true or not, and why.
Re: (Score:2)
As someone who served on the IEEE ethics committee I find your appeal to argumentum ab auctoritate rather appalling.
Fair enough. Truth be told, I was merely trying to mirror his opening statement by providing a contrast from the other side. It was not my intent to use my former position in such a manner, though I can certainly see how it comes across that way. The fault lies with me on that one. I should have been more careful.
You should know the distinction between ethics and morals.
I do, though depending on what systems we're talking about, that distinction evaporates.
As an assignment for your ethics class: please elaborate, under which ethical systems, the above statement holds true or not, and why.
I'll admit, I've always leaned quite a bit more towards deontological approaches to analyzing situations in m
Re: (Score:2)
Excellent reply. A+. ;-)
I, of course, absolutely agree with your original statement, but I also think the GP wanted to point out the much more important ethical aspect: should we build and use machines for something that is such a profoundly human activity, i.e. the communication and exchange of ideas? Taking the Kantian approach here as you so eloquently pointed out in your post: Since I as an essay writer (and reader, FWIW) expect to communicate with humans, using machines for either or both of these tas
Re: (Score:2)
Hah, thanks!
And I think a lot of that gets back towards the purpose of the essay. I'd suggest that we write essays for our courses, not for the purpose of communicating, but rather for the purpose of improving our communication (a subtle, but important, distinction).
If it's possible to write software that can capably analyze how skilled we are at communicating, I'd have trouble coming up with any objections to using it, given that it could successfully serve the same purpose as the human grader. That said,
Re: (Score:3)
The problem is that technology allows universities to take short cuts in education, and not to the students' advantage. Add to that some of the current goings on in the university system, and the future of the education system is a little worrisome (then again the future has always been worrisome and somehow we've muddled through).
But, while before you might have a few bad apples not providing sufficient feedback to students (or not doing it in a useful way) you have, as matters of policy, short cuts.
Why pa
Re: (Score:2)
"As someone who graded hundreds of essays while serving as a teaching assistant for a senior-level engineering ethics course, I have to say that I find your lack of integrity rather appalling. Your moral obligation to write the essay yourself is independent of the method they use for grading it."
No, it isn't.
Once you failed on your end of the contract (in this case, that you will do a serious attempt to grade my intimate knowledge on the issue by using experts to review my work) you shouldn't hold any assum
Re: (Score:2)
Re: (Score:2)
That is highly questionable, but let's start with a simpler one: on what basis would you have such a moral obligation in the first place? Simply because someone who has power over you said so?
Go on, don't leave us hanging: "rightfully so, because..."?
Re: (Score:2)
Just because someone else is doing a lousy job does not mean that you suddenly have a license to short-change them for what you're obligated to do.
TAs spent as little as a minute or two per essay
Read what you just posted. Then read it again.
The only person being shortchanged in this case is the student who is actually footing the fucking bill for an education. If the education is a fraud because grading is done on whim and in a slapdash manner, which is what you are describing, then what is the fucking p
Re: (Score:2)
Comment removed (Score:4, Interesting)
Anonymous exams (Score:3)
Racism, sexism and other discrimination is quite effectively countered with anonymous grading. My university gave you a unique number before each exam and you put only that number on the sheets. Only afterwards did the administrators (not anyone involved in the course) look up and file the exam under your name. I found this helpful as a TA too because we really wanted to be fair both in grades and comments.
You can still be biased by the handwriting but we tried to counter that ourselves. If someone in my TA
Student athletes need something like this with 60 hours (Score:3)
Student athletes need something like this. With 60 hours a week playing football, they don't have time for class.
Can probably "write better" too (Score:2)
If they're using some stupid automated grader, odds are a computer-generated essay could consistently grade higher than any humans (because it can focus on scoring without worrying about content).
The answer: essay grader graders (Score:3)
I don't see a problem with automated essay graders in principle. It's just that the current essay graders are no good. Once we are able to make computer software that can actually understand essays as well as a human, it should be perfectly competent to grade an essay.
I certainly see the motivation to have a computer grade essays. Who wants to read multitudes of mediocre essays. I might rather be put in solitary confinement. I am all for the automated essay graders, but only after they can be proven to be as competent as a human.
I have no idea how to make such a competent essay grader, but I do know how to grade an essay grader. You have a bunch of computer graders and human graders grading the same essays. If the computer graders show a more consistent performance than the humans (i.e. are the outlier less frequently), then the computer grader is better.
If a paper is scored by 4 human judges and a computer, and the humans score the paper 1, 2, 3, 4, and the computer scores the paper as a 9, then it means that according to most of the human graders, the computer was way off. Essays are inherently subjective. Are the humans right or is the computer right? Who cares? It doesn't matter.
If a paper is scored by 4 human judges and a computer, and the humans score the paper 4, 5, 7, 9, and the computer scores the paper as a 6, then it means that according to every human grader, the computer did better than half the humans.
If a computer can do better than the humans even by human standards, then I think it's fair to say that a computer is good enough.
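A minimal Python sketch of that grader-grading check, under one loud assumption: "outlier" here just means the computer's score falls outside the lowest-to-highest range of the human scores. The function names are hypothetical, and a fuller version would compute the same rate for each human grader so the two can be compared.

```python
def computer_is_outlier(human_scores, computer_score):
    """True if the computer's score lies outside the range of the human scores."""
    return not (min(human_scores) <= computer_score <= max(human_scores))


def outlier_rate(papers):
    """Fraction of papers on which the computer falls outside the human range."""
    flags = [computer_is_outlier(humans, computer) for humans, computer in papers]
    return sum(flags) / len(flags)


# The two papers from the examples above: (human scores, computer score).
papers = [([1, 2, 3, 4], 9), ([4, 5, 7, 9], 6)]
print([computer_is_outlier(h, c) for h, c in papers])   # [True, False]
print(outlier_rate(papers))                              # 0.5
```

On real data you would want many papers per grader before reading anything into the rate.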
Re: (Score:3)
Re: (Score:3)
If it becomes the case that writing style is able to be analyzed and produced by a computer algorithm, it seems to me that having a good writing style will become like having good arithmetic skills (i.e. less importance is placed on these skills as they become trivial for machines to replicate), and ironically this ability to automatically test and reproduce skills drives those very skills into obscurity.
It seems like the skills that computers can't do yet are the only ones that it is worthwhile for humans
Re: (Score:2)
BZZT! Wrong. Thanks for playing. We were specifically talking about writing style not content. Grammar is predictable which is why your word processor can help to correct you. Try reading the actual content of the post next time.
Also saying shit like "BZZT! Wrong. Thanks for playing." just makes you look like a pretentious asshole.
Re: (Score:2)
"I don't see a problem with automated essay graders in principle."
I don't see a problem with automated essay creators then.
"Who wants to read multitudes of mediocre essays."
Nobody. That's why they attach a paycheck by the end of the week to that activity. If you think that's not fair, you can forego your paycheck at any time.
"If the computer graders show a more consistent performance than the humans (i.e. are the outlier less frequently), then the computer grader is better."
ON AVERAGE. It happens that it
Re: (Score:2)
Nobody. That's why they attach a paycheck by the end of the week to that activity. If you think that's not fair, you can forego your paycheck at any time.
I think you missed my point. It's also boring to calculate logarithms by hand. Before we had digital computers, skilled human computers (usually women) were paid to tediously do this work. It wasn't fair or unfair. It was a waste of human effort to do something so tedious. With the advent of computers, that human effort could be spent on much more interesting things, like programming computers to perform more tedious tasks.
ON AVERAGE. It happens that it is the outstanders the ones that have more potential and you are just consciously throwing all them by the bathtub.
If a computer can score an essay between where all the human graders scored the
One human and a computer (Score:2)
The solution might be to have a human sanity filter checking semantics and throwing out gibberish, and a computer grader doing the fair grading.
Works on Slashdot posts, too! (Score:5, Insightful)
Artificial intelligence, while seemingly tasty on the surface, tends to be underwhelmed by insufficient fish, with regard to warrantless searches.
Re: (Score:3)
AI is HARD. Plenty of tasks which people can do easily are difficult to get machines to do, even throwing lots of processing resources at the problem.
Natural Language Processing is one of these difficult problems. With "grading essays" also being nowhere near beginner level NLP.
Quite possibly actual NLP experts would not attempt to write such software, because t
Obligatory Offal (Score:5, Interesting)
A modern Richard Guindon cartoon that best represents this Slashdot story [guindoncartoons.com] ... an urban legend [snopes.com] ... [1998, archived] essay on teachers' and students' increasingly virtual role in a tech society [archive.org] ... a mad hunt for the original 1963 New Yorker cartoon that started it all [cs.ubc.ca] ... and an ugly mouse squeak toy [sweetcouch.in].
I have heard the machines singing... (Score:2)
Each to each.
I do not think that they will sing to me....
Is the essay generation software available? (Score:2)
Re: (Score:3)
Looks like many comments on political forums (Score:2)
Example from article: "Privateness has not been and undoubtedly never will be lauded, precarious, and decent.". There are too many comments on news sites which read like that.
MIT Essay Generator Leak Detected (Score:2)
Quick! Where's the German version? (Score:2)
Quick! Where's the German version? I need to boost my sociology grades!
Seriously, the first thing you have to thoroughly disable when doing sociology is your brain and any sense of logic or common sense in it. The bizarre bullshit that is put out in this field even at academic level is mind-boggling. The blatant nonsense that's in the books and readers of this subject is unbelievable. ... I need that generator to keep my braincells from killing themselves to end the agony.
Re: (Score:3)
It sounds like the software would be perfect for writing audit reports. You hand in a phone book sized report, but all they ever read is the management summary.
But DARE to hand over just the relevant pages that you know will get read. Did you work at all, if THAT is your whole report?
Re: (Score:3)
You're right, you are encouraged to write long documents, but it should really be the opposite. Writing is about communicating; if your document is so long that people don't bother reading it, the document has failed in its main purpose.
This standard should be applied to legal documents, such as License agreements, Insurance agreement, What your ELA is more than 100 words long, you don't expect anyone to read this do you? Agreement Invalid. If you need longer it should ensure that people understand what they a
Re: (Score:2)
This standard should be applied to legal documents, such as License agreements, Insurance agreement, What your ELA is more than 100 words long, you don't expect anyone to read this do you? Agreement Invalid. If you need longer it should ensure that people understand what they are agreeing to, maybe run a 1 year course of something.
You do realize that even the 3 clause BSD licence is more than double that with 220 words? And that one basically says "do what you like, but we take no responsibility" and if you have a license that actually tries to say anything like the GPL 3.0 it is 4632 words, not including the preamble or how to apply. How many years of your life would you like to waste? If anybody cared, we'd rather see the development of "standard terms and conditions" which would be several thousand words long but also widely deplo
Re: (Score:2)
Yes, but they run it through spinbot.com before it becomes official regulations.
Re: (Score:2, Informative)
Reference to (Babel, Tower Of).
The story is a biblical "explanation" of why humanity, despite ostensibly originating as a single tribe, uses multiple languages.
Re: (Score:3)
Reference to (Babel, Tower Of).
The story is a biblical "explanation" of why humanity, despite ostensibly originating as a single tribe, uses multiple languages.
I could be wrong, but I think it's understood primarily as an allegory regarding man sinning(?) by aspiring to accomplish what only God can.
Re: (Score:2)
Re: (Score:2)
My automated modding bot categorizes you as "troll", so there!