AI Education Social Networks Technology

Flawed Algorithms Are Grading Millions of Students' Essays (vice.com)

Fooled by gibberish and highly susceptible to human bias, automated essay-scoring systems are being increasingly adopted, a Motherboard investigation has found. From a report: Every year, millions of students sit down for standardized tests that carry weighty consequences. National tests like the Graduate Record Examinations (GRE) serve as gatekeepers to higher education, while state assessments can determine everything from whether a student will graduate to federal funding for schools and teacher pay. Traditional paper-and-pencil tests have given way to computerized versions. And increasingly, the grading process -- even for written essays -- has also been turned over to algorithms. Natural language processing (NLP) artificial intelligence systems -- often called automated essay scoring engines -- are now either the primary or secondary grader on standardized tests in at least 21 states, according to a survey conducted by Motherboard. Three states didn't respond to the questions.

Of those 21 states, three said every essay is also graded by a human. But in the remaining 18 states, only a small percentage of students' essays -- it varies between 5 and 20 percent -- will be randomly selected for a human grader to double-check the machine's work. But research from psychometricians -- professionals who study testing -- and AI experts, as well as documents obtained by Motherboard, show that these tools are susceptible to a flaw that has repeatedly sprung up in the AI world: bias against certain demographic groups. And as a Motherboard experiment demonstrated, some of the systems can be fooled by nonsense essays with sophisticated vocabulary. Essay-scoring engines don't actually analyze the quality of writing. They're trained on sets of hundreds of example essays to recognize patterns that correlate with higher or lower human-assigned grades. They then predict what score a human would assign an essay, based on those patterns.
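To make the mechanism the summary describes concrete, here is a minimal sketch of that training scheme. The feature set, toy data, and use of scikit-learn are illustrative assumptions, not any vendor's actual engine; only the shape of the approach (surface features fit to human-assigned scores) comes from the summary.

```python
# Minimal sketch of pattern-based essay scoring, NOT any vendor's engine.
# Assumptions: surface features only, toy training data, scikit-learn.
import re
from sklearn.linear_model import LinearRegression

def features(essay):
    words = re.findall(r"[A-Za-z']+", essay)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [
        len(words),                            # essay length
        avg_word_len,                          # proxy for "sophisticated vocabulary"
        essay.count(",") + essay.count(";"),   # crude proxy for sentence complexity
    ]

# Step 1: a set of essays already scored by human graders (toy examples).
training = [
    ("Short plain essay.", 1.0),
    ("A considerably more elaborate disquisition, replete with multisyllabic "
     "vocabulary; it meanders, yet human graders scored it well.", 5.0),
]

# Step 2: fit the model to predict the human score from the features.
model = LinearRegression()
model.fit([features(e) for e, _ in training], [s for _, s in training])

# Step 3: "grade" a new essay by predicting what a human would assign.
print(model.predict([features("Another verbose, comma-laden submission.")]))
```

Nothing in this pipeline evaluates whether an essay's argument is true or coherent; the model rewards whatever correlates with the human scores in the training set, which is the root of both the gibberish and the bias problems discussed below.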

  • Or maybe (Score:4, Funny)

    by cdsparrow ( 658739 ) on Tuesday August 20, 2019 @03:44PM (#59106830)

The algos are correctly grading by established criteria. It doesn't know or care what demographic a writer is from. It's just judging the writing. Now being able to trick it with sophisticated nonsense should be a tipoff that it doesn't actually do its job well, but that's separate different from the main point.

    • *but that's separate from the main point.

    • Re:Or maybe (Score:5, Insightful)

      by gurps_npc ( 621217 ) on Tuesday August 20, 2019 @03:53PM (#59106868) Homepage

      No, they are not grading by established criteria. Algorithms are not sophisticated enough to do that.

Instead they do the following:
      1) Grade a subset of papers by hand
      2) Feed the algorithms those papers and grades
      3) Let the algorithms evolve till it's grades match those of humans, allowing them to pick ANY criteria it wants to make the grading decision.
      4) Do spot corrections.

The algorithms do not have to grade on criteria; the humans never have any idea why the algorithms generate particular scores. It is perfectly acceptable for them to base their entire grade on the names of the students involved.

      Your ignorance is remarkable in thinking that the algorithms use the same criteria that the humans use. That simply is not how ANY artificial intelligence works any more. Hasn't been for decades.
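A sketch of steps 3 and 4 from the list above, assuming hypothetical machine_grade() and human_grade() callables: training stops when the machine's grades agree with the humans', and afterwards only a small random sample is re-checked by hand (5 to 20 percent, per the summary). Note that nothing in the agreement measure constrains which features the model exploits to reach it.

```python
# Sketch of "evolve till grades match" plus spot checks. machine_grade()
# and human_grade() are hypothetical stand-ins for the real graders.
import random

def agreement(machine_scores, human_scores):
    """Fraction of essays where machine and human assign the same grade."""
    matches = sum(m == h for m, h in zip(machine_scores, human_scores))
    return matches / len(human_scores)

def spot_check(essays, machine_grade, human_grade, fraction=0.10):
    """Step 4: re-grade a random sample by hand and measure agreement."""
    sample = random.sample(essays, max(1, int(len(essays) * fraction)))
    return agreement([machine_grade(e) for e in sample],
                     [human_grade(e) for e in sample])
```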

      • " Let the algorithms evolve till it's grades"

        Look, humans can't even tell its from it's. Besides, you wanted "their".

      • Re:Or maybe (Score:5, Informative)

        by LordWabbit2 ( 2440804 ) on Tuesday August 20, 2019 @05:29PM (#59107184)

        That simply is not how ANY artificial intelligence works any more

I know it's like beating a dead horse, but this is not artificial intelligence. It's algorithms making choices based on subsets of data. Clearly if you spell correctly and use big words, the shitty algorithm rates you higher. Toss in a few keywords needed to relate to the matter at hand and "voila" you have artificial stupidity that can be fooled. Who would have thunk?
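In the spirit of the Motherboard experiment the summary mentions, here is a toy generator of "sophisticated nonsense"; the word lists and sentence template are invented for illustration. A scorer that rewards length, long words, and topical keywords would rate the output highly even though it means nothing.

```python
# Toy "artificial stupidity" exploit: long words + on-topic keywords,
# zero meaning. Word lists and template are made up for illustration.
import random

FANCY = ["egregious", "multifaceted", "paradigmatic", "quintessential"]
KEYWORDS = ["assessment", "pedagogy", "curriculum"]  # match the prompt's topic
TEMPLATE = "The {a} nature of {k} remains {b} in its {c} implications. "

def nonsense_essay(sentences=10):
    return "".join(
        TEMPLATE.format(a=random.choice(FANCY), k=random.choice(KEYWORDS),
                        b=random.choice(FANCY), c=random.choice(FANCY))
        for _ in range(sentences)
    )

print(nonsense_essay())
```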

        • by jwdb ( 526327 )

I know it's like beating a dead horse, but this is not artificial intelligence. It's algorithms making choices based on subsets of data

This is definitely beating a dead horse, or mayhaps a dead philosopher, but... would you care to prove that the real-world manifestations of your intelligence are not just "algorithms making choices based on subsets of data" running on a biological computer?

        • by Livius ( 318358 )

          But it's on a computer, and it's doing the same thing as at least some human graders.

          • That is not to say the computer is any good. But neither are the humans that mark these tests.

In reviews by experts, the computers do a better job on average than humans.

            And at least the computer is consistent.

            No, the computer does not just look for spelling and keywords. The sort of grammar checking in MS Word gives you an idea of what they look for. Then some common phrasing.

            And no, you do not just feed essays into an artificial neural network and magic happens. There is more to it than that.


        • AI is useful as an assistant. My first steps would be making AI that can optimise existing AI and make proofs, including from existing AIs. I'm not sure a lot's been done feeding neural nets into neural nets or using AI to try to work out proofs for humans to sample.
      • The algos use whatever criteria they are taught from data sets. Assuming they mimic the human scores well and use a large enough data set, then they should grade the same as a human.

        I'm not saying this is a good idea, because it prob isn't. But trying to attribute any sort of demographic slant when that doesn't matter in the first place is stupid. Now, if you want a human to grade certain papers on an easier set of criteria because of background demographics of the writers, then you are asking the auto g

        • I'm not saying this is a good idea, because it prob isn't.

          The alternative is to use human graders for everything. That will increase the cost, make results inconsistent, and in the end, a tired and overworked human likely won't grade any better.

          Maybe we should just get rid of the essay portion of standardized tests. Many schools ignore the essay portion of the SAT and ACT, because the essays encourage formulaic writing and aren't good predictors of future academic success.

        • by quenda ( 644621 )

          Assuming they mimic the human scores well and use a large enough data set, then they should grade the same as a human.

No, this is NOT how AI systems work. The AI cannot directly evaluate the quality of the paper; it just makes a prediction based on all available evidence.

          In the US, blacks do not perform nearly as well as whites and Asians academically. So the AI system will soon use any data that correlates to race as a part of its scoring system. If the AI is told the name of the student, a black name will result in a lower score, if only slightly. This is not a bug! It results in the AI system scores more accurately

          • If the AI system becomes smarter, and is able to more directly identify the quality of the work

This makes it sound like an incremental improvement could achieve it, but actually they would have to start over using a whole different class of AI, like an Expert System. That would cost actual money; it would require programmers. Lots of programmers. And a whole lot of topic specialists.

            Here they don't hire programmers, they just hire one specialist computer operator to set up the parameters. And they compare results to human test-graders, so they don't even need topic specialists. There is no way to add

        • The algos use whatever criteria they are taught from data sets.

          They are "trained from" data sets, but they are not taught any criteria at all. That is not how modern "AI" works. At all.

          If you can't comprehend the huge semantic difference between training and teaching then just shut up, please.

          You probably just don't imagine how lazy and do-nothing an application of this sort of "AI" actually is.

          It would never be expected to learn to sort new data in the same way that the humans had sorted the training data. It just doesn't work that way. It will only match the human re

There was already an article posted on this. Once you stop verifying the results the AI produces and let it loose, you're operating on faith. Even when every effort is made in good faith to produce results comparable to a human effort, things can go awry. Ironically, human verification alone has many problems.

        The obsession with AI and political correctness is already demonstrating the problem with trainers. Looks wrong? Fix it. Looks right? Ignore it.

        I've already seen a perfect example of this with PC. At
    • by Pascoea ( 968200 )
      The problem is that the "established criteria" had a bias, so the automated testing picked up and amplified that bias. From the article: "The problem is that bias is another kind of pattern, and so these machine learning systems are also going to pick it up"
      • Re: (Score:2, Insightful)

        it's called proper English. me and lots of others who weren't born here somehow had to learn it. should be no problem with US born people learning how to write in proper English

        • by HiThere ( 15173 )

That's a guess, but the summary makes it quite dubious. If a large vocabulary in and of itself is going to raise the rating, then "proper English" is not what's going on.

          For that matter, most of these programs can't really parse a complex sentence or track the proper use of pronouns, much less detect whether paragraphs have been properly separated. ("The paragraph is the emotional unit of writing."--G. Stein.)

        • There's nothing said about proper English. The AI likely grades based on grammatical structure and vocabulary, but how good is it with the quality of the content and the strength of an argument? A one-size-fits-all algorithm does penalize students who write differently, even if it is by choice. Writing is about communication, and sometimes, art. Sometimes that means you should use very simple language, and short, terse sentences, and sometimes the contrary. Sometimes you can bend the rules of grammar (to an
        • Re:Or maybe (Score:5, Informative)

          by whoever57 ( 658626 ) on Tuesday August 20, 2019 @04:53PM (#59107084) Journal

          me and lots of others who weren't born here somehow had to learn it.

          It's a shame that you didn't succeed.

        • Re:Or maybe (Score:4, Informative)

          by kaizendojo ( 956951 ) on Tuesday August 20, 2019 @05:53PM (#59107274)
          >it's called proper English. me and lots of others who weren't born here somehow had to learn it. should be no problem with US born people learning how to write in proper English

          1. The first word of a sentence should start with an upper case letter when the preceding sentence ends with either a period ('.'), an exclamation point ('!'), an interrogation point ('?') or '...'

          2. In a compound subject or object, the pronoun Me must be on the right.

3. In the sentence starting with "should be no problem", this verbal group is missing a subject.

        • by jythie ( 914043 )
It isn't about proper or improper english, it is about word choices. I recall, years ago, working with a teacher who could tell what kinds of books people read by the 'feel' of their essays, and I've worked with linguists who can tell which part of the country you were from or which foreign country you learned english in. All correct, all proper, but they pick up a certain flavor depending on what media they consume. The bias tends to come in where people tend to react more positively to submissions
        • by Pascoea ( 968200 )

I 100% agree. But that doesn't mean there wasn't an inherent bias in the essay grading that was used to "train" the algorithm. Grading art is subjective. A sentence may be technically correct, but if the grader doesn't like the tone, style, or word choice they are likely to grade it lower. If you get enough of these lower grades together and use them to train an algorithm, the end result is a computer that doesn't like a particular writing style.

          A person from the inner city of Chicago is going to have a di

        • by quenda ( 644621 )

          proper English. me and lots of others who weren't born here somehow had to learn it

          I see what you did there :-)

      • The problem is that the "established criteria" had a bias, so the automated testing picked up and amplified that bias. From the article: "The problem is that bias is another kind of pattern, and so these machine learning systems are also going to pick it up"

According to the article, the problem is that the algorithm criteria don't match the hoped-for criteria (the purported "established criteria"), which is why the human-graded scores differ. In particular, the algorithm and the hoped-for criteria differ in that the algorithm overemphasizes (i.e., is biased toward) "essay length and sophisticated word choice" and de-emphasizes (i.e., is biased against) "grammar, style, and organization".
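One way to see that mismatch is to audit which features actually carry the weight in a trained scorer. A standalone sketch with simulated data (the feature names and numbers are hypothetical): if the human grades in the training set happen to track length and word choice, the fitted model overweights exactly those features and barely penalizes grammar errors.

```python
# Standalone audit sketch with SIMULATED data: the learned weights, not any
# written rubric, are the engine's real "criteria".
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
# Hypothetical per-essay features: [length, avg word length, grammar errors]
X = rng.normal(size=(n, 3))
# Simulated human grades that mostly track length and vocabulary...
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(0, 0.1, n)

scorer = LinearRegression().fit(X, y)
for name, w in zip(["length", "vocab", "grammar_errors"], scorer.coef_):
    print(f"{name:15s} {w:+.2f}")  # ...so the scorer overemphasizes them too
```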

      • by quenda ( 644621 )

        The problem is that the "established criteria" had a bias,

        No, there is nothing in the article to say that. The problem exists even if there is no bias in the data fed to the AI.

        The problem is simple: blacks do not write as well as others, on average. The AI does not consciously identify anyone by race, but it does learn that people using a certain way of writing are likely to be not as good.
        This learning enables the AI to more accurately grade the papers, but it can mean that the best black kids will have their papers marked down. The AI has no hate, just col

    • by hey! ( 33014 )

Sure the algorithm doesn't "care" about someone's ethnicity or religion or socioeconomic background, because it's not a person; however, that does not mean it necessarily gives unbiased results. Excessive sensitivity to irrelevant differences in word choice is a long-standing problem in psychometrics.

      What you want to measure when grading an essay is how well the student used the information given to him, addressed the question posed, organized and expressed his thoughts. Since you can't tell any of this by w

Automatic grading of essays is unacceptable. It's obvious to people who submit automatically generated gibberish and get good grades. Such a blatant cost-cutting measure would only be appropriate in an economically troubled country, and only strictly as a temporary measure due to a lack of teachers.
Now being able to trick it with sophisticated nonsense should be a tipoff that it doesn't actually do its job well

      The exams are designed to test qualification for higher education, where writing sophisticated nonsense is an important skill.

    • by ET3D ( 1169851 )

      It's worth reading the actual article to understand the problem better.

  • Well, to begin with, standardized testing is one flawed algorithm. Always has been and always will be.
    • by HiThere ( 15173 )

No. There are *restricted* domains where the standardized test is appropriate. E.g. it's quite appropriate for arithmetic.

      But there sure is a big area outside of those domains.

How many nonsense articles about "racist" computers are we going to have, along with summaries of stories that can simply be described in a few words? For example: the CenturyLink 37-hour outage that affected 911 service was caused by a malformed broadcast storm.

    • but how good is it with the quality of the content and the strength of an argument?

Forever, someone always has a chip on their shoulder and will blame it on something. Race has always been a prime target.
      I didn't get picked, it's because I'm [select color here]
      Everyone thinks I pick on [select color here] because I am [select color here].
      If it wasn't race it would be geographical location, or whether you like the xbox or the playstation.
      People are full of shit (like the xbox) and don't like to acce

    • by AHuxley ( 892839 )
      Solar roads and an iceberg sub should be a hint...
      Then its "bias against certain demographic groups" when people what cant do the needed work get detected?
  • by gurps_npc ( 621217 ) on Tuesday August 20, 2019 @03:59PM (#59106896) Homepage

    But 90% is not good enough.

The algorithms learn by example, not by learning the actual rules of grammar. Which means that things that conform to the usual, i.e. derivative work, are rewarded and everything non-standard is punished.

If you are in the top 1%, this system punishes creativity. Your work stands out from the standard work, which is ALWAYS punished by the modern algorithm methodology.

It also fails to work well for about 9% of the population who are above average intelligence but have substandard teachers. They end up teaching themselves, rather than being taught by their teachers. As such, their work stands out from the standard work and again, this is ALWAYS punished by the current system.

What makes you think the computer does not know rules of grammar? Ever used MS Word?

      And what makes you think that tired, poorly paid, unsupervised human markers do a better job than the computer?

      And where did you get 9% from? About 51% of the population are above average intelligence, and about 60% have substandard teachers, which gives about 30%.

      • by Cederic ( 9623 )

What makes you think the computer does not know rules of grammar? Ever used MS Word?

        The rules of grammar include knowing when and how to break or disregard them. Something MS Word is fucking terrible at.

  • by DanDD ( 1857066 ) on Tuesday August 20, 2019 @04:08PM (#59106928)

    Some of the most influential writing in human history has been full of grammar and spelling errors, yet such writing managed to record and convey significant ideas.

    Removing an often biased grammar Nazi from the grading process is probably a step in the right direction for many students, but having no human feedback is the epitome of dehumanizing, as are standardized tests - except for the privileged few for whom the tests are designed.

    This is why students should feel justified in using an 'essay app' to generate 'perfect', unique essays.

    • by rastos1 ( 601318 )

      Removing an often biased grammar Nazi from the grading process is probably a step in the right direction for many students

      The result is that then the students produce stuff that is barely legible and claim "it's the idea that is important, not the form!". Well, if someone wants to convey an idea, then he should not put an extra burden of deciphering the text on me. If he does not bother, then neither will his audience.

      • by DanDD ( 1857066 )

        I understand your sentiment, and I mostly agree. Education is important, and I'm not advocating that we give lazy people a pass. However, the burden of 'deciphering a text', especially if the content is novel or complex, or if it comes from someone with a very different world view, will always fall on you as meaning and implications are rarely clear or direct. Deciphering babel is not what I'm advocating.

Racism is often thinly veiled by standardized tests and a 'pass-fail' mentality. Taken to its ultim

  • It's a flawed algorithm, so hack it just like the resume sorting algorithms. Load it up on words that get the most weighting, and reap the rewards.
Someone needs to try SQL injection / divide-by-zero stuff in the tests.

  • Suppose I make a lot of spelling mistakes because English isn't my first language.
Suppose I don't spell well because my education was subpar and I was more concerned about being shot in school than learning proper spelling and grammar.

There are MANY correlating factors between poverty and certain minority groups that correlate A LOT more heavily with poverty or nation of origin than they do with skin color (Colour, if you learned to spell in the UK or from a British English teacher).

    Objectively it is sp

    • Assuming your claims are correct, the repressed individual can alleviate their poor education by going back to school; there are many government supported routes to achieve this goal.

    • by AHuxley ( 892839 )
      Re make a lot of spelling mistakes because English
      Is that person entering further "English" education where the level of "English" is going to have to be better and better every year?
That's a fault with the education system?

      Re "learning proper spelling and grammar" that's going to be needed soon for the "English" questions, education, essays.

      Re "poverty or nation" That nation can set up their own education system, tests and can pass/fail any % of the population.

      Should US education results take i
It's old news that these algorithms somehow like the word "egregious", and perhaps well known enough, yet the word somehow remains a favorite despite all the warning signs.

Perhaps it's yet another example of egregious worship of anything theoretical, without practical testing first. Even the simplest trick, using an algorithm designed to fool the algorithm, would have picked up on the egregious mistake, and demonstrated consistent gibberish that gets a perfect grade.

    Then again, it would probably penalize fo

  • by rsilvergun ( 571051 ) on Tuesday August 20, 2019 @04:48PM (#59107066)
    take over in education. Most standardized tests are run by private companies. Letting a computer grade everything saves a mountain of cash in salaries.
  • Ah it's the stupid old Google Algorithmic Unfairness thing yet again. Unless the AI system gets input about the demographic identity group(s) of the essay authors, this claim physically cannot be true. It cannot be biased against something it does not know.

    • by kubajz ( 964091 )
      That would be true if we could assume that different demographics do not use identifiable features in essays - such as names, idioms, minor mistakes, examples, genders. Is it unthinkable that there might be a correlation between, say, using the phrase "intelligent design" and essay grade, no matter the meaning of the actual text?
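That correlation is straightforward to check, at least in a toy setting. A sketch with hypothetical data: if the mere presence of a phrase predicts the grade in the training set, a pattern-based scorer will learn it regardless of what the phrase means, and any demographic that tends to use the phrase inherits the penalty.

```python
# Sketch of the phrase/grade correlation check; data is hypothetical.
import statistics

training_set = [  # (essay text, human-assigned grade)
    ("... intelligent design ...", 2.0),
    ("... natural selection ...", 4.5),
    ("... intelligent design ...", 2.5),
    ("... peer-reviewed evidence ...", 4.0),
]

def phrase_grade_gap(phrase, data):
    """Mean grade of essays containing the phrase minus those without."""
    with_phrase = [g for text, g in data if phrase in text]
    without = [g for text, g in data if phrase not in text]
    return statistics.mean(with_phrase) - statistics.mean(without)

# A large gap means the phrase itself predicts the grade -- exactly the
# kind of proxy feature a learned scorer will pick up.
print(phrase_grade_gap("intelligent design", training_set))  # -> -2.0
```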
      • by fche ( 36607 )

        Then the AI system is "biased" against identifiable features which a member of ANY demographic may choose to produce.

        • by jwdb ( 526327 )

          Then the AI system is "biased" against identifiable features which a member of ANY demographic may choose to produce.

Sure, if they wish. But if the algorithm is biased against a set of identifiable features that only a certain demographic regularly chooses to produce, and those features are in and of themselves neutral as far as the quality of the essay goes, yet their presence causes lower grades to be assigned, then the AI is in practice biased against that demographic group.

          Or, in other words

          • by fche ( 36607 )

            > identifiable features only a certain demographic regularly chooses to produce

            Whoa, what do you have in mind? Oxford commas? Jive?

            If you can't identify a high-correlation signal by hand, then it does not matter.

          • by AHuxley ( 892839 )
The person used an essay they paid for on the net and it got found out?
Was that due to a certain demographic in the words? That the essay had been detected before?
            The words, slang, jargon did not have anything to do with the set essay question on the day?
    • it doesn't have to know the demographics to have a bias. if the learning dataset the AI uses is biased, then the AI will be biased. here's an overly-simplified example. let's say the learning dataset is all from kids from New York. And "good" essays by kids from New York (based on human evaluation) happen to often have the words "foo" and "bar" in them. So the AI thinks essays with the words "foo" and "bar" in them are "good". Now let's say "good" essays by kids from New Jersey (based on human evaluat
      • by fche ( 36607 )

        You can only make this argument AHEAD OF TIME. You don't get to pick your data, train the AI, peek at the output of the trained AI, and then say, golly jeepers, I don't like the political implication of the results, so let's do-over. At that point, YOU are introducing an overt, explicit bias.

        I mean you can, but that's politics, not science. If you want to do science, you pick your protocol ahead of time, and let it lead you where it may.

        • I don't know if I agree that you can't identify a bias after the fact. But anyway, it's certainly better to try as hard as possible to have an unbiased dataset for learning to begin with. But it's probably also really hard because you don't know what you don't know.
      • by AHuxley ( 892839 )
        Was "foo" and "bar" part of the jargon, the technical terms used all year?
        Then the smart and correct "city" students get to pass and go further in education.

        Did "goo" and "jar" have nothing to do with the topic? Not a part of the terms used all year? Then less students from the "state" will pass.
        They did not do the work needed and did not study/use terms like "foo" and "bar".

        The "city" students on average could all use the words as expected and could show their working, thinking, use of the terms.
  • In a transitional period...

    The first essay that students should submit should be carefully written by hand for a human teacher to review and score.

    And the second essay that students should submit would be automatically generated by a collective/shared "A.I." system for the automated essay scoring systems to review and score.

The marks assigned to students would be based on the first essay. Meanwhile the collective essay-generating A.I. would keep improving until it gets a perfect score from the automated scor

    • Remember that Amazon makes billions and does not pay corporate tax.

      So we should make Jeff Bezos grade all the papers? That seems fair.

  • by argStyopa ( 232550 ) on Tuesday August 20, 2019 @05:10PM (#59107116) Journal

    "these tools are susceptible to a flaw that has repeatedly sprung up in the AI world: bias against certain demographic groups"

    Really? What I think you mean to say is that it shows bias against certain methods and forms of expression that aren't generally accepted as standard English, regardless of the ethnicity of the person using them. That is not a "demographic group", is it?

Because an "AI" (let's also remember that these are NO SUCH THING) cannot discern the skin color of the person writing an essay without, what, magic?

It could show bias against forms of expression that are technically correct English but which are more commonly used in some subcultures than in others. Depending on how the training is done, if students from the subculture on average score poorly on writing tests, those poor scores could be incorrectly correlated with a particular style of writing.

It's the same problem as an AI hiring system that through statistics has found that programmers at a company are mostly white males, so it ranks white male applicants higher.

      • "Its the same problem as an AI hiring system that through statistics has found that programmers at a company are mostly white males, so it ranks white male applicants higher."

        So the error here is that the AI isn't "woke" in understanding somehow that vaginas and melanin are somehow intrinsically valuable for coding?

        How very soulless of it to simply do things like look at actual performance, etc. Clearly, it needs to be "fixed".

        Personally, I'd LOVE to see actual performance metrics from departments staffed

Nothing to do with "woke". It's about AI not really being intelligent, just looking for patterns, e.g. correlations. Then there's the basic problem of correlation not implying causation. In 1950 there was a strong correlation between being an astrophysicist and being male. Today that is much less true. But an AI trained on 1950s data might easily conclude that women were less likely to be astrophysicists, and therefore rank being female as a negative for hires.

          It depends in detail on exactly how the AI is tr

    • by AHuxley ( 892839 )
      People who did not and would not study?
Who totally lack the skills needed for more education...
  • I've said for decades that you can create a technical presentation that misses a key component preventing people from duplicating your results and nobody will notice. They often include phrases like "We implemented the Navier-Stokes equations..." or "We employ a 20-state Extended Kalman Filter...". This is the fancy-pants textbook version of "The solution is left as an exercise to the reader."

    • I've said for decades that you can create a technical presentation that misses a key component preventing people from duplicating your results and nobody will notice.

      That's because the venue is the incorrect one for imparting sufficient detail to allow others to duplicate what you are doing. A technical presentation is intended to report your findings or results, not every step necessary to duplicate them. Nobody will notice that they can't duplicate what you've just presented because they didn't come to the talk to find out how to duplicate what you've done.

      If you want to duplicate someone's results, you talk to him after the presentation and work it out.

      • Even the paper they are presenting is missing key steps. I'm sure that's by design because they're hoping to get paid a lot of money to tell you the recipe for the secret sauce.

  • And as a Motherboard experiment demonstrated, some of the systems can be fooled by nonsense essays with sophisticated vocabulary.

    That's not a flaw in the algorithm.

    The people who write such essays most likely have promising futures in management consulting, and they definitely should be admitted so that they can work towards their MBAs.

  • Who has the program to make a perfect essay based on those patterns?
  • People who did not study?
People who can't study?
People who lack the skills to study?
People who got non-academic considerations until they actually had to show their own work? They could not write on the topic and got found out?
Work under a person's name that was related to other people's work? Do your own work?
Buying work and finding out a lot of other people had used the same work? Shared work got detected?
    Using words and terms that had nothing to do with the topic?
    Using slang, words, terms
Okay, I get that grading essays is time intensive, and thus expensive, but if you're going to make millions of kids write them, you should be willing to have someone on the other side who knows how to grade an essay and companies https://topwritingcompanies.co... [topwritingcompanies.com] that know how to write a good essay.
