AI Education

Texas Will Use Computers To Grade Written Answers On This Year's STAAR Tests

Keaton Peters reports via the Texas Tribune: Students sitting for their STAAR exams this week will be part of a new method of evaluating Texas schools: Their written answers on the state's standardized tests will be graded automatically by computers. The Texas Education Agency is rolling out an "automated scoring engine" for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies. The technology, which uses natural language processing similar to that behind artificial intelligence chatbots such as GPT-4, will save the state agency about $15-20 million per year that it would otherwise have spent on hiring human scorers through a third-party contractor.

The change comes after the STAAR test, which measures students' understanding of state-mandated core curriculum, was redesigned in 2023. The test now includes fewer multiple choice questions and more open-ended questions -- known as constructed response items. After the redesign, there are six to seven times more constructed response items. "We wanted to keep as many constructed open ended responses as we can, but they take an incredible amount of time to score," said Jose Rios, director of student assessment at the Texas Education Agency. In 2023, Rios said TEA hired about 6,000 temporary scorers, but this year, it will need under 2,000.

To develop the scoring system, the TEA gathered 3,000 responses that went through two rounds of human scoring. From this field sample, the automated scoring engine learns the characteristics of responses, and it is programmed to assign the same scores a human would have given. This spring, as students complete their tests, the computer will first grade all the constructed responses. Then, a quarter of the responses will be rescored by humans. When the computer has "low confidence" in the score it assigned, those responses will be automatically reassigned to a human. The same thing will happen when the computer encounters a type of response that its programming does not recognize, such as one using lots of slang or words in a language other than English.
"In addition to 'low confidence' scores and responses that do not fit in the computer's programming, a random sample of responses will also be automatically handed off to humans to check the computer's work," notes Peters. Although the scoring engine is similar to ChatGPT, TEA officials have resisted the suggestion that it is artificial intelligence. They note that the process doesn't "learn" from the responses and always defers to its original programming set up by the state.
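The hand-off rules described in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TEA's actual engine: the `ScoredResponse` fields, the `route_for_human_review` function, and the 0.8 confidence threshold are all hypothetical, and the summary does not say whether the "quarter" of rescored responses includes the low-confidence and unrecognized ones. Here the 25% figure is simply used as the random-sample rate.

```python
import random
from dataclasses import dataclass


@dataclass
class ScoredResponse:
    response_id: str
    machine_score: int
    confidence: float  # hypothetical 0.0-1.0 confidence reported by the engine
    recognized: bool   # False for e.g. heavy slang or non-English text


def route_for_human_review(responses, confidence_threshold=0.8,
                           sample_rate=0.25, seed=0):
    """Return {response_id: reason} for responses handed off to a human.

    The threshold and sample rate are assumptions for illustration only.
    """
    rng = random.Random(seed)  # seeded so the random sample is repeatable
    routed = {}
    for r in responses:
        if not r.recognized:
            # Response type the engine's programming does not recognize.
            routed[r.response_id] = "unrecognized"
        elif r.confidence < confidence_threshold:
            # The engine has "low confidence" in the score it assigned.
            routed[r.response_id] = "low_confidence"
        elif rng.random() < sample_rate:
            # Random spot check of the computer's work.
            routed[r.response_id] = "random_sample"
    return routed


if __name__ == "__main__":
    batch = [
        ScoredResponse("a1", 3, 0.95, True),
        ScoredResponse("a2", 2, 0.40, True),
        ScoredResponse("a3", 0, 0.90, False),
    ]
    print(route_for_human_review(batch))
```

Responses that are not routed keep their machine score. How disagreements between the human and machine scores are resolved is not covered in the summary, so the sketch leaves that step out.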
  • LOL (Score:4, Funny)

    by backslashdot ( 95548 ) on Tuesday April 09, 2024 @07:25PM (#64382228)

    Be prepared for some ridiculous essays to be graded A+. Reference: https://arstechnica.com/scienc... [arstechnica.com]

  • by Tablizer ( 95088 ) on Tuesday April 09, 2024 @07:29PM (#64382242) Journal

    My understanding is that the STAAR test merely suggests areas a student may need improvement in. If it's a rough guide to see who may need help rather than a direct tool to hold kids back a grade, maybe this is no big deal, especially if it's re-reviewed by a human reader IF a hold-back is initially recommended.

    As soon as the test results have teeth, parents will then wish to boot the bots.

    • by cascadingstylesheet ( 140919 ) on Wednesday April 10, 2024 @05:53AM (#64382956) Journal

      If it's a rough guide to see who may need help rather than a direct tool to hold kids back a grade,

      Does anybody hold kids back a grade anymore? Even when my kids were young it just wasn't done.

      The (possibly correct) theory is that it makes more sense to start providing additional support and individualized instruction, rather than repeating a year of instruction that didn't work the first time around, along with all the personal upheaval and age mismatch that goes with it.

      • by Tablizer ( 95088 )

        In my state they try to use summer school for that purpose. It's a hell of an incentive also: make the grades or have your summer ruined. When all the other kids are frolicking at pools and lakes, they are doing fractions in a smelly mobile classroom.

        • In my state they try to use summer school for that purpose. It's a hell of an incentive also: make the grades or have your summer ruined. When all the other kids are frolicking at pools and lakes, they are doing fractions in a smelly mobile classroom.

          That would be an incentive!

      • by dfm3 ( 830843 )
        Tennessee. [msn.com]
        They vastly expanded their retention policy a couple years ago, basing it almost entirely on standardized testing. One standardized test, to be specific. And despite widespread public backlash, they're seeking to expand the practice.
        • Tennessee. [msn.com] They vastly expanded their retention policy a couple years ago, basing it almost entirely on standardized testing. One standardized test, to be specific. And despite widespread public backlash, they're seeking to expand the practice.

          Oh, wow!

    • Regardless of whether this particular case will cause harm, it is worrisome that so little information about the AI's effectiveness is being provided. If they were confident in the system they would be providing metrics on how often the AI matches the grades of human reviewers. They have millions of tests to verify the AI's competence with. When the AI provides a high confidence grade, does it match the grade of a human reviewer 99% of the time? 90% of the time? Then compare that with how many times the two

    • The STAAR test is mostly a feel good joke. Texas is pathologically averse to any kind of national standardized testing, since they tend to show Texas schools in an off-narrative and not very positive light [ontocollege.com].

      In theory kids who don't take, or flunk the STAAR tests on certain subjects cannot graduate from high school. I don't know how often this is really done, the high schools here seem to be doing everything possible to pass the trouble makers from grade to grade, or where possible, to juvie.

  • by zenlessyank ( 748553 ) on Tuesday April 09, 2024 @07:30PM (#64382248)

    I understand the fact that the current population likes to make money and keep others from making it. Controlling education is part of that. Eventually the smart educated folks are going to die off and no one will be smart enough to take their place.

    Remember when family business didn't mean mafia?

  • You are approached by a frenzied Texan scientist, who yells, "I'm going to put my quantum harmonizer in your photonic resonation chamber!" What's your response?

  • by retchdog ( 1319261 ) on Tuesday April 09, 2024 @08:57PM (#64382350) Journal

    So, uh, we imprison children for their youngest years and teach them to write, all so they can vomit some shit that will be graded by a computer anyway because we're afraid to just sort people by IQ in the first fucking place, and then forget most of it because we don't really have much use for literacy beyond obedience anyway.

    Why not just, like, stop doing it entirely instead of cutting corners?

    • A kid may have a stellar IQ, but that's no guarantee that they will do well at school. Being bored, bullied or generally abused may all cause such a poor outcome.

    • by rta ( 559125 )

      because we're afraid to just sort people by IQ in the first fucking place,

      Pretty much.

      Well, it's more that the political and educational system likes to pretend outwardly that all kids have ~ 115+ IQ and that if only we increase funding some more and find the right curriculum then everyone can grow up to be a doctor or engineer. That's been like the past ~30? years of American educational philosophy. (or maybe educational political marketing and posturing )

  • by thesjaakspoiler ( 4782965 ) on Tuesday April 09, 2024 @09:22PM (#64382382)

    What could go wrong?

  • oh the irony (Score:2, Insightful)

    by sdinfoserv ( 1793266 )
    So, Texas doesn't trust automated voting tally machines but they want these in schools... perfect sense...
    • Re: (Score:1, Informative)

      by Anonymous Coward

      That's because getting their favorite politicians into seats of power is MUCH more important than silly things like education.

      Education is all woke and make believe with things like theories of evolution, archeology (dinosaurs are made up), civil rights, and global warming (climates change all the time. No proof that Texas oil is hurting anybody.)

    • Re: (Score:3, Insightful)

      by Bob_Who ( 926234 )

      Get a degree in education and eliminate jobs in that field at the very same time.

      That's as ridiculous as outlawing abortion while refusing to increase funds to foster or educate unwanted children. Then, prioritizing the orphans' right to own automatic weapons once they age out of Juvie.

      Only Texas messes with Texas..

  • u/Texas (Score:2, Insightful)

    by tgibson ( 131396 )
    State name checks out.
  • by Fly Swatter ( 30498 ) on Wednesday April 10, 2024 @12:53AM (#64382604) Homepage
    But clearly we have reached the point of too many people to teach...

    Stop treating our next generations worse than pets; would you trust your pet to being fed by an AI?
    • 'The city with a population of 155,000 along the Connecticut river has a median household income half the state average; violent crime is common. Yet graduation rates at the city’s high schools are surging. Between 2007 and 2022 the share of pupils at the Springfield High School of Science and Technology who earned a diploma in four years jumped from 50% to 94%; at neighbouring Roger Putnam Vocational Technical Academy it nearly doubled to 96%.

      'Alas, such gains are not showing up in other academic ind

      • That actually comes as little surprise. Here in the UK we've gone from 10% of school leavers going to university to 50%. More than 3% of them are graduating functionally illiterate and unable to follow instructions on a medicine bottle, or innumerate and unable to work out how much fuel is in a tank of gas when the gauge shows 1/4.
        • not surprising, bet he can't either - no info on how big a texas gas tank is in libraries of congress or football fields.
      • Many colleges have banned the SAT. Unsure how they judge worthy to enter now? Maybe if you have credit worthiness to be tied to a college loan for life?

  • by VeryFluffyBunny ( 5037285 ) on Wednesday April 10, 2024 @03:53AM (#64382798)
    This sounds like it's essentially a reverse Turing test (See: https://lesperelman.com/writin... [lesperelman.com]), i.e. can the computer reliably tell meaningful human responses from machine-generated & therefore meaningless ones (even when the text code is linguistically/grammatically/lexically accurate)? The basic principle is that well-formed language code can be meaningless, e.g. Noam Chomsky's infamous sentence, "Colorless green ideas sleep furiously." In this case, the test is between meaningful (i.e. the messages in the responses, rather than the text code, answer the prompts) vs. meaningless (i.e. the messages in the responses do not answer the prompts, regardless of whether the text code is accurate or not).

    They seem to believe that the grading algorithm will be reliable enough & are motivated by the prospect of it saving money.

    There's another way to grade essays & other constructed response items that is also faster & more reliable than typical human grading: adaptive comparative judgement (See: https://en.wikipedia.org/wiki/... [wikipedia.org]). I'd say, if they're really interested in improving grading in Texas, rather than making headlines by jumping on the AI bandwagon, they'd run limited side-by-side pilot trials comparing all 3: typical rubric-based grading, adaptive comparative judgement, & letting the LLM grade them unsupervised. Let's see how the comparative reliability & money-saving results turn out.
  • where thinking goes to die.

  • The specifics of the tech involved are one thing, but many states have done this for a while now. The contracts (and the RFPs before them) involved in standardized testing will show examples all over the country.

    Despite training against a significant body of student responses, at least one AI-driven evaluation maybe seven years ago (personal experience here) would give high marks (for example, 4/4 on use of evidence, 4/4 on structure of essay, 3/3 on writing conventions -- usually there are multiple criteri

  • no time at all.

    But this is Texas, where they abuse teachers to play their "Christian" white supremacist narrative, and teachers are leaving the state.
