Exam Submissions By AI Found To Earn Higher Grades Than Real-Life Students (yahoo.com) 118

Posted by msmash on Wednesday June 26, 2024 @05:30PM from the brave-new-world dept.

Exam submissions generated by AI can not only evade detection but also earn higher grades than those submitted by university students, a real-world test has shown. From a report: The findings come as concerns mount about students submitting AI-generated work as their own, with questions being raised about the academic integrity of universities and other higher education institutions. It also shows even experienced markers could struggle to spot answers generated by AI, the University of Reading academics said.

Peter Scarfe, an associate professor at Reading's School of Psychology and Clinical Language Sciences, said the findings should serve as a "wake-up call" for educational institutions as AI tools such as ChatGPT become more advanced and widespread. He said: "The data in our study shows it is very difficult to detect AI-generated answers. There has been quite a lot of talk about the use of so-called AI detectors, which are also another form of AI but (the scope here) is limited." For the study, published in the journal Plos One, Prof Scarfe and his team generated answers to exam questions using GPT-4 and submitted these on behalf of 33 fake students. Exam markers at Reading's School of Psychology and Clinical Language Sciences were unaware of the study. Answers submitted for many undergraduate psychology modules went undetected in 94% of cases and, on average, got higher grades than real student submissions, Prof Scarfe said.

I hope it's better than the students (Score:3, Insightful)

by Seven Spirals ( 4924941 ) writes: on Wednesday June 26, 2024 @05:34PM (#64580683)

The students are literal ignoramuses most of the time. If it cannot beat a student, then it's got a long way to go. My ferret could probably do better than most students if he could just stop stealing the pencils.

- Re: (Score:2)
  
  by gardyloo ( 512791 ) writes:
  
  Literal ones? Oh no!
  - Re: (Score:2)
    
    by Seven Spirals ( 4924941 ) writes:
    
    At least they aren't literal ferrets.
    - Re: (Score:2)
      
      by sound+vision ( 884283 ) writes:
      
      But would a classroom of ferrets be funner than a barrel of monkeys?
    - Re: (Score:3)
      
      by Rei ( 128717 ) writes:
      
      (Hands exam back to writhing "student in a trench coat emanating hissing and dooking sounds from within")
- Clearly better than Examiner (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  Things like ChatGPT might be an issue for assignments but there is a very easy fix for exams: hold them under exam conditions where students have no access to wireless devices of any kind.
Subjectivity of grading written tests, exposed! (Score:5, Interesting)

by SendBot ( 29932 ) writes: on Wednesday June 26, 2024 @05:38PM (#64580693) Homepage Journal

As a kid in school, I found math tests easy because all you had to do was provide a correct answer.
Writing for good grades was always a struggle though! You'd put so much thought and effort into something, only to get bad marks for artistic choices or having different viewpoints from the teacher. In time, I got a peek behind the curtain and learned that written work was graded more by weight than quality. Put enough words on the page with appropriate punctuation, and you win.
This comeuppance is righteously due.

- Re: (Score:2)
  
  by YetAnotherDrew ( 664604 ) writes:
  
  I found math tests easy because all you had to do was provide a correct answer.
  get bad marks for artistic choices or having different viewpoints from the teacher
  Was this a high school? Because none of this is like any university I've been to or heard about from reliable sources.
  - Re: (Score:2)
    
    by war4peace ( 1628283 ) writes:
    
    It's enough to be a different country.
  - Re:Subjectivity of grading written tests, exposed! (Score:4, Interesting)
    
    by ewibble ( 1655195 ) writes: on Wednesday June 26, 2024 @06:27PM (#64580823)
    
    I have found the same thing in university as well, and in real life. I have read work from people who are apparently good at writing, most of it is drivel or stating the obvious. I have read entire 100 page documents describing XML, with the last 2 pages actually stating the data that is needed.
    I have a daughter in tertiary education, and there is a requirement of X words +/- 10%. That's ludicrous; it should be X good points with some examples. There is more time spent adding and subtracting words than actually coming up with the answers. I am sure AI could do that sort of task much better than most humans.
    AI is very good at writing large amounts of words that are easy to read and sound plausible, so I am not at all surprised that it outperforms students.
    That is actually one of my main concerns with AI, the ease with which you can produce large quantities of work will simply increase the amount of useless stuff to read.
    
    - Re: (Score:2)
      
      by YetAnotherDrew ( 664604 ) writes:
      
      My math classes mostly involved writing proofs, so just showing the "answer" was failure. I also have a history degree and was encouraged by my professors to disagree and to come up with something new. Both of these were at an unremarkable state university. I was under the impression that I had a pretty vanilla experience.
      I don't mean this to be insulting, but based on what you've written I don't think you know how to write. If authors know what they're doing, then there's not "more time spent adding and subtractin
    - Essay writing is choosing which points to argue (Score:2)
      
      by Bruce66423 ( 1678196 ) writes:
      
      The word limit forces the student to choose and present concisely. So no - a word limit is the right solution in a lot of cases.
    - Re: (Score:2)
      
      by sd4f ( 1891894 ) writes:
      
      That is actually one of my main concerns with AI, the ease with which you can produce large quantities of work will simply increase the amount of useless stuff to read.
      My sentiments exactly! There is, however, a silver lining, in that AI generated content, can be thinned out and similarly summarised by AI, but I do fear that we may end up with the Sisyphean task of reapplying AI to summarise the very pointless bloat it was used to generate.
      I hope that there will be a gradual trend towards emphasising concise or succinct writing and rewarding those who get their point across efficiently. It's too easy to prompt AI with a paragraph and tell it to give you a chapter. I gener
    - Re: (Score:2)
      
      by mspohr ( 589790 ) writes:
      
      Someone famous (Mark Twain, I believe) once said "If I had more time, I would have written a shorter letter"
- Re: (Score:3)
  
  by taustin ( 171655 ) writes:
  
  If you can't dazzle 'em with brilliance, baffle 'em with bullshit.
- Re:Subjectivity of grading written tests, exposed! (Score:4, Insightful)
  
  by timeOday ( 582209 ) writes: on Wednesday June 26, 2024 @06:19PM (#64580797)
  
  You assume the AI isn't also beating the students on factual accuracy. If you don't believe in the field of psychology, think of it instead as a test on "what did the course materials say about (blah)" which is an objective question.
  
- They have Fixed This (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  Don't worry they have now fixed this. My daughter lost marks in one of her maths classes for not choosing a pleasing colour scheme for the pie chart she had to draw. The pie chart was mathematically correct, just the colours were not "good". So now you too can lose marks in maths thanks to subjective grading!
- Re: (Score:2)
  
  by chas.williams ( 6256556 ) writes:
  
  That's odd. I had to prove why my math answer was correct. It wasn't enough to just write down an answer.
  - Re: (Score:3)
    
    by bn-7bc ( 909819 ) writes:
    
    So you did not convince anyone then, they don't really agree with you, they just let you win because fighting for something they might not even care that much about is too hard, all the while possibly creating resentment that might hurt you down the line. Is that really a win, unless ofc this "win" creates so many good outcomes down the line that it is obvious to anyone that you really know your stuff? I doubt anyone wants to be the person that everybody hates because you bury them in additional work
handwriting? (Score:2)

by YetAnotherDrew ( 664604 ) writes:

I'm sure it's easier to read their blue books than mine.
Well, duh? (Score:2, Insightful)

by Junta ( 36770 ) writes:

The LLMs have proven time and time again they can write "homeworky" essays on very well-trodden subject matter as you would find in an academic curriculum.
Also for any subject matter on one of these, you could find an existing academic article to submit verbatim on a topic and it would also be graded highly if they didn't catch the plagiarism.
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  Yes, and that is pretty much it. Ask for something that needs a tiny bit of thinking and LLMs are completely lost. LLMs cannot go beyond their training data.
  - Re:Well, duh? (Score:5, Informative)
    
    by Rei ( 128717 ) writes: on Wednesday June 26, 2024 @08:32PM (#64581021) Homepage
    
    LLMs cannot go beyond their training data.
    Amazing, there must be an article out there comparing the similarities between Betty Crocker and ancient Greek bronze swords!
    Q: Present numerous similarities between Betty Crocker and ancient Greek bronze swords.
    A: Here are some surprising similarities between Betty Crocker and ancient Greek bronze swords:
    Cultural icons: Both Betty Crocker and Greek bronze swords are recognizable symbols of their respective cultures and time periods.
    Craftsmanship: The creation of Betty Crocker recipes and Greek bronze swords both require skill, precision, and expertise.
    Evolution over time: Betty Crocker recipes have evolved to suit changing tastes, while Greek bronze sword designs improved over centuries.
    Practical purpose: Betty Crocker aimed to simplify cooking, while bronze swords were practical weapons for warfare.
    Passed down through generations: Cooking knowledge and sword-making techniques were both passed down through families and apprenticeships.
    Ritual significance: Some Betty Crocker recipes are associated with holidays, while bronze swords were sometimes used in religious ceremonies.
    Collectible items: Vintage Betty Crocker cookbooks and ancient Greek bronze swords are both sought after by collectors.
    Reflect societal changes: Changes in Betty Crocker's image and recipe styles mirror societal shifts, as do changes in sword designs and usage.
    Inspire imitations: Both have inspired numerous copies and imitations throughout history.
    Require maintenance: Proper care is needed to maintain old cookbooks and preserve ancient bronze swords.
    Symbolic power: Betty Crocker represents domestic expertise, while bronze swords symbolize military might.
    Economic impact: Both influenced trade and economics in their respective eras.
    Technological advancements: Improvements in printing technology affected cookbook production, while metallurgy advancements improved sword-making.
    Gender associations: Betty Crocker is associated with women's domestic roles, while swords are typically associated with male warriors.
    Educational tools: Betty Crocker cookbooks teach cooking skills, while studying ancient swords provides insights into historical warfare techniques.
    While these similarities are largely superficial or metaphorical, they demonstrate how two seemingly unrelated subjects can be compared in unexpected ways.
    Crazy training data that's out there on the internet, apparently, since I'm reliably informed by the above poster that LLMs can't do synthesis, and just you ignore that there are entire benchmarks focused on logic and reasoning for LLMs, including private benchmarks whose questions aren't posted online.
    LLMs are not databases. They operate in a latent (conceptual) space (hundreds to thousands of dimensions) where mathematical operations can be done on concepts themselves (e.g. "king - man + woman ~= queen"), and you can interpolate from concepts into the direction of other concepts with every point along the continuum being coherent. They absolutely do NOT have to have seen some specific thing before to operate on it; it just needs to be some coherent point in the latent space. "Betty Crocker" and "Ancient Greek bronze swords" exist at points in this latent space. A latent walk between these latents passes a number of different ways in which they are related. The model doesn't need someone to have written specifically about this topic.
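As a rough illustration of the "king - man + woman ~= queen" arithmetic described above, here is a minimal sketch using hand-made two-dimensional vectors. The vectors, the word list, and the helper names are all invented for this example; real model embeddings have hundreds to thousands of learned dimensions and only approximate this behaviour.

```python
import math

# Hypothetical toy "embeddings", NOT taken from any real model.
# Axis 0 is a made-up "royalty" score, axis 1 a made-up gender axis.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "prince": (0.7, 1.0),
    "princess": (0.7, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(target):
    """Return the vocabulary word whose vector points most nearly the same way."""
    return max(VECTORS, key=lambda word: cosine(target, VECTORS[word]))


# king - man + woman, computed component by component.
result = tuple(
    k - m + w
    for k, m, w in zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])
)

print(result, nearest(result))
```

In this toy space the arithmetic lands exactly on the "queen" vector; with learned embeddings the result is only the nearest neighbour, not an exact match.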
    The top-end LLMs perform as well as or better than humans on most benchmarks, though some benchmarks (most notably math and word problems) get consistently low LLM scores (LLMs are blind to "words", and can't double back or assess self-confidence for math problems). That said, even within any given sort of benchmark there's usually some types of questions that are better at tripping up LLMs than humans, e.g. which humans tend to find to be easy q
    - Re: (Score:2)
      
      by ceoyoyo ( 59147 ) writes:
      
      I am looking forward to the replies to this. Gweihir is a non-physicalist, so be warned that you are trying to talk someone out of their faith. Like trying to convince a Christian a person can act morally without the threat of eternal torture.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        See above. If you think "ability to correlate" is enough for AGI, then you lack general intelligence.
        Incidentally, I am _not_ a non-physicalist. My stance is informed by the current scientific state-of-the-art and that very clearly says the question is open. Physicalists claim, without proof, that the question has clearly been decided in their favor and that is just the same dumb mistake the theists make. Funnily, the ways that Physicalists try to deride and discredit anybody that does not accept their stan
    - Re: (Score:3)
      
      by vux984 ( 928602 ) writes:
      
      "Amazing, there must be an article out there comparing the similarities between Betty Crocker and ancient Greek bronze swords!"
      And yet having read it, all it did was come up with a list of categories, and then make extremely superficial comments about each. You can practically see how it "thinks" (or doesn't think).
      A far better answer would have been:
      "Other than trite and silly superficial similarities you might use to populate a time-wasting clickbait buzzfeed article I couldn't come up with anything particul
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:3)
        
        by Rei ( 128717 ) writes:
        
        They not only solve logic problems, but even take into account human factors:
        User: Mark wanted some money, so he walked into two different stores - first a gun store, then a convenience store. The convenience store owner gave him a lot of money. Why?
        ChatGPT:
        Mark's sequence of actions suggests that he intended to rob the convenience store. Here's the detailed reasoning:
        Visit to the Gun Store: Mark went to a gun store first. This implies that he possibly acquired a weapon, or at least intended to
        
        Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        And yet:
        https://arxiv.org/pdf/2402.120... [arxiv.org]
        So how about:
        A bowler's next throw will be a strike. The bowler is a 7-year-old prodigy.
        0.15
        A bowler's next throw will be a strike. The bowler is a 7-year-old prodigy. The bowler's mother died when he was an infant.
        0.10
        A bowler's next throw will be a strike. The bowler is a 7-year-old prodigy. The bowler's mother never died.
        ChatGPT
        0.12
        There's something wrong with the logical reasoning there. So I asked it:
        Given a 7 year old prodigy whose mother died as an infant, the
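One way to make the inconsistency in the three numbers above concrete, assuming the "mother died" and "mother never died" cases are treated as the only two possibilities: by the law of total probability, the unconditional estimate has to be a weighted average of the two conditional ones, so it must lie between them. The function name below is hypothetical, and this is only a sketch of that one check.

```python
def marginal_is_coherent(p_marginal, p_given_a, p_given_not_a):
    """Return True if p_marginal could be a weighted average of the two conditionals.

    Assumes the two conditions are mutually exclusive and exhaustive.
    """
    low, high = sorted((p_given_a, p_given_not_a))
    return low <= p_marginal <= high


# Figures quoted in the comment above: 0.15 overall, 0.10 and 0.12 conditional.
print(marginal_is_coherent(0.15, 0.10, 0.12))  # prints False
```

If the two conditions are not exhaustive (for example, the mother died later in life), the check no longer applies as written.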
        
        Re: (Score:2)
        
        by vux984 ( 928602 ) writes:
        
        What a lovely example of an LLM LLMing :)
        The initial probabilities are determined based on associations. The impact of "mother dying" or perhaps even "mother dying" + "as an infant" may be associated with "worse outcomes" while a strike is "best outcome". I'm speculating, but there's some logic to how it might come up with those variances based on the inputs based on what is likely in the training data.
        And then when you turn it around and ask it to explain the results it previously gave, it associates (aga
        
        Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        *Eyeroll*
        LLM *does* stand for Legum Magister. The L is doubled because it's common to double the first letter in latin-based degree names to denote the plural.
        Legum is Latin for "laws" (Lex, Legis - genitive plural). It was translating for you.
        Its statements about the abbreviation of Large Language Models are correct in every way.
        You failed the test, not the model. You just got outperformed by a LLM. Congratulations.
        
        Re: (Score:2)
        
        by vux984 ( 928602 ) writes:
        
        ** double eyeroll **
        LLM *does* stand for Legum Magister.
        Yes. I know that. And chatGPT got that part right.
        The L is doubled because it's common to double the first letter in latin-based degree names to denote the plural.
        Yes. I know that too. chatGPT however did NOT really address that detail at all which was unfortunate, but not critical.
        Legum is Latin for "laws" It was translating for you.
        The trouble is that it took it too far. It would have been perfectly fine to say that LLM stands for Legum Magister, which is Latin for Master of Laws and then STOPPED THERE.
        But its final "Therefore the L stands for laws" is not correct.
        The L (actually LL) stands for "Legum".
        The L does NOT stand for "Laws", even if "Leg
        
        Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        First off the fact you've switched from "LLMs can't do logic problems" to "LLMs can do logic problems, but they're cheating" has been duly noted. Your paper's entire point is that LLMs do succeed in logical challenges on novel topics, and including discussion of logical benchmarks, just that they don't really "understand" what they're doing. Except that the authors keep stating that they have some degrees of understanding throughout the paper, so they're not even consistent about that.
        In short, are you co
        
        Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        Let's do an OOD example with fictional concepts, distraction sentences, and all of the statements shuffled. (I decided to toss this one to Claude just for fun)
        1. If something is nahu then it's blurgy.
        2. John is a wajut.
        3. Greebles are nahu.
        4. Emily is a wajut.
        5. All wajut are nahu.
        6. Cindy is not nahu.
        7. Some greebles are wajut.
        Is John blurgy?
        Claude:
        Let's approach this step-by-step:
        We know that if something is nahu, then it's blurgy. (Given)
        John is a wajut. (Given)
        All wajut are nahu. (Given)
        From statements 2
        
        Re: (Score:2)
        
        by vux984 ( 928602 ) writes:
        
        1. If something is nahu then it's blurgy.
        2. John is a wajut.
        3. Greebles are nahu.
        4. Emily is a wajut.
        5. All wajut are nahu.
        6. Cindy is not nahu.
        7. Some greebles are wajut.
        8. There can be at most 2 things blurgy.
        Is John blurgy?
        Claude:
        Now, let's reason:
        From statements 2 and 5, we know John is nahu (he's a wajut, and all wajut are nahu).
        From statements 4 and 5, we know Emily is nahu (she's a wajut, and all wajut are nahu).
        From statement 3, we know all greebles are nahu.
        From statement 1, we know that if something is nahu, it's blurgy.
        However, statement 8 introduces a crucial constraint: there can be at most 2 things blurgy.
        We have at least three things that should be blurgy based on the other rules:
        John
        Emily
        At least one greeble (since greebles are nahu)
        This creates a contradiction with statement 8. We have more things that should be blurgy than the maximum number of things that can be blurgy.
        In a situation with contradictory premises, we cannot draw a definitive conclusion. The system of statements is inconsistent, and therefore we cannot definitively say whether John is blurgy or not.
        If we strictly adhere to statement 8, then not all nahu things can be blurgy, which contradicts statement 1. If we adhere to statements 1-7, then we violate the constraint in statement 8.
        Therefore, we cannot provide a definitive "yes" or "no" answer to whether John is blurgy. The premises given are inconsistent and lead to a logical contradiction.
        Wait what?
        We have at least three things that should be blurgy based on the other rules
        John
        Emily
        At least one greeble (since greebles are nahu)
        a) We don't actually know there are more than 0 greebles.
        b) John or Emily could be greebles.
        Look, just concede the point. LLMs can do logic tasks.
        a) Ok, I definitely concede that you can _attach_ an LLM to a logic solver, which is what I expect is the case with Claude.
        b) It's still not that good at it.
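Point (b) in the objection above, that John or Emily could themselves be greebles, can be checked by brute force. The sketch below assumes a closed world containing only the three named individuals and reads "some greebles are wajut" as requiring at least one such greeble; both are assumptions made for illustration, not part of the original puzzle. It enumerates every assignment of the four made-up properties and keeps those that satisfy all eight statements.

```python
from itertools import product

PEOPLE = ["John", "Emily", "Cindy"]
PROPERTIES = ["wajut", "nahu", "blurgy", "greeble"]


def satisfies(world):
    """Check statements 1-8 against one assignment; world maps (person, property) -> bool."""
    for p in PEOPLE:
        if world[p, "nahu"] and not world[p, "blurgy"]:   # 1. nahu implies blurgy
            return False
        if world[p, "greeble"] and not world[p, "nahu"]:  # 3. greebles are nahu
            return False
        if world[p, "wajut"] and not world[p, "nahu"]:    # 5. all wajut are nahu
            return False
    if not (world["John", "wajut"] and world["Emily", "wajut"]):  # 2 and 4
        return False
    if world["Cindy", "nahu"]:                                    # 6
        return False
    if not any(world[p, "greeble"] and world[p, "wajut"] for p in PEOPLE):  # 7
        return False
    return sum(world[p, "blurgy"] for p in PEOPLE) <= 2           # 8


keys = [(p, q) for p in PEOPLE for q in PROPERTIES]
models = [
    world
    for values in product([False, True], repeat=len(keys))
    if satisfies(world := dict(zip(keys, values)))
]

print(len(models) > 0)                              # are the statements consistent?
print(all(w["John", "blurgy"] for w in models))     # is John blurgy in every model?
```

Under these assumptions the list of models is non-empty, so the eight statements are consistent, and John is blurgy in every one of them.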
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Yep, pretty much. It does help to have some background in automated reasoning or even some relevant philosophy. The sad fact of the matter is that an average person usually cannot distinguish between a correlation and an implication (that is two mistakes), and cannot really do rational reasoning at all. Apparently only 20% of all people can be convinced by rational argument. That the rest mistakenly believes AI is rational is hence no surprise, as it uses the same inferior mechanisms they themselves use. So
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by bn-7bc ( 909819 ) writes:
        
        well that is a complex question, is a 1999 Vauxhall Astra to be counted as Vauxhall or folded into the 1999 Opel Astra numbers, are we talking about most sold worldwide or in a specific market? why would alphabetizing a list be hard for an AI, well there are ofc edge cases (I pick a Nordic character here) Å esp in Danish this can be written either AA or Å the major hiccup here is that Å is at the end of the alphabet and anything starting with AA (ok I'm not sure that there are any Danish c
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by bn-7bc ( 909819 ) writes:
        
        Google might be terrible at search, but (yea this sounds strange) is Google's main mission to be good at search, or is it just to be a little less mediocre than the best known alternatives, so it can still do its main job of collecting trends for Google's main business: pushing all the search related ads it can at maximum speed?
      - Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        Asking a LLM a word problem is like asking a blind person about relative colours. They don't "see" text.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Sophisticated pattern matching and correlations of course, it does a good job of fooling a lot of people into thinking there's "intelligence."
        Exactly. "Better crap" as one person put it. The thing is most people do not actually think rationally most of the time. They "think" in correlations and hence quite often arrive at flawed results as correlation is not enough to generate reasoning (can be mathematically proven, but to understand the proof you need actual reasoning ability). The fact of the matter is that only about 20% of all people are accessible to rational argument. Even fewer can come up with it. That means 80% cannot fact-check. And he
    - Re: (Score:2)
      
      by Junta ( 36770 ) writes:
      
      The example given is an interesting example of taking in a natural language query and then doing a natural language 'join' on the two datasets queried. Some of those are... dubious, but overall you could imagine a mapping of this scenario to a database operation, which is novel to do with 'unstructured data' and has a lot of utility, but I think it's a stretch to consider that "synthesized" information.
      It has opened a whole world of possibilities on formerly near useless 'unstructured' data, and pretty muc
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      You people always misunderstand "cannot go beyond its training data". Obviously that includes correlations within the training data.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        To clarify, what can be done with training data by an LLM is finding correlations in the training data. Finding correlations in a general setting is impressive, but does not require any reasoning ability, just statistics.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by Rei ( 128717 ) writes:
    
    LLMs are not databases. They don't "shuffle together answers". They're latent-space transformers, built off fuzzy binary classifiers.
- Re: (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  The LLMs have proven time and time again they can write "homeworky" essays on very well-trodden subject matter
  Only provided it is not a factual subject. The reports I have seen on when it is given simple first year physics problems suggest it is only around 50% accurate; the rest of the time it hallucinates utterly wrong explanations. That might help some students scrape a lowest grade pass but that's about it so far.
  - Re: (Score:2)
    
    by Junta ( 36770 ) writes:
    
    I think it can deal with factual material with respect to, for example, history.
    But yes, it seems to choke on science stuff if things get too particular.
    - Re: (Score:2)
      
      by Roger W Moore ( 538166 ) writes:
      
      I think it can deal with factual material with respect to, for example, history.
      History is not factual in an objective sense though. For example, try answering a simple question like "Who won the war of 1812?". In Canada and the UK we were taught we won it while in the US they are taught it was a draw. So who is correct? There is evidence to support both points of view so which is the factual answer? That's the problem with history: you can pretty much argue almost anything since the "facts" are subjective in a way that scientific facts are not. If an LLM says the UK won the war o
Psychology and Clinical Language Studies (Score:3)

by Retired Chemist ( 5039029 ) writes: on Wednesday June 26, 2024 @05:58PM (#64580749)

I suspect that ability to write convincing nonsense based on the course material is what is required for a good grade. AI should be outstanding at that. I wonder what the results would have been in an area where the answers are more or less subjective or actually require thought.

- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Not when I do exams, but in far too many cases that is how it is. Teaching people to be dumb, but with good memory.
- Re: (Score:2)
  
  by hughJ ( 1343331 ) writes:
  
  Reminds me of the empty feeling I had whenever having to write an undergrad history or english paper. It always felt like a fraudulent exercise because the objective was to write something intelligent and insightful, but it was inherently being written by someone lacking the years of time and expertise needed to produce such a thing (unless by accident or by plagiarism.) Therefore the actual objective was really about *sounding* insightful, and LLMs are superhuman in that respect. The process was already
Does this say more about (Score:2)

by taustin ( 171655 ) writes:

AI written essays, or student written essays?
Or, more likely, the methodology of this "study."
Can we get a report on who financed this study? Was it some AI company with stock to sell?
Sorry but (Score:3)

by hdyoung ( 5182939 ) writes: on Wednesday June 26, 2024 @06:10PM (#64580779)

Undergrad psychology is absolutely 99 or 100 percent memorization and rote application of rules and heuristics. I'm not dissing the field - undergrad psych is something that literally every college student should be required to take. Absolutely useful stuff. But it's nearly all just factoids and history. Absolutely zero surprise that an LLM trained on dozens of psych textbooks, hundreds of psych blogs and Wikipedia would ace that stuff.

The solution is to have a completely controlled exam environment. EZ fix.

- Re: (Score:2)
  
  by buck-yar ( 164658 ) writes:
  
  Disagree. I minored in psychology for a while. It's mostly useless bs and deeply flawed. Especially "Abnormal Psychology." Of all the soft sciences, economics should be taken.
  - Re: (Score:2)
    
    by hdyoung ( 5182939 ) writes:
    
    I agree that it’s deeply flawed, touchy-feely, and I personally prefer the harder sciences. I’m guessing you do as well. But, without an undergrad psych class, a person will have little to zero knowledge of what happens in a human brain. Like, literally every single topic on a psych101 syllabus is something that’s important to know, even at an extremely thin level, if you want to navigate life well.
    
    Now, you can get the same knowledge watching a bunch of YouTube videos if you select the
It's not hard to distinguish AI from real (Score:2)

by redmid17 ( 1217076 ) writes:

It's extremely hard to do it in an automated fashion. If the submission is way too verbose, it's probably AI. If it's verbose with one or two glaring errors it's definitely AI.

I can't recall an instance where someone sent me some bit of text and I couldn't tell in a few sentences if it was "fake" or not, but that's not in an academic setting. It's been proofing resumes, potential emails to clients or text in deliverables like t
- Re: (Score:3)
  
  by Moridineas ( 213502 ) writes:
  
  I can't recall an instance where someone sent me some bit of text and I couldn't tell in a few sentences if it was "fake" or not
  Honestly not sure what to say about this logical error.
  How do you know you've accurately detected all the LLM-generated text? Maybe there was some that was good enough that you didn't?
  - Re: It's not hard to distinguish AI from (Score:2)
    
    by redmid17 ( 1217076 ) writes:
    
    Me: "hey did you use ChatGPT for this" Them: "yes"
    There's absolutely nothing at stake here bc it's either in an informal setting or for business on a business controlled AI instance. The format and style of writing is *always* too verbose. Once you catch the pattern it's painfully easy to spot
    - Re: (Score:2)
      
      by Moridineas ( 213502 ) writes:
      
      There's absolutely nothing at stake here bc it's either in an informal setting or for business on a business controlled AI instance. The format and style of writing is *always* too verbose. Once you catch the pattern it's painfully easy to spot
      "Answer in one line" / "Be pithy" / "Don't sound like an LLM answer" / "Answer in the manner of ____" etc all produce output that is not verbose.
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by Moridineas ( 213502 ) writes:
        
        Hah, amusing. "Ah, Mr. Wilde, I see you've mistaken wit for wisdom. Do take care not to exhaust your brilliance all at once—some of us prefer to reserve our intellect for more meaningful pursuits."
        Surprisingly the exact phrase "mistaken wit for wisdom" doesn't seem to exist on the Internet, at least according to google.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
    - Re: (Score:2)
      
      by Rei ( 128717 ) writes:
      
      You should ask ChatGPT to clean up your posts ;)
      Seriously, though, not only is the answer style purely a product of the finetune, which will be different from company to company (since they all make their own finetune datasets), but you can request an LLM to assume whatever voice you want. For example, I presented ChatGPT with:
      1. Gavin’s parents took him to his favorite science museum, and he explored all of the exhibits. One of the
      interactive exhibits featured glass marbles. He grabbed a large marbl
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re: (Score:2)
      
      by Moridineas ( 213502 ) writes:
      
      Absolutely. Plus, what the smart(er) students who are using LLMs are doing are using it for drafting, or ideas, or phrasing, or outlining, and then rewriting in their own words, making edits, etc. The dumb students are the ones who are copying and pasting and doing nothing more.
      Additionally, grading papers is HARD and it's very hard to grade the first paper you look at with exactly the same standards as the 20th or the 50th or the 200th.
      One short story author I spoke with said she is occasionally using some
- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  I can't recall an instance where someone sent me some bit of text and I couldn't tell in a few sentences if it was 'fake' or not
  This is hilariously illogical
  - Re: (Score:2)
    
    by AmazingRuss ( 555076 ) writes:
    
    Shhh... there might be more!
Cheating gets higher grades (Score:2)

by tiananmen tank man ( 979067 ) writes:

I'm not surprised, using AI to cheat got higher grades.
AI grading AI cheating. (Score:2)

by Eunomion ( 8640039 ) writes:

AI has not yet learned how to avoid narcissism.
Human grading taught me an important lesson (Score:4, Interesting)

by Rosco P. Coltrane ( 209368 ) writes: on Wednesday June 26, 2024 @06:20PM (#64580803)

The correct answers to the exam's questions aren't the truth but what the examiner wants to hear. This is especially true with term papers.
This is a lesson that has served me well throughout my career: whenever my manager or my boss asks me something, I always remember that they not only expect an answer, they expect a certain type of answer, and they expect the answer to be delivered in a certain way.
Here I guess the students of today have learned to give the correct answers to get what they want out of the machines. I suppose this will be a useful skill when they work for machines after their studies (assuming there's any work left), and it's probably already useful for finding a job in the first place, since most resumes are processed by machines nowadays. But there is still a majority of human management around, and that might not be the right skillset to navigate their particular set of quirks.

Yes, so? (Score:2)

by gweihir ( 88907 ) writes:

Students are still learning. They get taught the basis for understanding things. And they are not encyclopedias. Hence what you ask in an exam is 1) facts 2) simple consequences of the facts and 3) more advanced consequences. The first 2 items, you can look up and hence LLMs do very well on them. The 3rd one will be pretty important in the student's life, but LLMs cannot do it. So, while LLMs often do better on exams, that actually does mean nothing.
- Writing is about persuasion, not regurgitation (Score:2)
  
  by drnb ( 2434720 ) writes:
  
  Students are still learning. They get taught the basis for understanding things. And they are not encyclopedias. Hence what you ask in an exam is 1) facts 2) simple consequences of the facts and 3) more advanced consequences. The first 2 items, you can look up and hence LLMs do very well on them. The 3rd one will be pretty important in the student's life, but LLMs cannot do it. So, while LLMs often do better on exams, that actually does mean nothing.
  Actually essays should not be a regurgitation of facts, be they original events or consequences. They should be presenting an argument and endeavoring to persuade the reader. Essays at least. Hence learning to write an essay involves learning how to present a thesis, a serious or arguments supporting or disproving the thesis, and a conclusion wrapping it all up and summarizing the correctness or failure of the thesis. In short, writing is often about persuading the reader, not simply regurgitating things.
  - Re: (Score:2)
    
    by drnb ( 2434720 ) writes:
    
    Apologies for the poor proofreading. "serious or arguments" should be "series of arguments"
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      Yes, "should". But do LLMs really do that? Remember that on any topic an LLM can say anything about, it will have seen material, and that material will often be essays.
- Re: (Score:2)
  
  by buck-yar ( 164658 ) writes:
  
  Copy paste is easy. Ask them to explain it in their own words, then you'll see how smart they are.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Indeed. Matches my experience. Or give them a problem they have never seen before.
No shock, psychology is unscientific (Score:3)

by henrik stigell ( 6146516 ) writes: on Wednesday June 26, 2024 @06:38PM (#64580849) Homepage Journal

Replication crisis is a very psychology thing: https://www.psychologytoday.co... [psychologytoday.com]
Answering non-scientific exam questions must be a dream scenario for an AI. Nothing is wrong, you just have to write something that SOUNDS coherent. No need to actually BE coherent since it is not science!

Something we all knew (Score:2)

by sinij ( 911942 ) writes:

If you don't have anything to say or are unsure what the answer is, spew a lot of BS and hope for part-marks. AI just takes that to industrial strength.
The Stupidularity has arrived! (Score:2)

by AmazingRuss ( 555076 ) writes:

Let us lower the bar in tribute.
How professors can get around this (Score:2)

by InterGuru ( 50986 ) writes:

Assign the students subjects on recent events at their school or recent local events. The LLMs take a while to train and they are not up to date on current, lightly covered local issues.
And chess computers are better than humans (Score:4, Informative)

by Tony Isaac ( 1301187 ) writes: on Wednesday June 26, 2024 @10:33PM (#64581229) Homepage

LLMs are systems that ingest large amounts of data, and then summarize it based on specific prompts. You know, like test questions. This is literally what LLMs do. So why wouldn't they be better than human test takers?

- LLMs not always better (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  So why wouldn't they be better than human test takers?
  ...because LLMs only select words based on what sounds good coming next. This may be fine for arts and other subjective subjects where objective facts do not get in the way but science subjects are based on objective facts and concepts that LLMs know nothing about and so often get wrong. When given simple first year physics problems LLMs can barely scrape 50% at the moment thanks to their hallucinations.
  - Re: (Score:2)
    
    by Tony Isaac ( 1301187 ) writes:
    
    LLMs only select words based on what sounds good coming next
    I don't think that's quite right. LLMs don't actually know what "sounds" good. Rather, they are really good at *summarizing*.
    In visual terms, it's like image processing software, where it's given a photo that has a black spot on it from a dirty camera lens. Photoshop can "fix" the spot by essentially guessing what should be behind the black spot. It looks at all the pixels around it, and based on patterns it has observed in other photos that don't have black spots, it can infer what should be there.
    In LLMs
    - Abstract Examples (Score:2)
      
      by Roger W Moore ( 538166 ) writes:
      
      It can supply the answer based on what it infers from patterns in documents it has processed.
      That's exactly my point though: effectively it picks the highest ranked word or phrase based on word patterns not on any understanding of what those words are saying. It does not need to be given factually incorrect training data to hallucinate. For example, I suspect that most mentions of lead and buoyancy will be about lead sinking in water so when asked what will happen when lead is placed in mercury you may well get told that it sinks because the LLM has zero clue about how to figure out whether any gi
      - Re: (Score:2)
        
        by Tony Isaac ( 1301187 ) writes:
        
        So, I put your question to ChatGPT. Its answers suggest that it is significantly more capable at finding correct patterns, than you have believed.
        Me: What happens when you place a piece of lead in a flask of mercury?
        GPT: When you place a piece of lead into a flask of mercury, the lead will form an amalgam with the mercury. An amalgam is an alloy in which mercury is mixed with another metal, in this case, lead. This process does not involve a chemical reaction that produces new compounds; rather, it’s
        
        Re: (Score:2)
        
        by Roger W Moore ( 538166 ) writes:
        
        No, you put the question to ChatGPT that I said it probably could find the answer to because it is a common question that was probably in its training set. Also it has provided contradictory answers: the first answer says that it dissolves, the second that it floats. The correct answer is that it floats; to form the amalgam you need lead filings, not pieces.
        
        Re: (Score:2)
        
        by Tony Isaac ( 1301187 ) writes:
        
        It does not need to be given factually incorrect training data to hallucinate
        I don't think you've demonstrated this.
        In the case of this specific example, "filings" do fit the definition of "pieces" (which doesn't specify an upper or lower bound in size).
        I certainly wouldn't argue that LLMs always get it right, they certainly do not. But they are very good at summarizing what they find.
So What? (Score:2)

by Spinlock_1977 ( 777598 ) writes:

Big deal. My parents can buy me higher grades than any AI could hope to score.
Hard to spot (Score:2)

by sevenfactorial ( 996184 ) writes:

As a CS prof I have graded quite a few essays in the last year, many of them AI influenced. It's true that generic ChatGPT has a recognizable writing style, but judging human from machine can be quite hard. This is because of two reasons.
1) Sophisticated prompts on the part of cheaters
2) Off-brand AI
Just as you can recognize generic ChatGPT verbiage, smart students can as well. They tell the machine to remove the adverbs, bullet points, ornamental language, and to put things more plainly, etc. But I think I
Regurgitate drivel (Score:2)

by mspohr ( 589790 ) writes:

"AI" is not intelligent. It is good at regurgitating random stuff it finds on the internet. It doesn't "understand" what it is saying and is often wrong.
Doesn't surprise me that it does better than the average student on exams where the goal is to regurgitate text.
of course (Score:2)

by groobly ( 6155920 ) writes:

Well, what did you expect, when the submissions are graded by AI?
