Exam Submissions By AI Found To Earn Higher Grades Than Real-Life Students (yahoo.com) 118
Exam submissions generated by AI can not only evade detection but also earn higher grades than those submitted by university students, a real-world test has shown. From a report: The findings come as concerns mount about students submitting AI-generated work as their own, with questions being raised about the academic integrity of universities and other higher education institutions. It also shows even experienced markers could struggle to spot answers generated by AI, the University of Reading academics said.
Peter Scarfe, an associate professor at Reading's School of Psychology and Clinical Language Sciences, said the findings should serve as a "wake-up call" for educational institutions as AI tools such as ChatGPT become more advanced and widespread. He said: "The data in our study shows it is very difficult to detect AI-generated answers. There has been quite a lot of talk about the use of so-called AI detectors, which are also another form of AI but (the scope here) is limited." For the study, published in the journal PLOS ONE, Prof Scarfe and his team generated answers to exam questions using GPT-4 and submitted these on behalf of 33 fake students. Exam markers at Reading's School of Psychology and Clinical Language Sciences were unaware of the study. Answers submitted for many undergraduate psychology modules went undetected in 94% of cases and, on average, got higher grades than real student submissions, Prof Scarfe said.
I hope it's better than the students (Score:3, Insightful)
Re: (Score:2)
Literal ones? Oh no!
Re: (Score:2)
Re: (Score:2)
But would a classroom of ferrets be funner than a barrel of monkeys?
Re: (Score:3)
(Hands exam back to writhing "student in a trench coat emanating hissing and dooking sounds from within")
Re: (Score:2)
Clearly better than Examiner (Score:2)
Subjectivity of grading written tests, exposed! (Score:5, Interesting)
As a kid in school, I found math tests easy because all you had to do was provide a correct answer.
Writing for good grades was always a struggle though! You'd put so much thought and effort into something, only to get bad marks for artistic choices or having different viewpoints from the teacher. In time, I got a peek behind the curtain and learned that written work was graded more by weight than quality. Put enough words on the page with appropriate punctuation, and you win.
This comeuppance is righteously due.
Re: (Score:2)
I found math tests easy because all you had to do was provide a correct answer.
get bad marks for artistic choices or having different viewpoints from the teacher
Was this a high school? Because none of this is like any university I've been to or heard about from reliable sources.
Re: (Score:2)
It's enough to be a different country.
Re:Subjectivity of grading written tests, exposed! (Score:4, Interesting)
I have found the same thing in university as well, and in real life. I have read work from people who are apparently good at writing; most of it is drivel or stating the obvious. I have read entire 100-page documents describing XML, with the last 2 pages actually stating the data that is needed.
I have a daughter in tertiary education, and there is a requirement of X words +/- 10%. That's ludicrous: it should be X good points with some examples. More time is spent adding and subtracting words than actually coming up with the answers. I am sure AI could do that much better than most humans at that sort of task.
AI is very good at writing large amounts of text that is easy to read and sounds plausible, so I am not at all surprised that it outperforms students.
That is actually one of my main concerns with AI, the ease with which you can produce large quantities of work will simply increase the amount of useless stuff to read.
Re: (Score:2)
My math classes mostly involved writing proofs, so just showing the "answer" was failure. I also have a history degree and was encouraged by my professors to disagree and to come up with something new. Both of these were at an unremarkable state university. I was under the impression that I had a pretty vanilla experience.
I don't mean this to be insulting, but based on what you've written I don't think you know how to write. If authors know what they're doing, then there's not "more time spent adding and subtracting
Essay writing is choosing which points to argue (Score:2)
The word limit forces the student to choose and present concisely. So no - a word limit is the right solution in a lot of cases.
Re: (Score:2)
High school for me felt like number of words. I once had a two-page report on the causes of the Civil War. The teacher wanted it to be six pages or more. I asked what I had left off, and he said I hadn't skipped anything but he just wanted it to be longer. Got a decent grade anyway, probably grudgingly. Meanwhile, a friend who double-spaced everything, used repetition, etc., got a good grade as well.
I found on SATs that the written parts of the test were hard for me. Same with all the standardized tests. Wi
Re: (Score:2)
Re: (Score:2)
Oh ya, going to McDonald's in Helsinki they absolutely all spoke English better than they do in the McDonald's in America (and before the Brits start laughing, they may have the same problem).
Re: (Score:2)
Re: (Score:2)
That is actually one of my main concerns with AI, the ease with which you can produce large quantities of work will simply increase the amount of useless stuff to read.
My sentiments exactly! There is, however, a silver lining: AI-generated content can be thinned out and similarly summarised by AI. But I do fear that we may end up with the Sisyphean task of reapplying AI to summarise the very pointless bloat it was used to generate.
I hope that there will be a gradual trend towards emphasising concise or succinct writing and rewarding those who get their point across efficiently. It's too easy to prompt AI with a paragraph and tell it to give you a chapter. I gener
Re: (Score:2)
Someone famous (Mark Twain, I believe) once said "If I had more time, I would have written a shorter letter"
Re: (Score:3)
If you can't dazzle 'em with brilliance, baffle 'em with bullshit.
Re:Subjectivity of grading written tests, exposed! (Score:4, Insightful)
Re: (Score:2)
They have Fixed This (Score:2)
Re: (Score:2)
Re: (Score:3)
handwriting? (Score:2)
Well, duh? (Score:2, Insightful)
The LLMs have proven time and time again they can write "homeworky" essays on very well-trodden subject matter as you would find in an academic curriculum.
Also, for any subject matter on one of these, you could find an existing academic article on the topic to submit verbatim, and it would also be graded highly if they didn't catch the plagiarism.
Re: (Score:3)
Yes, and that is pretty much it. Ask for something that needs a tiny bit of thinking and LLMs are completely lost. LLMs cannot go beyond their training data.
Re:Well, duh? (Score:5, Informative)
Amazing, there must be an article out there comparing the similarities between Betty Crocker and ancient Greek bronze swords!
Crazy training data that's out there on the internet, apparently, since I'm reliably informed by the above poster that LLMs can't do synthesis; and just ignore that there are entire benchmarks focused on logic and reasoning for LLMs, including private benchmarks whose questions aren't posted online.
LLMs are not databases. They operate in a latent (conceptual) space (hundreds to thousands of dimensions) where mathematical operations can be done on concepts themselves (e.g. "king - man + woman ~= queen"), and you can interpolate from concepts in the direction of other concepts with every point along the continuum being coherent. They absolutely do NOT have to have seen some specific thing before to operate on it; it just needs to be some coherent point in the latent space. "Betty Crocker" and "Ancient Greek bronze swords" exist at points in this latent space. A latent walk between them passes through a number of different ways in which they are related. The model doesn't need someone to have written specifically about this topic.
The top-end LLMs perform as well as or better than humans on most benchmarks, though some benchmarks (most notably math and word problems) get consistently low LLM scores (LLMs are blind to "words", and can't double back or assess self-confidence for math problems). That said, even within any given sort of benchmark there's usually some type of question that is better at tripping up LLMs than humans, e.g. which humans tend to find to be easy q
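The "king - man + woman ~= queen" arithmetic mentioned above can be sketched with toy vectors. Everything here is invented for illustration (real embeddings come from a trained model and have hundreds to thousands of dimensions), but the mechanics of doing arithmetic on points in the space are the same:

```python
import numpy as np

# Hypothetical 3-D "embeddings", invented purely for illustration.
# Real models use learned vectors with hundreds to thousands of dims.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.0]),
    "queen": np.array([0.9, 0.0, 0.1]),
    "sword": np.array([0.1, 0.5, 0.9]),
}

def nearest(vec, vocab):
    """Word whose embedding has the highest cosine similarity to vec."""
    def cos(a, b):
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vec, vocab[w]))

# "king - man + woman" lands nearest to "queen" in this toy space.
print(nearest(emb["king"] - emb["man"] + emb["woman"], emb))  # queen
```

The same nearest-neighbour lookup works for any vector in the space, which is what makes interpolation between concepts coherent.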
Re: (Score:2)
I am looking forward to the replies to this. Gweihir is a non-physicalist, so be warned that you are trying to talk someone out of their faith. Like trying to convince a Christian a person can act morally without the threat of eternal torture.
Re: (Score:2)
See above. If you think "ability to correlate" is enough for AGI, then you lack general intelligence.
Incidentally, I am _not_ a non-physicalist. My stance is informed by the current scientific state-of-the-art and that very clearly says the question is open. Physicalists claim, without proof, that the question has clearly been decided in their favor and that is just the same dumb mistake the theists make. Funnily, the ways that Physicalists try to deride and discredit anybody that does not accept their stan
Re: (Score:3)
"Amazing, there must be an article out there comparing the similarities between Betty Crocker and ancient Greek bronze swords!"
And yet having read it, all it did was come with a list of categories, and then make extremely superficial comments about each. You can practically see how it "thinks" (or doesn't think).
A far better answer would have been:
"Other than trite and silly superficial similarities you might use to populate a time-wasting clickbait buzzfeed article I couldn't come up with anything particul
Re: (Score:2)
True, LLMs are in particular very bad at "reasoning". They don't connect the dots; to them, if A -> B and B -> C, then they know nothing at all about how A and C are related. Of course, you can train in some basic rules. But LLMs aren't designed to follow lists of instructions.
I.e., if LLM-based AI can do arithmetic, then it is because it is fitted with a method to do this: recognizing that an arithmetic question was asked and then feeding that into an algorithm or calculator attached to the AI. But you
Re: (Score:3)
They not only solve logic problems, but even take into account human factors:
Re: (Score:2)
And yet:
https://arxiv.org/pdf/2402.120... [arxiv.org]
So how about:
There's something wrong with the logical reasoning there. So I asked it:
Re: (Score:2)
What a lovely example of an LLM LLMing :)
The initial probabilities are determined based on associations. The impact of "mother dying", or perhaps even "mother dying" + "as an infant", may be associated with "worse outcomes" while a strike is "best outcome". I'm speculating, but there's some logic to how it might come up with those variances from the inputs, based on what is likely in the training data.
And then when you turn it around and ask it to explain the results it previously gave, it associates (aga
Re: (Score:2)
*Eyeroll*
LLM *does* stand for Legum Magister. The L is doubled because it's common to double the first letter in Latin-based degree names to denote the plural.
Legum is Latin for "laws" (Lex, Legis - genitive plural). It was translating for you.
Its statements about the abbreviation of Large Language Models are correct in every way.
You failed the test, not the model. You just got outperformed by a LLM. Congratulations.
Re: (Score:2)
** double eyeroll **
LLM *does* stand for Legum Magister.
Yes. I know that. And chatGPT got that part right.
The L is doubled because it's common to double the first letter in latin-based degree names to denote the plural.
Yes. I know that too. chatGPT however did NOT really address that detail at all which was unfortunate, but not critical.
Legum is Latin for "laws" It was translating for you.
The trouble is that it took it too far. It would have been perfectly fine to say that LLM stands for Legum Magister, which is Latin for Master of Laws, and then STOPPED THERE.
But its final "Therefore the L stands for laws" is not correct.
The L (actually LL) stands for "Legum"
The L does NOT stand for "Laws", even if "Leg
Re: (Score:2)
First off, the fact you've switched from "LLMs can't do logic problems" to "LLMs can do logic problems, but they're cheating" has been duly noted. Your paper's entire point is that LLMs do succeed in logical challenges on novel topics, including discussion of logical benchmarks; it's just that they don't really "understand" what they're doing. Except that the authors keep stating that they have some degree of understanding throughout the paper, so they're not even consistent about that.
In short, are you co
Re: (Score:2)
Let's do a OOD example with fictional concepts, distraction sentences, and all of the statements shuffled. (I decided to toss this one to Claude just for fun)
1. If something is nahu then it's blurgy.
2. John is a wajut.
3. Greebles are nahu.
4. Emily is a wajut.
5. All wajut are nahu.
6. Cindy is not nahu.
7. Some greebles are wajut.
Is John blurgy?
Claude:
Let's approach this step-by-step:
We know that if something is nahu, then it's blurgy. (Given)
John is a wajut. (Given)
All wajut are nahu. (Given)
From statements 2
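For reference, the first puzzle needs nothing beyond mechanical forward chaining. A minimal sketch in Python (the predicate names are the thread's invented terms, and only the three statements that matter for the question are encoded):

```python
# Known facts as (predicate, subject) pairs: statements 2 and 4.
facts = {("wajut", "John"), ("wajut", "Emily")}

# Rules as (premise_predicate, conclusion_predicate) pairs:
# statement 5 (all wajut are nahu) and statement 1 (nahu -> blurgy).
rules = [("wajut", "nahu"), ("nahu", "blurgy")]

# Forward chaining: apply every rule until no new fact is derived.
changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        for pred, subj in list(facts):
            if pred == premise and (conclusion, subj) not in facts:
                facts.add((conclusion, subj))
                changed = True

print(("blurgy", "John") in facts)  # True
```

Two rule applications (wajut -> nahu -> blurgy) settle the question, which is why this kind of puzzle is a reasonable smoke test for multi-step inference.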
Re: (Score:2)
1. If something is nahu then it's blurgy.
2. John is a wajut.
3. Greebles are nahu.
4. Emily is a wajut.
5. All wajut are nahu.
6. Cindy is not nahu.
7. Some greebles are wajut.
8. There can be at most 2 things blurgy.
Is John blurgy?
Claude:
Now, let's reason:
From statements 2 and 5, we know John is nahu (he's a wajut, and all wajut are nahu).
From statements 4 and 5, we know Emily is nahu (she's a wajut, and all wajut are nahu).
From statement 3, we know all greebles are nahu.
From statement 1, we know that if something is nahu, it's blurgy.
However, statement 8 introduces a crucial constraint: there can be at most 2 things blurgy.
We have at least three things that should be blurgy based on the other rules:
John
Emily
At least one greeble (since greebles are nahu)
This creates a contradiction with statement 8. We have more things that should be blurgy than the maximum number of things that can be blurgy.
In a situation with contradictory premises, we cannot draw a definitive conclusion. The system of statements is inconsistent, and therefore we cannot definitively say whether John is blurgy or not.
If we strictly adhere to statement 8, then not all nahu things can be blurgy, which contradicts statement 1. If we adhere to statements 1-7, then we violate the constraint in statement 8.
Therefore, we cannot provide a definitive "yes" or "no" answer to whether John is blurgy. The premises given are inconsistent and lead to a logical contradiction.
Wait what?
We have at least three things that should be blurgy based on the other rules
John
Emily
At least one greeble (since greebles are nahu)
a) We don't actually know there are more than 0 greebles.
b) John or Emily could be greebles.
Look, just concede the point. LLMs can do logic tasks.
a) Ok, I definitely concede that you can _attach_ an LLM to a logic solver, which is what I expect is the case with Claude.
b) It's still not that good at it.
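Point (b) can be checked mechanically. A brute-force sketch over an assumed three-person domain (reading statement 7 as asserting at least one greeble that is a wajut) shows the eight premises are jointly satisfiable, e.g. when John himself is the greeble, so "inconsistent premises" was not a forced conclusion:

```python
from itertools import product

people = ["John", "Emily", "Cindy"]
preds = ["wajut", "greeble", "nahu", "blurgy"]

def worlds():
    # Enumerate every assignment of the four predicates to each person.
    for bits in product([False, True], repeat=len(preds) * len(people)):
        it = iter(bits)
        yield {p: {q: next(it) for q in preds} for p in people}

def satisfies(w):
    return (w["John"]["wajut"] and w["Emily"]["wajut"]                     # 2, 4
        and all(not w[p]["nahu"] or w[p]["blurgy"] for p in people)        # 1
        and all(not w[p]["greeble"] or w[p]["nahu"] for p in people)       # 3
        and all(not w[p]["wajut"] or w[p]["nahu"] for p in people)         # 5
        and not w["Cindy"]["nahu"]                                         # 6
        and any(w[p]["greeble"] and w[p]["wajut"] for p in people)         # 7
        and sum(w[p]["blurgy"] for p in people) <= 2)                      # 8

consistent = [w for w in worlds() if satisfies(w)]
print(len(consistent) > 0)  # True: e.g. John is himself the greeble
```

This only checks a small finite domain, so it's a sanity check rather than a proof for all domains, but it is enough to show a consistent model exists.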
Re: (Score:2)
Yep, pretty much. It does help to have some background in automated reasoning or even some relevant philosophy. The sad fact of the matter is that an average person usually cannot distinguish between a correlation and an implication (that is two mistakes), and cannot really do rational reasoning at all. Apparently only 20% of all people can be convinced by rational argument. That the rest mistakenly believes AI is rational is hence no surprise, as it uses the same inferior mechanisms they themselves use. So
Re: (Score:2)
And yet, say "List the top ten best-selling automobile models of all time, sorted alphabetically", and it will have trouble doing so. Especially the alphabetizing part. Or at least it used to give bad answers that got more and more wrong the more you tried to specify precisely what was wanted. Of course, the problem is easily solved by now training the AI on what alphabetizing means. But you can always come up with a new type of question that exposes the LLM-based AI for being based upon pattern matching and co
Re: (Score:2)
Re: (Score:2)
Yes, because you're thinking about it. The first time asked, ChatGPT just listed 10 popular models, with no sorting whatsoever. Later it gave a different list of ten, one of them using "Honda" as a model name. It just didn't understand what was being asked. Of course, this was first-gen ChatGPT, and it does better now.
The early ChatGPT was amazing, and so are the current ones. But what is amazing is NOT the "chat" part; the chat part is dumb as nails. The actual advance in LLMs is in how it processes natura
Re: (Score:2)
Re: (Score:2)
Asking an LLM a word problem is like asking a blind person about relative colours. They don't "see" text.
Re: (Score:2)
Sophisticated pattern matching and correlations of course, it does a good job of fooling a lot of people into thinking there's "intelligence."
Exactly. "Better crap" as one person put it. The thing is most people do not actually think rationally most of the time. They "think" in correlations and hence quite often arrive at flawed results as correlation is not enough to generate reasoning (can be mathematically proven, but to understand the proof you need actual reasoning ability). The fact of the matter is that only about 20% of all people are accessible to rational argument. Even fewer can come up with it. That means 80% cannot fact-check. And he
Re: (Score:2)
The example given is an interesting example of taking in a natural language query and then doing a natural language 'join' on the two datasets queried. Some of those are... dubious, but overall you could imagine a mapping of this scenario to a database operation, which is novel to do with 'unstructured data' and has a lot of utility, but I think it's a stretch to consider that "synthesized" information.
It has opened a whole world of possibilities on formerly near useless 'unstructured' data, and pretty muc
Re: (Score:2)
You people always misunderstand "cannot go beyond its training data". Obviously that includes correlations within the training data.
Re: (Score:2)
To clarify, what can be done with training data by an LLM is finding correlations in the training data. Finding correlations in a general setting is impressive, but does not require any reasoning ability, just statistics.
Re: (Score:2)
Because they've got a training set that students can't replicate. Train on a set of questions and likely answers. Maybe the answers and questions don't correlate, but generally having a LARGE sample of well-written answers means that the AI can just regurgitate the answers, shuffling them all together and mixing them up so that there's no direct match that can be pointed to as "copying" or plagiarism. Exactly the same as the student who copies off of a friend's old term paper but then tries to modify it j
Re: (Score:2)
LLMs are not databases. They don't "shuffle together answers". They're latent-space transformers, built off fuzzy binary classifiers.
Re: (Score:2)
The LLMs have proven time and time again they can write "homeworky" essays on very well-trodden subject matter
Only provided it is not a factual subject. The reports I've seen on when it is given simple first-year physics problems suggest it is only around 50% accurate; the rest of the time it hallucinates utterly wrong explanations. That might help some students scrape a lowest-grade pass, but that's about it so far.
Re: (Score:2)
I think it can deal with factual material with respect to, for example, history.
But yes, it seems to choke on science stuff if things get too particular.
Re: (Score:2)
I think it can deal with factual material with respect to, for example, history.
History is not factual in an objective sense though. For example, try answering a simple question like "Who won the War of 1812?". In Canada and the UK we were taught we won it, while in the US they are taught it was a draw. So who is correct? There is evidence to support both points of view, so which is the factual answer? That's the problem with history: you can pretty much argue almost anything since the "facts" are subjective in a way that scientific facts are not. If an LLM says the UK won the war o
Psychology and Clinical Language Studies (Score:3)
Re: (Score:2)
Not when I do exams, but in far too many cases that is how it is. Teaching people to be dumb, but with good memory.
Re: (Score:2)
Does this say more about (Score:2)
AI written essays, or student written essays?
Or, more likely, the methodology of this "study."
Can we get a report on who financed this study? Was it some AI company with stock to sell?
Sorry but (Score:3)
The solution is to have a completely controlled exam environment. EZ fix.
Re: (Score:2)
Re: (Score:2)
Now, you can get the same knowledge watching a bunch of YouTube videos if you select the
Re: (Score:2)
My undergrad biology 101 class was like that. All memorization. And I had taken advanced bio in high school and already done all the memorization. However all the pre-med students warned me against taking that class - "it's the hardest class on campus, by far!" So I get an A+ in one of the easier classes I've taken. Of course, the pre-meds looked at me like I was a freak (I probably was but they wouldn't be able to work out a biological reason why).
It's not hard to distinguish AI from real (Score:2)
I can't recall an instance where someone sent me some bit of text and I couldn't tell in a few sentences if it was "fake" or not, but that's not in an academic setting. It's been proofing resumes, potential emails to clients or text in deliverables like t
Re: (Score:3)
I can't recall an instance where someone sent me some bit of text and I couldn't tell in a few sentences if it was "fake" or not
Honestly not sure what to say about this logical error.
How do you know you've accurately detected all the LLM-generated text? Maybe there was some that was good enough that you didn't?
Re: It's not hard to distinguish AI from (Score:2)
There's absolutely nothing at stake here bc it's either in an informal setting or for business on a business-controlled AI instance. The format and style of writing is *always* too verbose. Once you catch the pattern it's painfully easy to spot
Re: (Score:2)
There's absolutely nothing at stake here bc it's either in an informal setting or for business on a business-controlled AI instance. The format and style of writing is *always* too verbose. Once you catch the pattern it's painfully easy to spot
"Answer in one line" / "Be pithy" / "Don't sound like an LLM answer" / "Answer in the manner of ____" etc all produce output that is not verbose.
Re: (Score:2)
"You've just been insulted by Oscar Wilde. Respond with a withering retort of your own that will deflect his wit and make him seem the fool instead. You have 60 seconds."
Re: (Score:2)
Hah, amusing. "Ah, Mr. Wilde, I see you've mistaken wit for wisdom. Do take care not to exhaust your brilliance all at once—some of us prefer to reserve our intellect for more meaningful pursuits."
Surprisingly the exact phrase "mistaken wit for wisdom" doesn't seem to exist on the Internet, at least according to google.
Re: (Score:2)
Surprisingly the exact phrase "mistaken wit for wisdom" doesn't seem to exist on the Internet, at least according to google.
Possibly because Google lacks both.
Re: (Score:2)
You should ask ChatGPT to clean up your posts ;)
Seriously, though, not only is the answer style purely a product of the finetune, which will be different from company to company (since they all make their own finetune datasets), but you can request an LLM to assume whatever voice you want. For example, I presented ChatGPT with:
Re: (Score:2)
Part of the problem is that you, as the underpaid grader for the course, have 50 papers to read in between your actual research. Thus the attention given to each paper is not very good. It may start off with earnestness, but near the end it gets rushed. Thus it is easy for an LLM-written paper to slip through.
Re: (Score:2)
Absolutely. Plus, what the smart(er) students who use LLMs are doing is using them for drafting, or ideas, or phrasing, or outlining, and then rewriting in their own words, making edits, etc. The dumb students are the ones who copy and paste and do nothing more.
Additionally, grading papers is HARD and it's very hard to grade the first paper you look at with exactly the same standards as the 20th or the 50th or the 200th.
One short story author I spoke with said she is occasionally using some
Re: (Score:2)
This is hilariously illogical
Re: (Score:2)
Cheating gets higher grades (Score:2)
I'm not surprised that using AI to cheat got higher grades.
AI grading AI cheating. (Score:2)
Human grading taught me an important lessons (Score:4, Interesting)
The correct answers to the exam's questions aren't the truth but what the examiner wants to hear. This is especially true with term papers.
This is a lesson that has served me well throughout my career: whenever my manager or my boss asks me something, I always remember that they not only expect an answer, they expect a certain type of answer, and they expect the answer to be delivered in a certain way.
Here I guess the students of today have learned to give the answers needed to get what they want out of the machines. I suppose this will be a useful skill when they work for machines after their studies too (assuming there's any work left), and it's probably already useful for finding a job in the first place, since most resumes are processed by machines nowadays. But there is still a majority of human management around, and that might not be the right skillset to navigate their particular set of quirks.
Yes, so? (Score:2)
Students are still learning. They get taught the basis for understanding things. And they are not encyclopedias. Hence what you ask in an exam is 1) facts, 2) simple consequences of the facts and 3) more advanced consequences. The first two items you can look up, and hence LLMs do very well on them. The third one will be pretty important in the student's life, but LLMs cannot do it. So, while LLMs often do better on exams, that actually means nothing.
Writing is about persuasion, not regurgitation (Score:2)
Students are still learning. They get taught the basis for understanding things. And they are not encyclopedias. Hence what you ask in an exam is 1) facts, 2) simple consequences of the facts and 3) more advanced consequences. The first two items you can look up, and hence LLMs do very well on them. The third one will be pretty important in the student's life, but LLMs cannot do it. So, while LLMs often do better on exams, that actually means nothing.
Actually, essays should not be a regurgitation of facts, be they original events or consequences. They should present an argument and endeavor to persuade the reader. Hence learning to write an essay involves learning how to present a thesis, a series of arguments supporting or disproving the thesis, and a conclusion wrapping it all up and summarizing the correctness or failure of the thesis. In short, writing is often about persuading the reader, not simply regurgitating things.
Re: (Score:2)
Re: (Score:2)
Yes, "should". But do LLMs really do that? Remember that on all topics an LLM can say anything, it will have seen material and that material will often be essays.
Re: (Score:2)
Re: (Score:2)
Indeed. Matches my experience. Or give them a problem they have never seen before.
No shock, psychology is unscientific (Score:3)
The replication crisis is very much a psychology thing: https://www.psychologytoday.co... [psychologytoday.com]
Answering non-scientific exam questions must be a dream scenario for an AI. Nothing is wrong, you just have to write something that SOUNDS coherent. No need to actually BE coherent since it is not science!
Something we all knew (Score:2)
The Stupidularity has arrived! (Score:2)
How professors can get around this (Score:2)
And chess computers are better than humans (Score:4, Informative)
LLMs are systems that ingest large amounts of data, and then summarize it based on specific prompts. You know, like test questions. This is literally what LLMs do. So why wouldn't they be better than human test takers?
LLMs not always better (Score:2)
So why wouldn't they be better than human test takers?
Re: (Score:2)
LLMs only select words based on what sounds good coming next
I don't think that's quite right. LLMs don't actually know what "sounds" good. Rather, they are really good at *summarizing*.
In visual terms, it's like image processing software, where it's given a photo that has a black spot on it from a dirty camera lens. Photoshop can "fix" the spot by essentially guessing what should be behind the black spot. It looks at all the pixels around it, and based on patterns it has observed in other photos that don't have black spots, it can infer what should be there.
In LLMs
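The inpainting analogy above can be sketched in a few lines. This is a deliberately crude stand-in (real inpainting software uses learned priors over many photos), filling a missing pixel with the mean of its valid neighbours:

```python
import numpy as np

# A toy image with one "black spot" marked as NaN.
img = np.array([[10., 10., 10.],
                [10., np.nan, 10.],
                [10., 10., 12.]])

r, c = 1, 1  # location of the damaged pixel

# Infer the gap from its surroundings: mean of the valid 8-neighbours.
neighbours = img[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
img[r, c] = np.nanmean(neighbours)

print(img[r, c])  # 10.25
```

The inferred value is a plausible guess from local context, not a recovery of the lost pixel, which mirrors how generative fills work in general.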
Abstract Examples (Score:2)
It can supply the answer based on what it infers from patterns in documents it has processed.
That's exactly my point though: effectively it picks the highest-ranked word or phrase based on word patterns, not on any understanding of what those words are saying. It does not need to be given factually incorrect training data to hallucinate. For example, I suspect that most mentions of lead and buoyancy will be about lead sinking in water, so when asked what will happen when lead is placed in mercury you may well get told that it sinks, because the LLM has zero clue about how to figure out whether any gi
Re: (Score:2)
So, I put your question to ChatGPT. Its answers suggest that it is significantly more capable at finding correct patterns than you have believed.
Me: What happens when you place a piece of lead in a flask of mercury?
GPT: When you place a piece of lead into a flask of mercury, the lead will form an amalgam with the mercury. An amalgam is an alloy in which mercury is mixed with another metal, in this case, lead. This process does not involve a chemical reaction that produces new compounds; rather, it’s
Re: (Score:2)
Re: (Score:2)
It does not need to be given factually incorrect training data to hallucinate
I don't think you've demonstrated this.
In the case of this specific example, "filings" do fit the definition of "pieces" (which doesn't specify an upper or lower bound in size).
I certainly wouldn't argue that LLMs always get it right, they certainly do not. But they are very good at summarizing what they find.
So What? (Score:2)
Big deal. My parents can buy me higher grades than any AI could hope to score.
Hard to spot (Score:2)
As a CS prof, I have graded quite a few essays in the last year, many of them AI-influenced. It's true that generic ChatGPT has a recognizable writing style, but judging human from machine can be quite hard. This is for two reasons.
1) Sophisticated prompts on the part of cheaters
2) Off-brand AI
Just as you can recognize generic ChatGPT verbiage, smart students can as well. They tell the machine to remove the adverbs, bullet points, ornamental language, and to put things more plainly, etc. But I think I
Regurgitate drivel (Score:2)
"AI" is not intelligent. It is good at regurgitating random stuff it finds on the internet. It doesn't "understand" what it is saying and is often wrong.
Doesn't surprise me that it does better than the average student on exams where the goal is to regurgitate text.
of course (Score:2)
Well, what did you expect, when the submissions are graded by AI?