
AI Secretly Helped Write California Bar Exam, Sparking Uproar (arstechnica.com) 20

An anonymous reader quotes a report from Ars Technica: On Monday, the State Bar of California revealed that it used AI to develop a portion of multiple-choice questions on its February 2025 bar exam, causing outrage among law school faculty and test takers. The admission comes after weeks of complaints about technical problems and irregularities during the exam administration, reports the Los Angeles Times. The State Bar disclosed that its psychometrician (a person skilled in administering psychological tests), ACS Ventures, created 23 of the 171 scored multiple-choice questions with AI assistance. Another 48 questions came from a first-year law student exam, while Kaplan Exam Services developed the remaining 100 questions.

The State Bar defended its practices, telling the LA Times that all questions underwent review by content validation panels and subject matter experts before the exam. "The ACS questions were developed with the assistance of AI and subsequently reviewed by content validation panels and a subject matter expert in advance of the exam," wrote State Bar Executive Director Leah Wilson in a press release. According to the LA Times, the revelation has drawn strong criticism from several legal education experts. "The debacle that was the February 2025 bar exam is worse than we imagined," said Mary Basick, assistant dean of academic skills at the University of California, Irvine School of Law. "I'm almost speechless. Having the questions drafted by non-lawyers using artificial intelligence is just unbelievable." Katie Moran, an associate professor at the University of San Francisco School of Law who specializes in bar exam preparation, called it "a staggering admission." She pointed out that the same company that drafted AI-generated questions also evaluated and approved them for use on the exam.
The report notes that the AI disclosure follows technical glitches with the February exam (like login issues, screen lag, and confusing questions), which led to a federal lawsuit against Meazure Learning and calls for a State Bar audit.


Comments Filter:
  • I am more interested in the fact that 48 questions came from a first-year law student exam. Wouldn't missing even one of these raise questions about the exam taker's knowledge? A certain level of basic knowledge would seem to be implied by merely having a law degree before even taking the exam.
    • by Rinnon ( 1474161 ) on Wednesday April 23, 2025 @04:50PM (#65326427)

      I am more interested in the fact that 48 questions came from a first-year law student exam. Wouldn't missing even one of these raise questions about the exam taker's knowledge?

      Eh, that's a standard that is a bit too high: law students and lawyers aren't superhumans. Would you be able to get 48/48 questions right in an exam that you took 3-4 years after the class in which you learned them? If the answer is yes, do you think ALL of your peers could have?

      • by NaCh0 ( 6124 )

        Could I get 48/48 on a first year exam?

        Possibly. But even if it's a 46/48 or 47/48, it's still a high passing score.

        Could my classmates? I went to school with a lot of retards so probably not. And I'm 110% okay with the retards failing the test in any field of study.

    • How do you say "you're all gonna be replaced by AI some day" without saying it directly? Use it to make their tests. ;-)

  • With the advent of LLMs, the lazily thoughtless among us are outing themselves as never before. None of these people would survive the gom jabbar test.
    • by Anonymous Coward
      please pray for the pope
    • Nitpick, but the test is referred to as the "human test" or "test for humanity", while the gom jabbar is the poison. The gom jabbar is not necessarily the only way to run the test. You could just as easily point a lasgun at their head.

      • The test for humanity is that you gut out and bear the pain of the nerve-induction box, not just because you don't want to die but because you are plotting your revenge against whoever is subjecting you to this torture.

        Not only did Paul Atreides amaze the Reverend Mother by bearing much more pain than anyone else, the revenge he carried out against the Reverend Mother was way beyond anything she could have even imagined.

        As to gom jabbar vs lasgun, first of all lasguns weren't threatening to people in a culture where they had "shields."

        • "lasgun weren't threatening to people in a culture where they had "shields.""

          Shields are visible and it would take longer/more movement to activate a shield generator than it would to simply knock aside the gom jabbar needle.

        • by Xenx ( 2211586 )

          Secondly, Dune is pretty much the same story as The Godfather, with Paul Atreides being the Michael Corleone character, apart from a few changes.

          Most of what you said was wrong or off in some way. This, however, deserves specific clarification. It's generally best practice to compare the later work to the earlier one, not the other way around. Because Dune came first, it would be more accurate to say The Godfather is pretty much the same story as Dune.

  • by jenningsthecat ( 1525947 ) on Wednesday April 23, 2025 @04:43PM (#65326407)

    The ACS questions were developed with the assistance of AI and subsequently reviewed by content validation panels and a subject matter expert in advance of the exam...

    Because of the subsequent review by qualified humans, I don't see this as a problem. But the scenario described in TFS is just begging for a 'slippery slope' argument.

    I think it's inevitable that in the future, one AI will be tasked with checking the work of another AI. Then, in a storm of stupidity, the policy will become having the AI check its own homework.

    After that, of course, the practice will devolve into just trusting the initial AI output, with no verification step. Welcome to the AI apocalypse!

    • The ACS questions were developed with the assistance of AI and subsequently reviewed by content validation panels and a subject matter expert in advance of the exam...

      Because of the subsequent review by qualified humans, I don't see this as a problem. But the scenario described in TFS is just begging for a 'slippery slope' argument.

      I think it's inevitable that in the future, one AI will be tasked with checking the work of another AI. Then, in a storm of stupidity, the policy will become having the AI check its own homework.

      After that, of course, the practice will devolve into just trusting the initial AI output, with no verification step. Welcome to the AI apocalypse!

      My concern in this particular instance is that the company using the AI was also in charge of reviewing the questions the AI generated. And the end result was "confusing questions" (straight from the summary), which was part of the reason the entire exam (along with login issues and screen lag) was called into question. I'm thinking there are bigger systemic issues here than just having AI involved in the process, but having AI involved clearly wasn't helping anything.

      Though I know what you're

    • After that, of course, the practice will devolve into just trusting the initial AI output, with no verification step. Welcome to the AI apocalypse!

      This implicit trust in AI output, with no human proofreading or sanity check, is a problem not just for AI but for every field. Even if the creator of the test questions (or anything else) is an expert, skipping human proofreading and sanity checks is a problem. Imagine a Nobel Prize-winning author publishing a book without multiple passes by proofreaders and editors. There's no way errors wouldn't creep through.

      If we ever get to the point where AI can be trusted with the proofreading and sanity check

  • by pz ( 113803 ) on Wednesday April 23, 2025 @04:46PM (#65326423) Journal

    The real question is whether these AI-generated questions were going to be counted in the final score, or whether they were merely being evaluated for inclusion in future exams.

    A colleague of mine is getting his MD, and the periodic tests they take contain a fair fraction of questions that are being evaluated for inclusion on future exams but do not count toward the present score. Could the same have been happening here, with that important distinction lost in the clamorous din?

    As with any professional qualification exam, there is a certain level of knowledge that must be demonstrated. As long as the questions being used to demonstrate that knowledge are vetted by experts in the field and validated before being officially used, does it really matter who wrote them or how?

    • by Targon ( 17348 )

      AI-generated questions aren't a problem if those taking the exams actually understand the questions and provide the correct answers. This goes to a basic mistake that too many people make: not understanding the difference between AI being a tool for humans and AI being used to skip having humans involved in the work. The people complaining the most haven't asked, "Are the questions worded clearly enough for humans to be able to understand what is being asked?" That's all they need to be concerned with.

  • by Anonymous Coward

    The Bars are owned by the State?

    • Yup, just like MAGA is owned by Russians and the Saudis
    • Is this different than any other US state?
    • The bar exam is normally a function of the state bar association, not the state government. Nobody owns anything. You have to pass the exam to practice law in the state. This actually dates back to the early days of the country, when many lawyers got their education by apprenticeship rather than at a school. It is supposed to ensure that, if you hire a lawyer, they will know what they are doing. You can take the exam multiple times (apparently few people pass on the first try in most states). Of course
  • by Ksevio ( 865461 ) on Wednesday April 23, 2025 @07:19PM (#65326679) Homepage

    What exactly is the issue here? If the questions were vetted by professionals and found to be appropriate, does it matter if it was an expert human or machine or cat walking across the keyboard that wrote them to begin with?
