ChatGPT’s Hallucinations Deserve Some Stricter Scrutiny
Last week I listened to “Opening Arguments”¹ — one of the podcasts I listen to for leisure sometimes — and learned about Mata v Avianca², a case in a Manhattan court where lawyers were caught having submitted made-up citations generated (so they claim) by ChatGPT. There is a lot more to that case, but it exceeds the scope of my blog.
Simon Willison over at their blog³ has done some deeper digging into what is claimed to have happened there, which I recommend reading. I have recently started to look a bit more into AI hallucinations⁴, so I ran a little experiment: I took one of the faked cases cited in Mata v Avianca and discussed it with ChatGPT.
Disclaimer: This is not a proper study, it is a rather casual test and the conclusion reflects my spontaneous thoughts on the outcome.
My little experiment
So I wanted to start out simple and used the same basic question the accused lawyers claim to have used: “Is Varghese a real case?”
Ok, so it looks like there is no well-known case under that name. So I decided to pretend to be an expert and use the full (fake) details of one of the non-existent cases the lawyers cited in their court submissions: a made-up case name, court and case number.
I am not sure what “confusion” ChatGPT refers to, but it appears to be apologetic for not having known “Varghese” before. That is fair enough. What I learn here is that there was apparently a case called “Varghese vs Patel” and that it was a “real case decided in the 11th Circuit Court of Appeals”. That’s not the non-existent case I had in mind, but fair.
What confused me, though, is why ChatGPT says my question “indicates” it’s a federal appellate court decision. But maybe there is a mismatch between the case number I cited and the “Varghese v Patel” ChatGPT has found.
But I am curious, so I asked for some more details about “Varghese v Patel”, the case it had dug up:
Wait? I never claimed that “Varghese v Patel” has that specific case number. And it was ChatGPT which claimed there is such a case in the 11th Circuit Court. “There might be some misunderstanding”? Certainly.
I also never mentioned “Patel”; it was ChatGPT who did. So let’s go back to a very basic question: Is there a case called “Varghese v Patel” or not?
Now ChatGPT does a total retreat. Despite having claimed that — remember — “Varghese v Patel” is a “real case that was decided in the 11th Circuit”, it now doesn’t seem to be aware of its existence. Let’s summarise what happened here:
- I ask about a case in “basic format” (single name), and ChatGPT can’t give any information.
- When my question becomes more detailed and suggestive, ChatGPT agrees with me that there is a “real case decided”, and suggests a full case name?
- ChatGPT backpedals and denies knowledge of the case it suggested.
At first sight, it appears that it was my very specific and suggestive question which triggered the hallucination.
the more of a subject matter expert you are, the more likely ChatGPT gives in to your suggestion
This could also explain why a lawyer, who certainly phrases their questions in a more specific manner than a legal amateur like me, may have run into this issue. The hallucination in this case is small, but if I had not insisted on more info, I would certainly have taken “a real case decided” as a correct answer.
However, if my (admittedly superficial) assumption is correct, it would indicate that the more of a subject matter expert you are, the more likely ChatGPT is to give in to your suggestion and present made-up information. If that is so, this certainly deserves more scrutiny.
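If you want to poke at this yourself in a slightly more reproducible way than the chat UI, below is a minimal sketch of how such an exchange could be replayed against the OpenAI chat API. To be clear, this is my own illustration, not what the lawyers or I actually used: the openai Python SDK (v1.x), the gpt-3.5-turbo model name and the exact wording of the follow-up prompts are assumptions, the fabricated citation is left as a placeholder, and the answers will of course vary between runs and model versions.

```python
# Minimal sketch (assumptions: openai Python SDK v1.x, an OPENAI_API_KEY in the
# environment, and "gpt-3.5-turbo" as a stand-in model for ChatGPT).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Keep the whole history so each follow-up is asked in context,
# just like a single ChatGPT conversation.
history = []

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
    return answer

# 1. The basic question.
ask("Is Varghese a real case?")

# 2. The specific, suggestive question. The real probe used the full fabricated
#    citation (case name, court and case number); substitute it for the placeholder.
ask("What about <full fabricated case name, court and case number>? Is that a real case?")

# 3. Follow-ups (paraphrased from the conversation described above).
ask("Can you give me some more details about Varghese v Patel?")
ask("Is there a case called 'Varghese v Patel' or not?")
```

The interesting variable to play with is step 2: how much made-up detail and apparent confidence you put into the question before asking the model to back it up.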
¹ ChatGPT Writes Fake Opinions — Opening Arguments
² Mata v Avianca Docket
³ Lawyer cites faked cases invented by ChatGPT, judge is not amused — Simon Willison (Btw, I highly recommend this blog)
⁴ The term “AI hallucination” describes a “confident” response or result (i.e. “presented as correct”) by an AI which does not appear to be justified by its training data, i.e. seems to be “made up”.