this post was submitted on 17 May 2024
503 points (94.8% liked)

Technology

[–] [email protected] 200 points 7 months ago (23 children)

We not only have to stop ignoring the problem, we need to be absolutely clear about what the problem is.

LLMs don't hallucinate wrong answers. They hallucinate all answers. Some of those answers will happen to be right.

If this sounds like nitpicking or quibbling over verbiage, it's not. This is really, really important to understand. LLMs exist within a hallucinatory false reality. They do not have any comprehension of the truth or untruth of what they are saying, and this means that when they say things that are true, they do not understand why those things are true.

That is the part that's crucial to understand. A really simple test of this problem is to ask ChatGPT to back up an answer with sources. It fundamentally cannot do it, because it has no ability to actually comprehend and correlate factual information in that way. This means, for example, that AI is incapable of assessing the potential veracity of the information it gives you. A human can say "That's a little outside of my area of expertise," but an LLM cannot. It can only be given hard-coded blocks in response to certain keywords that stop it from answering and insert a stock response instead.

This distinction, that AI is always hallucinating, is important because of stuff like this:

But notice how Reid said there was a balance? That’s because a lot of AI researchers don’t actually think hallucinations can be solved. A study out of the National University of Singapore suggested that hallucinations are an inevitable outcome of all large language models. **Just as no person is 100 percent right all the time, neither are these computers.**

That is some fucking toxic shit right there. Treating the fallibility of LLMs as analogous to the fallibility of humans is a huge, huge false equivalence. Humans can be wrong, but we're wrong in ways that allow us the capacity to grow and learn. Even when we are wrong about things, we can often learn from how we are wrong. There's a structure to how humans learn and process information that allows us to interrogate our failures and adjust for them.

When an LLM is wrong, we just have to force it to keep rolling the dice until it's right. It cannot explain its reasoning. It cannot provide proof of work. I work in a field where I often have to direct the efforts of people who know more about specific subjects than I do, and part of how you do that is you get people to explain their reasoning, and you go back and forth testing propositions and arguments with them. You say "I want this, what are the specific challenges involved in doing it?" They tell you it's really hard, you ask them why. They break things down for you, and together you find solutions. With an LLM, if you ask it why something works the way it does, it will commit to the bit and proceed to hallucinate false facts and false premises to support its false answer, because it's not operating in the same reality you are, nor does it have any conception of reality in the first place.

[–] dustyData 50 points 7 months ago (11 children)

This right here is also the reason why AI fanboys get angry when they are told that LLMs are not intelligent or even thinking at all. They don't understand that in order for rational intelligence to exist, an LLM would need an internal, referential inner world of symbols to contrast external input (training data) against, one that is also capable of changing and molding itself to reality and truth criteria. No, tokens are not what I'm talking about. I'm talking about an internally consistent and persistent representation of the world. An identity, which is currently antithetical to the information model used to train LLMs. Let me try to illustrate.

I don't remember the details or technical terms, but essentially, animal intelligence needs to experience a lot of things first-hand in order to create an individualized model of the world, which is then used to direct behavior (language is just one form of behavior, after all). This is very slow and labor intensive, but it means that animals are extremely good, when they get good, at adapting those skills to a messy reality. LLMs are transactional; they rely entirely on correlating patterns within their input. As a result they don't need years of experience, like humans for example, to develop skilled intelligent responses. They can do it in hours of processing training input instead. But at the same time, they can never be certain of their results, and when faced with reality they crumble, because it's harder for them to adapt intelligently and effectively to the mess of reality.

LLMs are a solipsism experiment. Imagine a child locked in a dark cave with nothing but a dim light and millions of pages of text; assume immortality and no need for food or water. As there is nothing else to do but look at the text, they eventually develop the ability to understand how the symbols marked on the text relate to each other, how they are usually and typically assembled one next to the other. One day, a slit in the wall opens and the person receives a piece of paper with a prompt, a pencil and a blank page. Out of boredom, the person looks at the prompt, recognizes the symbols and the pattern, and starts assembling symbols on the blank page with the pencil. They are just trying to continue from the prompt whatever they think would typically or should follow afterwards. The slit in the wall opens again, and the person intuitively pushes the paper they just wrote into the slit.

For the people outside the cave, leaving prompts and receiving the novel pieces of paper, it looks like intelligent linguistic construction: it is grammatically correct, the sentences are correctly punctuated and structured, the words even make sense and say intelligent things in accordance with the training text left inside and the prompt given. But once in a while it seems to hallucinate weird passages. They miss the point that it is not hallucinating; it just has no sense of reality. Its reality is just the text. If the cave were opened and the person trapped inside let out into the light of the world, they would still be profoundly ignorant about it. Given the word "sun" written on a piece of paper, they would have no idea that the word refers to the bright burning ball of gas above them. They would know the word, they would know how it is usually used to assemble text next to other words. But they wouldn't know what it is.

LLMs are just like that; they aren't actually intelligent, any more than the person in this thought experiment is. Because there is currently no way for these LLMs to actually sense the real world, or to correlate several sources of sensory input into an internal, mentalese model of it. This is currently the crux and the biggest problem in the field of AI as I understand it.
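To make the "their reality is just the text" point concrete, here is a minimal sketch of a purely statistical next-word predictor (a bigram toy, nothing like a real transformer; the tiny corpus and names are made up for illustration). It can use the word "sun" fluently without any idea of what the sun is:

```python
import random
from collections import defaultdict, Counter

# Toy "training corpus": the only reality this model will ever have access to.
corpus = (
    "the sun rises in the east . the sun is bright . "
    "the cave is dark . the text is all there is ."
).split()

# Count which word tends to follow which (a bigram model, a crude stand-in
# for what real LLMs do with neural networks over tokens).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_prompt(prompt, length=6):
    """Extend a prompt by repeatedly picking a statistically likely next word."""
    words = prompt.split()
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        nxt, = random.choices(list(options), weights=list(options.values()))
        words.append(nxt)
    return " ".join(words)

print(continue_prompt("the sun"))
# e.g. "the sun is bright . the cave is": fluent-looking text, but the model
# has no idea what a sun or a cave actually is; it only knows the text.
```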

[–] Aceticon 12 points 7 months ago (1 children)

That's an excellent metaphor for LLMs.

[–] feedum_sneedson 19 points 7 months ago (2 children)

It's the Chinese room thought experiment.

[–] snek 30 points 7 months ago (7 children)

I fucking hate how OpenAi and other such companies claim their models "understand" language or are "fluent" in French. These are human attributes. Unless they made a synthetic brain, they can take these claims and shove them up their square tight corporate behinds.

[–] [email protected] 11 points 7 months ago (2 children)

They do not have any comprehension of the truth or untruth of what they are saying, and this means that when they say things that are true, they do not understand why those things are true.

Which can be beautifully exploited with sponsored content.

See Google I/O '24.

[–] [email protected] 158 points 7 months ago (14 children)

"We invented a new kind of calculator. It usually returns the correct value for the mathematics you asked it to evaluate! But sometimes it makes up wrong answers for reasons we don't understand. So if it's important to you that you know the actual answer, you should always use a second, better calculator to check our work."

Then what is the point of this new calculator?

Fantastic comment, from the article.

[–] lateraltwo 20 points 7 months ago (1 children)

It's a nascent-stage technology that reflects the world's words back at you in statistical order by way of parsing user-generated prompts. It's a reactive system with no autonomy to deviate from a template upon reset. It's no Roko's Basilisk inherently, just because.

[–] tourist 13 points 7 months ago (4 children)

am I understanding correctly that it's just a fancy random word generator

[–] CaptainSpaceman 17 points 7 months ago (4 children)

It's not just a calculator though.

Image generation requires no fact checking whatsoever, and some of the tools can do it well.

That said, LLMs will always have limitations and true AI is still a ways away.

[–] [email protected] 17 points 7 months ago

The biggest disappointment with the image generation capabilities was the realisation that there is no object permanence in terms of the components making up an image, so for any specificity you're just playing whack-a-mole with iterations that introduce other undesirable shit no matter how specific you make your prompts.

They are also now heavily nerfing the models to avoid lawsuits by just ignoring anything relating to specific styles that may be considered trademarks; the problem is that those are often industry jargon, so now you're having to craft more convoluted prompts and getting more mid results.

[–] [email protected] 12 points 7 months ago

It does require fact-checking. You might ask for a human and get someone with 10 fingers on one hand; you might ask for people in the background and get blobs merged into each other. The fact check for images is absolutely necessary, and it consists of verifying that the generated image adheres to your prompt and that the objects in it match their intended real counterparts.

I do agree that it's a different type of fact checking, but that's because an image is not inherently correct or wrong, it only is if compared to your prompt and (where applicable) to reality.

[–] elephantium 14 points 7 months ago (1 children)

Some problems lend themselves to "guess-and-check" approaches. This calculator is great at guessing, and it's usually "close enough".

The other calculator can check efficiently, but it can't solve the original problem.

Essentially this is the entire motivation for numerical methods.
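As a toy example of that asymmetry (my own sketch, not from the article): the "guesser" below is cheap and unreliable, and the check (one evaluation of f) is what decides whether any candidate counts, even though nothing in the loop understands how to solve the equation.

```python
import random

def f(x):
    # We want an x with f(x) = 0, i.e. the square root of 2.
    return x * x - 2.0

def guess_and_check(trials=100_000, tolerance=1e-3):
    """Cheap, unreliable guesser plus a cheap, reliable checker.

    The guesser just throws out candidates; the check (evaluating f) is what
    decides whether any candidate is acceptable.
    """
    best = None
    for _ in range(trials):
        candidate = random.uniform(0.0, 2.0)   # wild guess
        error = abs(f(candidate))              # cheap verification
        if best is None or error < best[1]:
            best = (candidate, error)
    value, error = best
    return value if error < tolerance else None  # reject if nothing checks out

print(guess_and_check())  # roughly 1.4142, or None if no guess was good enough
```

Real numerical methods just replace the random guesser with a much smarter one (bisection, Newton's method), but the structure is still propose, then verify.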

[–] [email protected] 95 points 7 months ago

Altman going "yeah we could make it get things right 100% of the time, but that would be boring" has such "my girlfriend goes to another school" energy it's not even funny.

[–] [email protected] 61 points 7 months ago (5 children)

Who's ignoring hallucinations? It gets brought up in basically every conversation about LLMs.

[–] [email protected] 80 points 7 months ago (3 children)

People who suggest, let's say, firing the employees of a crisis intervention hotline and replacing them with LLMs...

[–] SkyezOpen 24 points 7 months ago

"Have you considered doing a flip as you leap off the building? That way your death is super memorable and cool, even if your life wasn't."

-Crisis hotline LLM, probably.

[–] [email protected] 17 points 7 months ago (1 children)

Less horrifying conceptually, but in Canada a major airline tried to replace their support services with a chatbot. The chatbot then invented discounts that didn't actually exist, and the courts ruled that the airline had to honour them. The chatbot was, for all intents and purposes, no more or less official a source of data than any other information they put out, such as their website and other documentation.

[–] [email protected] 12 points 7 months ago (1 children)

The part that's being ignored is that it's a problem, not the existence of the hallucinations themselves. Currently a lot of enthusiasts are just brushing it off with the equivalent of ~~boys will be boys~~ AIs will be AIs, which is fine until an AI, say, gets someone jailed by providing garbage caselaw citations.

And, um, you're greatly overestimating what someone like my technophobic mother knows about AI (xkcd 2501: Average Familiarity seems apropos). There are a lot of people out there who never get into a conversation about LLMs.

[–] [email protected] 56 points 7 months ago (9 children)

I'm a bit annoyed at all the people being pedantic about the term hallucinate.

Programmers use preexisting concepts as allegory for computer concepts all the time.

Your file isn't really a file, your desktop isn't a desk, your recycling bin isn't a recycling bin.

[Insert the entirety of Object Oriented Programming here]

Neural networks aren't really neurons, genetic algorithms aren't really genetics, and the LLM isn't really hallucinating.

But it easily conveys what the bug is. It only personifies the LLM because the English language almost always personifies the subject. The moment you apply a verb to an object you imply it performed an action, unless you limit yourself to esoteric words/acronyms or use several words to over-explain every time.

[–] calcopiritus 14 points 7 months ago* (last edited 7 months ago) (2 children)

It's easily the worst problem of Lemmy. Sometimes one guy has an issue with something and suddenly the whole thread is about that thing, as if everyone thought about it. No, you didn't think about it, you just read another person's comment and made another one instead of replying to it.

I never heard anyone complain about the term "hallucination" for AIs, but suddenly in this one thread there are 100 clone comments instead of a single upvoted one.

I get it, you don't like "hallucinate", just upvote the existing comment about it and move on. If you have anything to add, reply to that comment.

I don't know why this specific thing is so common on Lemmy though; I don't think it happened on Reddit.

[–] lectricleopard 44 points 7 months ago (2 children)

The Chinese Room thought experiment is a good place to start the conversation. AI isn't intelligent, and it doesn't hallucinate. It's not sentient. It's just a computer program.

People need to stop using personifying language for this stuff.

[–] TubularTittyFrog 16 points 7 months ago (1 children)

that's not fun and dramatic and clickbaity though

[–] ClamDrinker 37 points 7 months ago* (last edited 7 months ago) (19 children)

It will never be solved. Even the greatest hypothetical super intelligence is limited by what it can observe and process. Omniscience doesn't exist in the physical world. Humans hallucinate too - all the time. It's just that our approximations are usually correct, and then we don't call it a hallucination anymore. But realistically, the signals coming from our feet take longer to process than those from our eyes, so our brain has to predict information to create the experience. It's also why we don't notice our blinks, or why we don't see the blind spot our eyes have.

AI representing a more primitive version of our brains will hallucinate far more, especially because it cannot verify anything in the real world and is limited by the data it has been given, which it has to treat as ultimate truth. The mistake was trying to turn AI into a source of truth.

Hallucinations shouldn't be treated like a bug. They are a feature - just not one the big tech companies wanted.

When humans hallucinate on purpose (and not due to illness), we get imagination and dreams; fuel for fiction, but not for reality.

[–] [email protected] 12 points 7 months ago (5 children)

I think you're giving a glorified encyclopedia too much credit. The difference between us and "AI" is that we can approach knowledge from a problem-solving position. We do approximate the laws of physics, but we don't blindly take our beliefs and run with them. We come up with a theory that then gets rigorously criticized, then come up with ways to test that theory, then we're critical of the test results, and eventually we come to a consensus that, based on our understanding, the thing is true. We've built entire frameworks to reduce our "hallucinations". The reason we even know we have blind spots is because we're so critical of our own "hallucinations" that we end up deliberately looking for our blind spots.

But the "AI" doesn't do that. It can't do that. The "AI" can't solve problems, it can't be critical of itself or what information its giving out. All our current "AI" can do is word vomit itself into a reasonable answer. Sometimes the word vomit is factually correct, sometimes it's just nonsense.

You are right that theoretically hallucinations cannot be solved, but in practice we ourselves have come up with solutions to minimize them. We could probably do something similar with "AI", but not when the AI is just an LLM that fumbles into sentences.

[–] [email protected] 37 points 7 months ago (4 children)

The simple solution is not to rely upon AI. It's like a misinformed relative after a jar of moonshine: they might be right some of the time, or they might be totally full of shit.

I honestly don't know why people are obsessed with relying on AI, is it that difficult to look up the answer from a reliable source?

[–] [email protected] 34 points 7 months ago (10 children)

Honestly I feel people are using them completely wrong.

Their real power is their ability to understand language and context.

Turning natural language input into commands that can be executed by a traditional software system is a huge deal.

Microsoft released an AI-powered autocomplete text box and it's genius.

Currently you have to type an exact text match in an autocomplete box, so if you type "cats" but the item is called "pets" you'll get no results. Now the AI can find context-based matches in the autocomplete list.
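A rough sketch of how that kind of context-based matching can work (this is not Microsoft's actual implementation; the tiny hand-written vectors below stand in for a real embedding model):

```python
import math

# Tiny hand-written vectors so the example runs on its own; a real system
# would get these from a learned embedding model, not a lookup table.
TOY_VECTORS = {
    "cats":     [0.9, 0.2, 0.0],
    "pets":     [0.8, 0.3, 0.1],
    "payments": [0.0, 0.1, 0.9],
    "settings": [0.1, 0.9, 0.2],
}

def embed(text):
    return TOY_VECTORS.get(text.lower(), [0.0, 0.0, 0.0])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_autocomplete(query, items, top_k=3):
    """Rank autocomplete items by meaning rather than exact text overlap,
    so a query like 'cats' can still surface an item called 'pets'."""
    q = embed(query)
    ranked = sorted(items, key=lambda item: cosine(q, embed(item)), reverse=True)
    return ranked[:top_k]

print(semantic_autocomplete("cats", ["pets", "payments", "settings"]))
# ['pets', ...] even though "cats" is not a substring of any item.
```

The point is that the fuzzy matching is all the model does; a traditional, deterministic system still decides what the selected item actually executes.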

This is their real power.

Also they're amazing at generating non-factual things: stories, poems, etc.

[–] [email protected] 54 points 7 months ago (2 children)

Their real power is their ability to understand language and context.

...they do exactly none of that.

[–] breakingcups 24 points 7 months ago (1 children)

No, but they approximate it. Which is fine for most use cases the person you're responding to described.

[–] [email protected] 20 points 7 months ago (1 children)

They're really, really bad at context. The main failure case isn't making things up, it's having text or image in part of the result not work right with text or image in another part because they can't even manage context across their own replies.

See images with three hands, bow strings that mysteriously vanish, etc.

[–] Blue_Morpho 26 points 7 months ago (2 children)

So if you type "cats" but the item is called "pets" you'll get no results. Now the AI can find context-based matches in the autocomplete list.

Google added context search to Gmail and it's infuriating. I'm looking for an exact phrase that I even put in quotes but Gmail returns a long list of emails that are vaguely related to the search word.

[–] hedgehogging_the_bed 13 points 7 months ago (2 children)

Searching with synonym matching is almost decades old at this point. I worked on it as an undergrad in the early 2000s and it wasn't new then, just complicated. Google's version improved over other search algorithms for a long time, and then they trashed it by letting AI take over.

[–] [email protected] 32 points 7 months ago (5 children)

It's only going to get worse, especially as datasets deteriorate.

With things like Reddit being overrun by AI, and also selling AI training data, I can only imagine what mess that's going to cause.

[–] Hugin 21 points 7 months ago (5 children)

Prisencolinensinainciusol is an Italian song that is complete gibberish but made to sound like an English-language song. That's what AI is right now.

https://www.youtube.com/watch?v=RObuKTeHoxo

[–] [email protected] 17 points 7 months ago

Yeah! Just like water's "wetness" problem. It's kinda fundamental to how the tech operates.

[–] SulaymanF 17 points 7 months ago

We also have to stop calling it hallucinations. The proper term in psychology for making stuff up like this is “confabulation.”

[–] [email protected] 15 points 7 months ago (6 children)

Why do tech journalists keep using the businesses' language about AI, such as "hallucination", instead of glitching/bugging/breaking?

[–] superminerJG 43 points 7 months ago (4 children)

hallucination refers to a specific bug (AI confidently BSing) rather than all bugs as a whole

[–] [email protected] 17 points 7 months ago

Honestly, it's the most human you'll ever see it act.

It's got upper management written all over it.

[–] Danksy 36 points 7 months ago (5 children)

It's not a bug, it's a natural consequence of the methodology. A language model won't always be correct when it doesn't know what it is saying.

[–] machinin 20 points 7 months ago (1 children)

https://en.m.wikipedia.org/wiki/Hallucination_(artificial_intelligence)

The term "hallucinations" originally came from computer researchers working with image producing AI systems. I think you might be hallucinating yourself 😉

[–] [email protected] 20 points 7 months ago (2 children)

Because "hallucination" pretty much exactly describes what's happening? All of your suggested terms are less descriptive of what the issue is.

The definition of hallucination:

A hallucination is a perception in the absence of an external stimulus.

In the case of generative AI, it's generating output that doesn't match its training data "stimulus". Or in other words, false statements, or "facts" that don't exist in reality.
