this post was submitted on 11 Jun 2024

147 points (88.5% liked)

‘Embarrassingly simple’ probe finds AI in medical image diagnosis ‘worse than random’ (venturebeat.com)

submitted 2 weeks ago by [email protected] to c/[email protected]

[–] [email protected] 56 points 2 weeks ago (2 children)

We have models that are specifically made to be good at these kinds of tasks. Why would you choose the ones that aren't and then make generalizing claims about how AI sucks in this domain?

[–] [email protected] 13 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Yeah, this is probably just straight-up misinformation. By no means is a diagnosis going to be made by a generalist multimodal LLM. Diagnosis is literally a binary classification (although that is an oversimplification), and in medical CV you optimize for that directly.

[–] [email protected] -3 points 2 weeks ago* (last edited 2 weeks ago) (3 children)

They did not use an LLM.

In a recent experiment, they set out to determine how reliable LMMs are in medical diagnosis — asking both general and more specific diagnostic questions — as well as whether models were even being evaluated correctly for medical purposes.

Curating a new dataset and asking state-of-the-art models questions about X-rays, MRIs and CT scans of human abdomens, brain, spine and chests, they discovered “alarming” drops in performance.

[–] [email protected] 17 points 2 weeks ago (1 children)

You've quoted them stating they used LLMs while claiming they did not use an LLM? What am I missing here?

[–] [email protected] 6 points 2 weeks ago (1 children)

What am I missing here?

"L" "M" "M"

[–] [email protected] 4 points 2 weeks ago

Which in this context just means multimodal LLM, correct?

[–] Starbuck 10 points 2 weeks ago

models including GPT-4V and Gemini Pro

What a joke, a few generic LLMs making a judgement call about all AI models.

[–] [email protected] 2 points 2 weeks ago (1 children)

They used one to create the dataset for their experiments:

In their experiments, they introduced a new dataset, Probing Evaluation for Medical Diagnosis (ProbMed), for which they curated 6,303 images from two widely-used biomedical datasets. These featured X-ray, MRI and CT scans of multiple organs and areas including the abdomen, brain, chest and spine.

GPT-4 was then used to pull out metadata about existing abnormalities, the names of those conditions and their corresponding locations. This resulted in 57,132 question-answer pairs covering areas such as organ identification, abnormalities, clinical findings and reasoning around position.

[–] [email protected] 0 points 2 weeks ago (1 children)

The seven models tested included GPT-4V, Gemini Pro and the open-source, 7B parameter versions of LLaVA-v1, LLaVA-v1.6, MiniGPT-v2, as well as specialized models LLaVA-Med and CheXagent. These were chosen because their computational costs, efficiencies and inference speeds make them practical in medical settings, researchers explain.

It seems like this is a case of "they just aren't using AI right, if they used it right it works" when it sure looks like they are using the models intended for these specific medical tasks.

[–] [email protected] 3 points 2 weeks ago* (last edited 2 weeks ago)

Those are not the sort of models anybody in the field would use (medical CV with deep-learning-based analysis is a vibrant field with many breakthroughs in recent years). These are the sort of models tech bros are trying to sell to the public as general AI. There is a world of difference.

[–] NocturnalEngineer 4 points 2 weeks ago

Not defending this article, but companies & big tech are generalizing the crap out of AI right now, and forcing it into everything.

They could have (and definitely should've) been upfront about the strengths and weaknesses of their models, specifically regarding what they can and can't do. But they don't. They get more money when their shareholders & customers think it's the next best thing for everything.

[–] [email protected] 35 points 2 weeks ago

As others have said, you don't need (and shouldn't use) an LLM for a classification task like this. There are machine learning models that can handle this and identify underlying patterns that humans cannot easily detect. And yes, they can get accuracy and precision scores much higher than 50%.

What an incredibly stupid article.

[–] [email protected] 33 points 2 weeks ago (2 children)

This is pretty dumb. Machine learning algorithms (fuck off with calling it AI) are especially good at seeing signs of disease in data such as X-rays, CT and MRI scans. It's the one place they really help save time and prevent mistakes. And even if it's just to flag shit for a second opinion by a doctor and not to replace the doctor, that's still super useful. Pattern recognition is hard, and these kinds of algorithms are very good at it if provided the right source data to work off.

If only the media and big corps would stop claiming LLMs are general AI, then maybe people would stop using them for stuff they're clearly not good at and not meant for.

[–] [email protected] 10 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

This isn't dumb. This is a very good study as it is helping to remind people that these fancy new tools aren't good at everything. The media reporting on this is doing a service.

Edit: my bad making two responses

[–] [email protected] 3 points 2 weeks ago (1 children)

By casting doubt on a related but fundamentally different bit of medical tech? Yeah that's what we need: more folks questioning medicine based on pop science understandings of the technology.

[–] [email protected] 1 points 2 weeks ago

Good point

[–] [email protected] 5 points 2 weeks ago (2 children)

Can't stop people calling it AI. People have called video game bots AI since the 90s, even in industry. Any algorithm is a form of artificial intelligence, really. LLMs and machine vision are multipurpose, though I agree that general-purpose is still a stretch.

[–] [email protected] 4 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Why wouldn't agents in video games be AI, though? Things like pathfinding, search, and behaviour trees are commonly used for agents in games, and in computer science these are widely considered to be artificial intelligence techniques. It's unlikely that you would find a CS textbook calling the Fast Fourier Transform AI though, or things like Bresenham's line drawing algorithm.

[–] [email protected] 2 points 2 weeks ago

Absolutely. I wouldn't call Bresenham AI. In some contexts, like games, I might call A* search AI. But to someone from the Victorian era who paid people to compute Taylor series by hand, something basic and flexible like a microprocessor that can run Bresenham, FFT and so on might have been seen as artificial intelligence: using a machine to solve a problem that normally requires human brainpower.

[–] [email protected] 4 points 2 weeks ago

Seriously, the field of artificial intelligence has been around since the beginning of computer science, since Alan Turing founded it after coming up with the modern computer. Frankly, if you ask me, anyone complaining about LLMs being referred to as AI has been watching too many movies. AI != Human-but-metal and it never has. Going by the Wikipedia article, to be considered AI, a machine just has to perceive its environment and learn, degree notwithstanding.

Of course this definition is pretty vague, so in practice AI tends to refer to the cutting edge of flexible computer algorithms. Many now-mundane algorithms much simpler than today's LLMs (like A* and genetic algorithms) were once considered AI for their flexible logic. At some point the Internet decided that it doesn't count unless it's literally Jarvis, but that's a very stingy definition of a very broad field.

[–] [email protected] 20 points 2 weeks ago

Coincidentally, I trained a CNN to tell dogs from cats and it does a godawful job diagnosing cancer

[–] pennomi 10 points 2 weeks ago (1 children)

LLMs are notorious yes-men. Why would you ever use that for diagnosis? Just use bespoke classifiers like we have for years.

[–] [email protected] 3 points 2 weeks ago

Because some researcher wanted to document what would happen, and a journalist thought writing about that would get a lot of clicks.