Futurology

Multiple LLMs voting together on content validation catch each other’s mistakes to achieve 95.6% accuracy. (arxiv.org)

submitted 3 months ago by [email protected] to c/[email protected]

[–] dustyData 7 points 3 months ago* (last edited 3 months ago) (2 children)

Cool, where are the papers?

[–] [email protected] 10 points 3 months ago

"We just need to drain a couple of lakes more and I promise bro you'll see the papers."

I work in the field and I've seen tons of programs dedicated to using AI in healthcare, and except for data analytics (data science) or computer imaging, everything ends up as a nothing-burger with cheese that someone can put on their website and call the press about.

LLMs are not good for decision making, and (unless there is a real paradigm shift) they never will be, due to their statistical nature.

The biggest pitfall we have right now is that LLMs are super expensive to train and maintain as a service, and companies are pushing them hard, promising future features that, according to most of the research community, they won't ever reach (as they have plateaued):

Will we run out of data? Limits of LLM scaling based on human-generated data

Large Language Models: a Survey (2024)

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

And for those that don't want to read papers on a weekend, there was a nice episode of computerphile 'ere: https://youtu.be/dDUC-LqVrPU

</end of rant>

[–] [email protected] -2 points 3 months ago (1 children)

Large language models surpass human experts in predicting neuroscience results

A small study found ChatGPT outdid human physicians when assessing medical case histories, even when those doctors were using a chatbot.

[–] [email protected] 6 points 3 months ago

Are you kidding me? How did NYT reach those conclusions when the chair-flipping conclusions of said study quite clearly state that [sic] "The use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources."

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395

I mean, c'mon!

On the Nature one:

"we constructed a new forward-looking (Fig. 2) benchmark, BrainBench."

and

"Instead, our analyses suggested that LLMs discovered the fundamental patterns that underlie neuroscience studies, which enabled LLMs to predict the outcomes of studies that were novel to them."

and

"We found that LLMs outperform human experts on BrainBench"

Is in reality saying: we made this benchmark, and LLMs know how to cheat around it better than experts do. Nothing more, nothing else.