this post was submitted on 01 Dec 2024
46 points (88.3% liked)

Futurology

top 27 comments
[–] [email protected] 34 points 2 days ago (2 children)

Great, so it's still wrong 1 out of 20 times, and just got even more energy intensive to run.

[–] kippinitreal 8 points 2 days ago (3 children)

Genuine question: how energy intensive is it to run a model compared to training it? I always thought once a model is trained it's (comparatively) trivial to query?

[–] [email protected] 7 points 2 days ago (1 children)

A 100-word email generated by an AI chatbot using GPT-4 requires 0.14 kilowatt-hours (kWh) of electricity, equal to powering 14 LED light bulbs for 1 hour.

Source: https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers/
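As a rough sanity check of that equivalence (the ~10 W per bulb is my assumption, not a figure from the article):

```python
# Check: 14 LED bulbs for 1 hour vs. the claimed 0.14 kWh per email.
led_bulb_watts = 10        # assumed draw of one LED bulb
bulbs = 14
hours = 1

bulb_energy_kwh = bulbs * led_bulb_watts * hours / 1000
print(bulb_energy_kwh)     # 0.14 kWh, matching the claimed figure
```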

[–] [email protected] 0 points 1 day ago

How much energy does it take for the PC to be on and the user to type out that email manually? (Some rough numbers are sketched below.)

I assume we will get to a point where the energy required starts to fall as computing power increases with Moore's law. In the meantime, though, it's awful for the environment.

I don't doubt that, rather than reducing energy use, they will keep using more complex models that require more power for these tasks for the foreseeable future. Eventually, though, it will be diminishing returns on power, and efficiency will become the more profitable path.
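A very rough comparison, where every number below is an assumption rather than a measurement:

```python
# Back-of-envelope: typing an email on a desktop PC vs. the 0.14 kWh
# figure quoted above. All inputs are assumed, not measured.
pc_watts = 100            # assumed draw of a desktop PC plus monitor
typing_minutes = 5        # assumed time to write a 100-word email

manual_kwh = pc_watts * (typing_minutes / 60) / 1000
print(f"manual email: {manual_kwh:.4f} kWh")   # ~0.0083 kWh
print(f"ratio: {0.14 / manual_kwh:.0f}x")      # quoted GPT-4 figure is ~17x higher
```

Even with generous assumptions for the PC, the quoted GPT-4 figure comes out more than an order of magnitude higher.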

[–] [email protected] 6 points 2 days ago (2 children)

For the small ones running on GPUs, a couple hundred watts while generating. For the large ones, somewhere between 10 and 100 times that.

With specialty hardware, maybe 10x less.

[–] pennomi 3 points 2 days ago (2 children)

A lot of the smaller LLMs don’t require a GPU at all - they run just fine on a normal consumer CPU.

[–] [email protected] 1 points 1 day ago

Yeah, but 10x slower, at speeds that just don't work for many use cases. And when you compare energy consumption per token, there isn't much difference.
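To illustrate with made-up but plausible numbers (nothing below is measured):

```python
# Energy per token = power draw / generation speed. Illustrative values only.
gpu_watts, gpu_tokens_per_s = 300, 50   # assumed mid-range GPU running a small model
cpu_watts, cpu_tokens_per_s = 65, 8     # assumed consumer CPU running the same model

print(gpu_watts / gpu_tokens_per_s)     # 6.0 J per token
print(cpu_watts / cpu_tokens_per_s)     # ~8.1 J per token
```

A lower power draw at a proportionally lower token rate lands in the same ballpark of joules per token.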

[–] [email protected] 3 points 2 days ago (1 children)

Wouldn't running on a CPU (while possible) make it less energy efficient, though?

[–] pennomi 3 points 2 days ago

It depends. A lot of LLMs are memory-constrained. If you’re constantly thrashing the GPU memory it can be both slower and less efficient.
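A common back-of-envelope for why memory, not raw compute, is usually the bottleneck (the model size and bandwidth figures below are assumptions):

```python
# For single-stream generation, roughly the whole model is streamed from memory
# for every token, so bandwidth / model size gives an upper bound on token rate.
model_size_gb = 8          # assumed ~7B-parameter model quantized to about 8 GB
gpu_bandwidth_gbs = 900    # assumed high-end GPU memory bandwidth
cpu_bandwidth_gbs = 60     # assumed dual-channel desktop DDR5

print(gpu_bandwidth_gbs / model_size_gb)   # ~112 tokens/s ceiling
print(cpu_bandwidth_gbs / model_size_gb)   # ~7.5 tokens/s ceiling
```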

[–] kippinitreal 2 points 2 days ago

Good god. Thanks for the info.

[–] [email protected] 2 points 2 days ago

Still requires thirsty datacenters that use megawatts of power to keep them online and fast for thousands of concurrent users.

[–] [email protected] 3 points 2 days ago (2 children)

I wonder how that compares to the average human?

[–] dustyData 8 points 2 days ago (2 children)

Not a very good, or easy, comparison to make. Against the average person, sure, the AI comes out ahead. But a domain expert like a doctor or an accountant is far more accurate than that, in the 99+% range. Sure, everyone makes mistakes, but when we are good at something, we are really good.

Anyway, this is a ridiculous amount of effort and energy spent just to reduce hallucinations to 4.4%.

[–] [email protected] 4 points 2 days ago* (last edited 2 days ago)

It's also notable that human error tends to occur in predictable ways that can be prepared for and noticed much more easily, while machine errors tend to be random and unpredictable. For example, when a human makes a judgment on a medical issue that poses a significant risk to the patient, they will generally put more effort into ensuring an accurate result and pay closer attention to what they're doing.

[–] [email protected] 3 points 2 days ago (2 children)

But a domain expert like a doctor or an accountant is far more accurate

Actually, not so.

If the AI is trained on narrow data sets, then it beats humans. There are quite a few recent examples of this with different types of medical expertise.

[–] dustyData 7 points 2 days ago* (last edited 2 days ago) (2 children)

Cool, where are the papers?

[–] [email protected] 10 points 2 days ago

"We just need to drain a couple of lakes more and I promise bro you'll see the papers."

I work in the field, and I've seen tons of programs dedicated to using AI in healthcare. Except for data analytics (data science) or computer imaging, everything ends in a nothing-burger with cheese that someone can put on their website and call the press about.

LLMs are not good for decision making and, unless there is a real paradigm shift, they won't ever be, due to their statistical nature.

The biggest pitfall we have right now is that LLMs are super expensive to train and maintain as a service, and companies are pushing them hard, promising future features that, according to most of the research community, they won't ever reach (as they have plateaued):

- Will we run out of data? Limits of LLM scaling based on human-generated data
- Large Language Models: A Survey (2024)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

And for those who don't want to read papers on a weekend, there was a nice episode of Computerphile 'ere: https://youtu.be/dDUC-LqVrPU

</end of rant>

[–] [email protected] -2 points 2 days ago (1 children)
[–] [email protected] 6 points 2 days ago

Are you kidding me? How did the NYT reach those conclusions when the chair-flipping conclusions of said study quite clearly state: "The use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources."

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395

I mean, c'mon!

On the Nature one:

"we constructed a new forward-looking (Fig. 2) benchmark, BrainBench."

and

"Instead, our analyses suggested that LLMs discovered the fundamental patterns that underlie neuroscience studies, which enabled LLMs to predict the outcomes of studies that were novel to them."

and

"We found that LLMs outperform human experts on BrainBench"

is, in reality, saying: we made a benchmark, and LLMs are better at gaming our benchmark than the experts are. Nothing more, nothing less.

[–] BluesF 4 points 2 days ago

Specialized ML models, yes; not LLMs, to my knowledge, but happy to be proved wrong.

[–] [email protected] 7 points 2 days ago

I would not accept a calculator being wrong even 1% of the time.

AI should be held to a higher standard than "it's on average correct more often than a human".

[–] [email protected] 12 points 2 days ago

Congratulations to AI researchers on discovering the benefits of peer review?

[–] [email protected] 7 points 2 days ago

LLM’s will never achieve much higher than that simply because there’s no reasoning behind it. It. Won’t. Work. Ever.

[–] [email protected] 3 points 2 days ago (1 children)

Sounds like Legion from Mass Effect

[–] SidewaysHighways 2 points 2 days ago

Acknowledged, we have reached consensus.

[–] [email protected] 3 points 2 days ago

Even a 99% success rate wouldn't be great with a high enough volume. If you curate 1,000,000 pieces of information with a 99% success rate, you're still going to make 10,000 mistakes.
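Running the same arithmetic against the 4.4% hallucination rate mentioned upthread (purely illustrative):

```python
# Expected mistakes = volume * error rate.
volume = 1_000_000
for error_rate in (0.01, 0.044):   # 99% success vs. the 4.4% figure above
    print(f"{error_rate:.1%} -> {round(volume * error_rate):,} expected mistakes")
```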

[–] [email protected] 3 points 2 days ago* (last edited 2 days ago)

I still see even the more advanced AIs make simple errors on facts all the time....