Lemmy.World

166,091 readers
8,300 users here now

The World's Internet Frontpage Lemmy.World is a general-purpose Lemmy instance of various topics, for the entire world to use.

Be polite and follow the rules ⚖ https://legal.lemmy.world/tos

Get started

See the Getting Started Guide

Donations 💗

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use / switch to Ko-Fi, it has the lowest fees for us

Ko-Fi (Donate)

Bunq (Donate)

Open Collective backers and sponsors

Patreon

Liberapay patrons

GitHub Sponsors

Join the team 😎

Check out our team page to join

Questions / Issues

More Lemmy.World

Follow us for server news 🐘

Mastodon Follow

Chat 🗨

Discord

Matrix

Alternative UIs

Monitoring / Stats 🌐

Service Status 🔥

https://status.lemmy.world

Mozilla HTTP Observatory Grade

Lemmy.World is part of the FediHosting Foundation

founded 1 year ago
ADMINS
1
 
 

High-profile A.I. chatbot ChatGPT performed worse on certain tasks in June than its March version, a Stanford University study found.

The study compared the performance of the chatbot, created by OpenAI, over several months at four “diverse” tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning.

Researchers found wild fluctuations—called drift—in the technology’s ability to perform certain tasks. The study looked at two versions of OpenAI’s technology over the time period: a version called GPT-3.5 and another known as GPT-4. The most notable results came from research into GPT-4’s ability to solve math problems. Over the course of the study researchers found that in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time—while the June version was consistently right, answering correctly 86.8% of the time.

2
view more: next ›