An in-depth explanation of how LLMs work with a minimum of jargon (open.substack.com)

submitted 2 years ago by [email protected] to c/technology

[–] Dark_Blade 6 points 2 years ago (1 children)

From my extremely limited understanding, it’s because of the sheer scale of the data that’s been fed into LLMs, and because of the (admittedly small) possibility that the people working on LLMs never really took the time to understand what sort of connections the LLM was drawing between all the datapoints it was trained on, or to build a deeper understanding of how the math worked; they just knew that it did.

As the scale of the project kept growing and LLM companies just kept throwing ‘more data, more neural networks, more hardware!’ into the mix, the black box became…well, blacker, and it kept getting harder to figure out the internal ‘logic’ used by the LLM to predict the next word. Now, the people who’re trying to figure it all out are working with extremely large amounts of data and nothing to go off of.

In short, the people making GPT were somehow smart enough to make it, but not smart enough to understand what they were making.

[–] lily33 5 points 2 years ago* (last edited 2 years ago)

It's not that nobody took the time to understand. Researchers have been trying to "un-blackbox" neural networks pretty much since those have been around. It's just an extremely complex problem.

Logistic regression (which is like a neural network but with just one node) is pretty well understood - but even then sometimes it can learn some pretty unintuitive coefficients and it can be tricky to understand why.
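
To make the "unintuitive coefficients" point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the synthetic data, the feature names `x1` and `x2`, the "true" weights 3 and -1, and the helper names `fit_single_node` and `pearson`. It fits a single sigmoid node by gradient descent on two strongly correlated features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_single_node(xs, ys, lr=1.0, steps=1000):
    """Fit one sigmoid unit (logistic regression) by full-batch gradient descent.

    xs: list of (x1, x2) feature pairs; ys: list of 0.0/1.0 labels.
    Returns the learned (w1, w2, b).
    """
    w1 = w2 = b = 0.0
    n = len(xs)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(xs, ys):
            # Derivative of the log-loss with respect to the pre-activation.
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical synthetic data: x2 is a noisy copy of x1, and the label
# depends on 3*x1 - 1*x2 (illustrative numbers, not taken from any real model).
rng = random.Random(0)
xs, ys = [], []
for _ in range(1000):
    x1 = rng.gauss(0.0, 1.0)
    x2 = x1 + 0.5 * rng.gauss(0.0, 1.0)
    p = sigmoid(3.0 * x1 - 1.0 * x2)
    xs.append((x1, x2))
    ys.append(1.0 if rng.random() < p else 0.0)

w1, w2, b = fit_single_node(xs, ys)
corr_x2_y = pearson([x[1] for x in xs], ys)

print(f"correlation of x2 with label: {corr_x2_y:+.2f}")
print(f"learned weights: w1={w1:+.2f}, w2={w2:+.2f}, b={b:+.2f}")
```

In a setup like this, `x2` on its own should look like a "good" predictor (it is positively correlated with the label), yet the fitted model should give it a negative weight, because `x1` already carries that signal and `x2` mostly contributes its noise. That is with one node and two inputs.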

With LLMs - which are enormous by comparison - it's simply not a tractable problem to understand how they work in detail.