Technology

An in-depth explanation of how LLMs work with a minimum of jargon (open.substack.com)

submitted 2 years ago to c/technology

lily33 · 2 years ago (last edited 2 years ago)

It's not its biological origins that make the brain hard to understand, but its complexity. The heart is just as biological, for example, yet we understand how it works pretty well.

LLMs are nowhere near as complex as a brain, but they're complex enough to be extremely difficult to understand.

That raises a question: if they're so difficult to understand, how did people build them in the first place?

The way they did it bears some similarity to evolution. They started with an "empty" model: a large neural network with its parameters set to random values, so it didn't do anything useful or meaningful. But its behavior depended on billions of those parameters, and tweaking any one of them changed that behavior slightly.

Then they spent an enormous amount of computing power tweaking the parameters, each tweak slightly improving the model's ability to model language (concretely, to predict the next word in a piece of text). While doing this, they didn't know what each number meant, or how or why each tweak improved the model; only that it did.
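To make the "tweak and keep what helps" idea concrete, here is a minimal toy sketch in Python. It's an illustration under loud assumptions, not how any real LLM is trained: the "model" is just a line y = w*x + b with two parameters instead of billions, the "known data" is four made-up points, and the tweaks are random and kept only if they reduce the error (simple random hill climbing), which is the evolution side of the analogy rather than how real training picks its tweaks.

```python
import random

# Hypothetical toy data: four made-up points lying roughly on y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

def error(params):
    """Mean squared error of the toy model y = w*x + b on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def random_tweak_training(steps=5000, step_size=0.05, seed=0):
    """Random hill climbing: nudge one parameter at random, keep it only if it helps."""
    rng = random.Random(seed)
    params = [0.0, 0.0]  # the "empty" model: it does nothing useful yet
    best = error(params)
    for _ in range(steps):
        candidate = list(params)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        new_error = error(candidate)
        # We only check *that* the tweak helped, never *why* it helped.
        if new_error < best:
            params, best = candidate, new_error
    return params, best

start_error = error([0.0, 0.0])
params, final_error = random_tweak_training()
print(f"error before: {start_error:.3f}, after: {final_error:.3f}")
print(f"learned w={params[0]:.2f}, b={params[1]:.2f}")
```

Nothing in the loop knows that w is a slope or b an offset; it just keeps whatever lowers the error, which is the point being made about the billions of numbers in an LLM.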

Unlike in evolution, the tweaks aren't random. An algorithm called backpropagation works out, for every parameter, which direction to nudge it so the network predicts some known data slightly better. Unfortunately, it tells you nothing about why a given tweak is good, or what each parameter change means. That's why we don't understand how LLMs work.
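A minimal sketch of what "not random" means, under stated assumptions: the "model" is a hypothetical two-parameter line y = w*x + b fit to four made-up points, the gradient is derived by hand with the chain rule (backpropagation is the algorithm that automates this bookkeeping for networks with billions of parameters), and `learning_rate` and `steps` are arbitrary illustrative values. Using the gradient to move the parameters downhill is called gradient descent.

```python
# Hypothetical toy data: four made-up points lying roughly on y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

def error_and_gradient(w, b):
    """Return the mean squared error of y = w*x + b and its partial derivatives.

    Derived by hand with the chain rule; backpropagation does this
    automatically for much larger networks.
    """
    n = len(data)
    err, dw, db = 0.0, 0.0, 0.0
    for x, y in data:
        residual = w * x + b - y  # how far off this prediction is
        err += residual ** 2 / n
        # d(residual^2)/dw = 2 * residual * x, d(residual^2)/db = 2 * residual
        dw += 2 * residual * x / n
        db += 2 * residual / n
    return err, dw, db

def gradient_descent(steps=2000, learning_rate=0.05):
    w, b = 0.0, 0.0  # start from an "empty" model
    for _ in range(steps):
        _, dw, db = error_and_gradient(w, b)
        # Move each parameter a little in the direction that lowers the error.
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

w, b = gradient_descent()
final_error, _, _ = error_and_gradient(w, b)
print(f"learned w={w:.2f}, b={b:.2f}, error={final_error:.4f}")
```

The gradient says exactly which way to move each number, but it says nothing about what w or b "mean"; in this tiny case we happen to know, and in an LLM we mostly don't.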

One final clarification: it's not a complete black box. We do have some understanding of how LLMs work, mostly at a high level, much as we have a basic understanding of how the brain works. Of course, we understand LLMs much better than brains.