this post was submitted on 14 Jun 2024
42 points (92.0% liked)

Technology


Neural networks have become increasingly impressive in recent years, but there's a big catch: we don't really know what they are doing. We give them data and ways to get feedback, and somehow, they learn all kinds of tasks. It would be really useful, especially for safety purposes, to understand what they have learned and how they work after they've been trained. The ultimate goal is not only to understand in broad strokes what they're doing but to precisely reverse engineer the algorithms encoded in their parameters. This is the ambitious goal of mechanistic interpretability. As an introduction to this field, we show how researchers have been able to partly reverse-engineer how InceptionV1, a convolutional neural network, recognizes images.

[–] QuarterSwede 5 points 1 week ago* (last edited 1 week ago) (1 children)

They absolutely do not learn and we absolutely do know how they work. It’s pretty simple.

Generative AI needs massive training sets that represent the kinds of things it’s asked to represent. Through the process of training, the AI learns the patterns in the data and can generate new data that fits within those patterns. It’s statistics all the way down. In the case of a Large Language Model (LLM) it’s always asking itself, “what’s the next most likely word to come after this previous word, and does that next word make sense within the context of the other words in the sentence?” The LLMs don’t necessarily understand a text as a text; that is, as a sequence of ideas unfolding logically but rather as a set of tokens that carry statistical weights.

https://jasonheppler.org/2024/05/23/i-made-this/
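To make the "next most likely word" idea concrete, here is a minimal sketch using a toy bigram counter. It is only an illustration: the corpus and the function names (`train_bigram`, `next_word_probs`) are made up for this example, and a real LLM conditions on the whole context with a neural network over tokens rather than counting word pairs.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw follow-counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}

# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

In this toy corpus "the" is followed by "cat" twice and by "mat" and "fish" once each, so "cat" comes out as the most likely next word.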

[–] GamingChairModel 2 points 1 week ago

Yes, but the tokens are more than just a stream of letters, and aren't saved in the form of words. The information itself is organized by conceptual proximity to other concepts (distinct from the text itself), and weighted in a way consistent with the model's training.
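A rough sketch of what "conceptual proximity" can look like: concepts stored as vectors, with closeness measured by cosine similarity. The three-dimensional vectors below are invented by hand for illustration; real models learn embeddings with hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-made embeddings, not taken from any real model.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

royal = cosine(embeddings["king"], embeddings["queen"])
fruit = cosine(embeddings["king"], embeddings["banana"])
print(f"king~queen: {royal:.3f}, king~banana: {fruit:.3f}")
```

With these made-up numbers, "king" sits much closer to "queen" than to "banana", which is the kind of structure the comment is describing.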

That's why these models can use analogies and metaphors in a persuasive way, in certain contexts. Mix concepts in a way the training data never did, and these LLMs can still output something consistent with those concepts.

Anthropic played around with their own model, emphasizing or deemphasizing particular concepts, and observed some unexpected behavior.
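As a very loose illustration of what emphasizing or deemphasizing a concept could mean mechanically, the sketch below nudges an activation vector along a "concept direction". This is an assumption-heavy toy, not Anthropic's actual method: the vectors, the `steer` and `projection` helpers, and the strength values are all hypothetical.

```python
def steer(activation, direction, strength):
    """Shift an activation along a concept direction (negative strength suppresses it)."""
    return [a + strength * d for a, d in zip(activation, direction)]

def projection(activation, direction):
    """How strongly the activation expresses the concept (dot product with a unit direction)."""
    return sum(a * d for a, d in zip(activation, direction))

# Hypothetical unit-length concept direction and a made-up activation.
concept = [1.0, 0.0, 0.0]
activation = [0.2, 0.5, -0.1]

emphasized = steer(activation, concept, 3.0)
deemphasized = steer(activation, concept, -3.0)

print(projection(activation, concept))
print(projection(emphasized, concept))
print(projection(deemphasized, concept))
```

The other components of the activation are left untouched; only the amount of the chosen concept changes, which is the intuition behind dialing a concept up or down.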

And we'd have trouble saying whether a model "knows" something if we don't have a robust definition of when and whether a human brain "knows" something.