this post was submitted on 23 Jul 2023
39 points (95.3% liked)

[–] [email protected] 18 points 1 year ago (2 children)

You know when you're typing on your phone and you have that bar above the keyboard showing you what word it thinks you are writing? If you tap the word before you finish typing it, it can even show you the word it thinks you are going to write next. GPT works the same way, it just has waaaay more data that it can sample from.

It's all just very advanced predictive text algorithms.

Ask it a question about basketball. It looks through all documents it can find about basketball and sees how often they reference hoops, Michael Jordan, sneakers, the NBA, etc., and just outputs things that are highly referenced, in a structure that makes grammatical sense.

For instance, if you have the word 'basketball', it knows it's very unlikely for the word before it to be 'radish' and much more likely to be a word like 'the' or 'play', so it strings them together logically.
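That "predictive text" idea can be sketched as a toy next-word predictor. This is just an illustration of the counting intuition above (the tiny corpus and function names are made up, and a real model is vastly more sophisticated than counting word pairs):

```python
from collections import Counter, defaultdict

# Toy "predictive text": count which word follows which in a tiny
# corpus, then predict the most frequently seen follower.
corpus = ("they play basketball in the NBA and "
          "Michael Jordan played basketball").split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    # Return the word most often seen after `word`, or None.
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("play"))  # "basketball"
```

Feed it more text and the counts get better, which is roughly the "waaaay more data" part.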

That's the basics anyway.

[–] [email protected] 14 points 1 year ago (1 children)

Ask it a question about basketball. It looks through all documents it can find about basketball...

I get that this is a simplified explanation but want to add that this part can be misleading. The model doesn't contain the original documents and doesn't have internet access to look up the documents (though that can be added as an extra feature, but even then it's used more as a source to show humans than something for the model to learn from on the fly). The actual word associations are all learned during training, and during inference it just uses the stored weights. One implication of this is that the model doesn't know about anything that happened after its training data was collected.

[–] [email protected] 3 points 1 year ago (2 children)

I wonder what an ELI5 version of 'stored weights' would be in this context.

[–] [email protected] 2 points 1 year ago

Not quite ELI5 but I'll try "basic understanding of calculus" level.

The GPT model learns complex relationships between words (or tokens to be more specific, explained below) as probability scores ranging from 0 to 1. In very broad terms, you could think of these as the likelihood of one word appearing next to another in the massive amounts of text the model was trained with: the words "apple" and "pie" are often found together, so they might have a high-ish probability of 0.7, while the words "apple" and "chair" might have a lower score of just 0.2. Recent GPT models consist of several billion of these scores, known as the weights. Once their values have been established by feeding lots of text through the model's training process, they are all that's needed to generate more text.
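Using the example scores above, the idea of "stored weights" can be sketched like this. The numbers and the pair-lookup are purely illustrative (real weights aren't a lookup table of word pairs), but it shows how fixed scores, once learned, are enough to prefer one continuation over another:

```python
# Hypothetical learned association scores, standing in for the
# billions of real weights in a GPT model.
weights = {
    ("apple", "pie"): 0.7,
    ("apple", "chair"): 0.2,
    ("apple", "juice"): 0.6,
}

def most_likely_next(word, candidates):
    # Pick the candidate with the highest stored score next to `word`.
    return max(candidates, key=lambda c: weights.get((word, c), 0.0))

print(most_likely_next("apple", ["chair", "pie", "juice"]))  # "pie"
```

Note that nothing here consults the training text itself: after training, only the numbers remain.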

When feeding some input text into a GPT model, it is first chopped up into tokens that are each given a number: for example, the OpenAI tokenizer translates "Hello world!" into the numbers [15496, 995, 0]. You can think of it as the A=1, B=2, C=3... cipher we all learnt as kids, but with numbers also assigned to common words, syllables and punctuation. These numbers are then inserted into a massive system of multivariable equations where they are multiplied together with the billions of weights of the model in a specific manner. This results in probability scores for each token known by the model, and one of the tokens with the highest scores is chosen as the model's output semi-randomly. This cycle is then repeated over and over to generate text, one token at a time.
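The last step, turning raw scores into a semi-random token choice, can be sketched in a few lines. The vocabulary and score values below are invented for illustration; the softmax-then-sample pattern is the standard way this is done:

```python
import math
import random

# Toy vocabulary: numbers assigned to tokens, like the A=1, B=2 cipher.
vocab = {0: "Hello", 1: "world", 2: "!", 3: "apple"}

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(scores):
    # Choose a token semi-randomly, weighted by its probability.
    probs = softmax(scores)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Pretend the model's equations produced these scores for the next token:
scores = [0.1, 3.0, 0.5, -1.0]  # "world" is by far the most likely
print(vocab[sample_token(scores)])
```

Picking among the top-scoring tokens with some randomness is why the model can give different answers to the same prompt.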

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

How closely related words and their attributes are to other words.

[–] [email protected] 10 points 1 year ago

Edit: i see now it's an article and not just you asking a question lol. I'll leave it up anyway.