Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data (www.404media.co)

submitted 1 year ago by [email protected] to c/technology

ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

[–] brianorca 9 points 1 year ago (1 children)

Diffusion AI (most image AI) works differently from an LLM. It starts with noise and adjusts it iteratively to satisfy the prompt. So these models don't tend to reproduce entire images unless they are overtrained (i.e. the same image appeared in training a thousand times instead of once) or the prompt is overly specific (e.g. you ask for "The Mona Lisa by Leonardo").
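To make the "start with noise, adjust iteratively" idea concrete, here is a toy sketch in Python. It is not a real diffusion model: there is no trained network and no noise schedule, and the function name `toy_denoise` and the `target` argument are made up for illustration. The "denoiser" simply cheats by knowing the target, which is roughly what an overtrained model that memorized one image would do.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and nudge it toward `target` step by step.

    `target` stands in for "whatever satisfies the prompt". A real model
    would predict this with a trained network; here it is given directly
    (an assumption made only to keep the sketch short).
    """
    rng = random.Random(seed)
    # Step 0: the sample is pure Gaussian noise, unrelated to the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in range(steps):
        # Move a fraction of the remaining distance. The fraction grows
        # as t increases, so the final step lands on the target.
        step = 1.0 / (steps - t)
        x = [xi + step * (ti - xi) for xi, ti in zip(x, target)]
    return x


if __name__ == "__main__":
    memorized_image = [0.2, 0.8, 0.5]  # three made-up "pixel" values
    print(toy_denoise(memorized_image))
```

Because the stand-in denoiser always pulls toward the same target, every random seed ends at the same result, which is the memorization case described above. A model that had seen the image only once would not have such a sharp target to pull toward.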

But words don't work well with diffusion, since "dog" and "God" have very different meanings despite using the same letters. So an LLM instead spits out a specific sequence of word tokens.

[–] [email protected] 2 points 1 year ago

You could use diffusion to generate text. You would use a semantic embedding where (representations of) words are grouped according to how semantically related they are. Rather than swapping dog for God, you would more likely swap dog for canine. You would just need to be a bit more careful, as perturbing individual words can have a large effect on the global meaning of the sentence ("he extracted the dog tooth"), so you'd need an embedding that captures information from the whole sentence/excerpt.
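A minimal sketch of the word-level part of this idea, in Python. The 2-D vectors in `EMBED` are hand-made and purely hypothetical (a real embedding would be learned and have hundreds of dimensions), and `perturb_and_decode` is an illustrative name, not part of any library. It shows that noise added in a semantic space lands on related words, even though "dog" and "god" are identical at the letter level.

```python
import math
import random

# Hypothetical hand-made embeddings: related words sit close together.
EMBED = {
    "dog": (1.0, 0.1),
    "canine": (0.9, 0.2),
    "puppy": (0.8, 0.0),
    "god": (-1.0, 0.9),
    "deity": (-0.9, 1.0),
}


def same_letters(a, b):
    """True if the two words use exactly the same letters, ignoring case."""
    return sorted(a.lower()) == sorted(b.lower())


def nearest_word(vec):
    """Decode a vector back to the closest word in the toy vocabulary."""
    return min(EMBED, key=lambda w: math.dist(EMBED[w], vec))


def perturb_and_decode(word, noise=0.1, seed=0):
    """Add small Gaussian noise to a word's embedding, then decode it."""
    rng = random.Random(seed)
    x, y = EMBED[word]
    noisy = (x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise))
    return nearest_word(noisy)


if __name__ == "__main__":
    print(same_letters("dog", "God"))  # letter space: the words coincide
    print({perturb_and_decode("dog", seed=s) for s in range(20)})
```

This only perturbs one word in isolation. It does nothing about the sentence-level problem raised above, where swapping a single word changes the meaning of the whole excerpt; that would need an embedding of the full sentence rather than per-word vectors.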