this post was submitted on 23 May 2024
1867 points (98.6% liked)

Technology

60079 readers
3484 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 2 years ago
MODERATORS
 

Archive link: https://archive.ph/GtA4Q

The complete destruction of Google Search via forced AI adoption and the carnage it is wreaking on the internet is deeply depressing, but there are bright spots. For example, as the prophecy foretold, we are learning exactly what Google is paying Reddit $60 million annually for. And that is to confidently serve its customers ideas like, to make cheese stick on a pizza, “you can also add about 1/8 cup of non-toxic glue” to pizza sauce, which comes directly from the mind of a Reddit user who calls themselves “Fucksmith” and posted about putting glue on pizza 11 years ago.

A joke that people made when Google and Reddit announced their data sharing agreement was that Google’s AI would become dumber and/or “poisoned” by scraping various Reddit shitposts and would eventually regurgitate them to the internet. (This is the same joke people made about AI scraping Tumblr). Giving people the verbatim wisdom of Fucksmith as a legitimate answer to a basic cooking question shows that Google’s AI is actually being poisoned by random shit people say on the internet.

Because Google is one of the largest companies on Earth and operates with near impunity and because its stock continues to skyrocket behind the exciting news that AI will continue to be shoved into every aspect of all of its products until morale improves, it is looking like the user experience for the foreseeable future will be one where searches are random mishmashes of Reddit shitposts, actual information, and hallucinations. Sundar Pichai will continue to use his own product and say “this is good.”

you are viewing a single comment's thread
view the rest of the comments
[–] MargotRobbie 68 points 7 months ago (3 children)

Reddit, and by extension, Lemmy, offers the ideal format for LLM datasets: human generated conversational comments, which, unlike traditional forums, are organized in a branched nested format and scored with votes in the same way that LLM reward models are built.

There is really no way of knowing, much less prevent public facing data from being scraped and used to build LLMs, but, let's do an thought experiment: what if, hypothetically speaking, there is some particularly individual who wanted to poison that dataset with shitposts in a way that is hard to detect or remove with any easily automate method, by camouflaging their own online presence within common human generated text data created during this time period, let's say, the internet marketing campaign of a major Hollywood blockbuster.

Since scrapers do not understand context, by creating shitposts in similar format to, let's say, the social media account of an A-list celebrity starring in this hypothetical film being promoted(ideally, it would be someone who no longer has a major social media presence to avoid shitpost data dilution), whenever an LLM aligned on a reward model built on said dataset is prompted for an impression of this celebrity, it's likely that shitposts in the same format would be generated instead, with no one being the wiser.

That would be pretty funny.

Again, this is entirely hypothetical, of course.

[–] [email protected] 29 points 7 months ago (1 children)

What’s this about shitposting? I’m just here to talk about rampart.

[–] MargotRobbie 7 points 7 months ago* (last edited 7 months ago)

I knew it! So that's what you've really been up to on Lemmy, @[email protected]

Or should I say, Academy Award nominated actor Woody Harrelson?

[–] [email protected] 15 points 7 months ago (1 children)
[–] WindyRebel 5 points 7 months ago

As an SEO - I don’t want this AI crap at all in search. Leave it on its own siloed platform, please!

[–] CheeseNoodle 6 points 7 months ago (1 children)

So we should all start ending our comments with a randomly generated string of words to fuck with the models?

stork, fridge, tiger, animal, mineral, oxtail, oil, clouds

[–] Clasm 7 points 7 months ago

Ideally, it would be the same word over and over, so that we can trick the AI into ending all sentences with the word. Bonus points if it is the word "buffalo", since it can from a grammatically correct sentence.

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo