this post was submitted on 05 Jul 2023
149 points (96.3% liked)

Technology
[–] fubo 10 points 1 year ago* (last edited 1 year ago) (1 children)

> With how prevalent the AI and data scraping conversation has become

You realize that "conversation" is fake, right? There is no increased load on Twitter, Reddit, or other web services due to "AI data scraping". That was made up to distract from the material causes of Twitter's failure, namely:

  1. most of their engineers were laid off or quit
  2. they don't pay their bills

Big tech companies that run search engines already have a copy of all public Web pages, which they use for search indexing. They don't need to make a second copy for AI training; they can just use the same one.

Google can train Bard with the same copy of the public Web that they use to create Google Search; same with Microsoft, Baidu, or any other big company that runs a search engine.

And for everyone else, there's Common Crawl.
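To make the Common Crawl point concrete, here is a minimal sketch of how a small player would locate already-crawled pages instead of scraping sites themselves, using Common Crawl's public CDX index server at index.commoncrawl.org. The collection name and the sample record below are illustrative examples, not data from this thread.

```python
# Sketch: locating captures in Common Crawl via its CDX index API,
# rather than crawling the Web yourself. Collection name is an example.
import json
from urllib.parse import urlencode

INDEX_HOST = "https://index.commoncrawl.org"

def cdx_query_url(collection, url_pattern):
    """Build a query URL against one crawl collection's CDX index."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{collection}-index?{params}"

def parse_cdx_line(line):
    """Each response line is a JSON object describing one capture.

    offset/length locate the capture inside a WARC archive file,
    so the page body can be fetched with a ranged request.
    """
    rec = json.loads(line)
    return {
        "url": rec["url"],
        "warc": rec["filename"],
        "offset": int(rec["offset"]),
        "length": int(rec["length"]),
    }
```

The point of the sketch: the crawling work is already done once, centrally; consumers only query an index and read byte ranges out of shared archives, which is why "AI scraping" adds little load to the sites themselves.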

[–] [email protected] 2 points 1 year ago

“Fake” from the side of data load, sure, I can see that. But there’s plenty of interest in trying to stave off the “dead internet” by building systems where bots and AI-generated content aren’t profitable. That’s more what I was referring to.