this post was submitted on 08 Jan 2024
134 points (89.0% liked)

Technology

Generative AI Has a Visual Plagiarism Problem::Experiments with Midjourney and DALL-E 3 show a copyright minefield

[–] [email protected] 21 points 8 months ago* (last edited 8 months ago) (1 children)

The new version of Midjourney has a real overfitting problem. If I remember correctly, someone found out that v6 was trained partly on Stockbase image-tag pairs, so they went to Stockbase, picked some images, and used those exact tags as prompts. The output greatly resembled the training data, and that's what ignited this whole thing.

Edit: I found the image I saw a few days ago. They need to go back and retrain their model, IMO. When the output is this close to the training data, it has to be hurting the model's creativity. This should only happen with images that weren't de-duplicated in the training set, so I don't know what's going on here.

[–] Blue_Morpho 1 points 8 months ago (1 children)

In 15 minutes I can get Google to give me a link to pirated content. Hosting links to pirated content gets you arrested in the US. But Google doesn't just hand you the pirate links, which is why it is legal. It's a tool that you can use to find them if you work at it a little.

[–] [email protected] 2 points 8 months ago

I'm not arguing on the side of the detractors; I just think the model could produce better output than this.