Technology

Generative AI Has a Visual Plagiarism Problem (spectrum.ieee.org)

submitted 1 year ago by L4s to c/technology

Experiments with Midjourney and DALL-E 3 show a copyright minefield.

[–] [email protected] 21 points 1 year ago* (last edited 1 year ago) (1 children)

The new version of Midjourney has a real overfitting problem. If I remember correctly, someone found out that v6 was trained partly on Stockbase image-tag pairs, so they went to Stockbase, picked some images, and used those exact tags as prompts. The output closely resembled the training data, and that's what ignited this whole thing.

Edit: I found the image I saw a few days ago. They need to go back and retrain their model, IMO. When the output is this close to the training data, it has to be hurting the model's creativity. This should only happen with images that weren't de-duplicated in the training set, so I don't know what's going on here.

[–] Blue_Morpho 1 points 1 year ago (1 children)

In 15 minutes I can get Google to give me a link to pirated content. Hosting links to pirated content gets you arrested in the US. But Google doesn't just hand you the pirate links, which is why it is legal. It's a tool that you can use to get them if you work at it a little.

[–] [email protected] 2 points 1 year ago

I'm not arguing on the side of the detractors; I just think the model could produce better output than this.