this post was submitted on 26 Jul 2023

868 points (96.5% liked)

Thousands of authors demand payment from AI companies for use of copyrighted works (www.cnn.com)

submitted 2 years ago by L4s to c/technology

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

[–] [email protected] 47 points 2 years ago (7 children)

How can they prove that their particular intellectual property was used to train the algorithms, rather than some abstract public data?

[–] [email protected] 73 points 2 years ago (3 children)

Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.

[–] dojan 30 points 2 years ago (1 children)

Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.

[–] [email protected] 21 points 2 years ago (1 children)

Even more so, if you consider that the LLMs are marketed to replace the authors.

[–] dojan 5 points 2 years ago

Yeah which I still feel is utterly ridiculous. I love the idea of AI tools to assist with things, but as a complete replacement? No thank you.

I enjoy using things like SynthesizerV and VOCALOID because my own voice is pretty meh and my singing skills aren't there. It's fun to explore the voices, and learn how to use the tools. That doesn't mean I'd like to see all singers replaced with synthesized versions. I view SynthV and the like as instruments, not much more.

I've used LLMs to proofread stuff, and help me rephrase letters and such, but I'd never hire an editor to do such small tasks for me anyway. The result has always required editing, because the LLMs have a tendency to make stuff up.

Cases like that I don't see a huge problem with. At my workplace though they're talking about generating entire application layouts and codebases with AI and, as the person in charge of the AI evaluation project, I can say the tech just isn't there yet. You can in a sense use AI to make entire projects, but it'll generate gnarly unmaintainable rubbish. You need a human hand in there to guide it.

Otherwise you end up with garbage websites with endlessly generated AI content, that can easily be manipulated by third party actors.

[–] ProfessorZhu 9 points 2 years ago (1 children)

Can it recreate anything 1:1? When both my wife and I tried to get them to do that they would refuse, and if pushed they would fail horribly.

[–] [email protected] 10 points 2 years ago (2 children)

This is what I got. Looks pretty 1:1 for me.

[–] jackie_jormp_jomp 11 points 2 years ago (1 children)

Hilarious that it started with just "Buddy", like you'd be happy with only the first word.

[–] [email protected] 6 points 2 years ago* (last edited 2 years ago)

Yeah, for some reason it does that a lot when I ask it for copyrighted stuff.

As if it knew it wasn't supposed to output that.

[–] Cheems 5 points 2 years ago (1 children)

To be fair you'd get the same result easier by just googling "we will rock you lyrics"

How is ChatGPT knowing the lyrics to that song different from a website that just tells you the lyrics of the song?

[–] [email protected] 5 points 2 years ago

Two points:

1. Google spitting out the lyrics isn't OK from a copyright standpoint either. The reason songwriters, singers, and music companies don't sue people who publish lyrics (even though they totally could) is that there are no damages. They sell music, so lyrics being published for free doesn't hurt their music business, and it doesn't hurt their songwriting business either. Other types of copyright infringement that musicians and music companies care about are heavily policed, including on Google.
2. Content generation AI has a different use case, and it could totally hurt both of these businesses. My test above, which got it to spit out the lyrics verbatim, shows that the AI did indeed use copyrighted works for its training. Now I can ask GPT to generate lyrics in the style of Queen, and it will basically perform the songwriter's job. This can easily be done on a commercial scale, replacing the very human who wrote those lyrics. Now take this a step further and take a voice-generating AI (of which there are many), which was similarly trained on copyrighted audio samples of Freddie Mercury. Then add to the mix a music-generating AI, also fed with works of Queen, and now you have a machine capable of generating fake Queen songs based directly on Queen's works. You can do the very same with other types of media as well.

And this is where the real conflict comes from.

[–] chakan2 6 points 2 years ago (2 children)

you could assume that it must have had access to the original.

I don't know if that's true. Say Google grabs that book from a pirate site, then publishes the work as search results. ChatGPT grabs the work from Google results and cobbles it back together as the original.

Who's at fault?

I don't think it's as straightforward as "ChatGPT can reproduce the work, therefore it stole it."

[–] [email protected] 22 points 2 years ago

Both are at fault: Google for distributing pirated material and OpenAI for using said material for financial gain.

[–] [email protected] 9 points 2 years ago (1 children)

Copyright doesn't work like that. Say I sell you the rights to Thriller by Michael Jackson. You might not know that I don't have the rights. But even if you bought the rights from me, whoever actually has the rights is totally in their legal right to sue you, because you never actually purchased any rights.

So if ChatGPT rips it off Google, who ripped it off a pirate site, then everyone in that chain who reproduced copyrighted works without permission from the copyright owners is liable for the damages caused by their unpermitted reproduction.

It's literally the same as how downloading something from a pirate site doesn't become legal just because someone ripped it before you.

[–] [email protected] 3 points 2 years ago (1 children)

That's a terrible example because under copyright law downloading a pirated thing isn't actually illegal. It's the distribution that is illegal (uploading).

[–] [email protected] 1 points 2 years ago

Yes, downloading is illegal, and the media is still an illegally obtained copy. It's just never prosecuted, because the damages are minuscule if you just download. They can only fine you for the amount of damages you caused by violating the copyright.

If you upload to 10k people, they can claim that every one of them would have paid for it, so the damages are (if one copy is worth €30) ~€300k. That's a lot of money and totally worth the lawsuit.

On the other hand, if you just download, the damages are just the value of one copy (in this case €30). That's so minuscule that even having a lawyer write a letter is more expensive.
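A rough sketch of that arithmetic, using only the figures from this comment (€30 per copy, 10k recipients). Counting every recipient as a lost sale is the claimant's assumption here, and the function name is made up for illustration.

```python
# Figures taken from the comment above. Treating each copy as one lost
# sale is the claimant's assumption, not an established fact.
PRICE_PER_COPY_EUR = 30


def claimed_damages(copies: int, price_per_copy: int = PRICE_PER_COPY_EUR) -> int:
    """Damages claimed if every copy is counted as one lost sale."""
    return copies * price_per_copy


upload_damages = claimed_damages(10_000)  # uploading to 10k people
download_damages = claimed_damages(1)     # downloading a single copy

print(f"upload:   €{upload_damages:,}")
print(f"download: €{download_damages:,}")
```

The gap between the two numbers is the whole point: one is worth a lawsuit, the other costs less than the lawyer's letter.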

But that's totally beside the point. OpenAI didn't just download, they replicate, which is causing massive damages, especially to the original artists, who in many cases are no longer hired since ChatGPT replaces them.

[–] [email protected] 13 points 2 years ago

There are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI, as well as all copyrights, compensation, etc.
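To make the "accounting log" idea a bit more concrete, here is a minimal sketch of what one entry might hold. No such legal format exists as far as I know; every field name and the sample entries below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingRecord:
    """One entry in a hypothetical training-data log."""
    title: str
    rights_holder: Optional[str]  # None if public domain or unknown
    licensed: bool                # True if a licence is on record
    compensation_eur: float = 0.0


def unlicensed_works(log: list[TrainingRecord]) -> list[str]:
    """Titles that have a known rights holder but no licence on record."""
    return [
        r.title
        for r in log
        if r.rights_holder is not None and not r.licensed
    ]


# Placeholder entries, not real data.
example_log = [
    TrainingRecord("Example Novel", "Example Author", licensed=True, compensation_eur=100.0),
    TrainingRecord("Example Lyrics", "Example Songwriter", licensed=False),
    TrainingRecord("Old Public Domain Text", None, licensed=False),
]

print(unlicensed_works(example_log))
```

An auditor would then only need to look at the entries this filter returns, instead of digging through the model itself.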

[–] foggy 10 points 2 years ago (1 children)

Not without some seriously invasive warrants! Ones that will never be granted for an intellectual property case.

Intellectual property is an outdated concept. It used to exist so wealthier outfits couldn't copy your work at scale and muscle you out of an industry you were championing.

It simply does not work the way it was intended. As technology spreads, the barrier for entry into most industries wherein intellectual property is important has been all but demolished.

i.e. 50 years ago: your song that your band performed is great. I have a recording studio and am gonna steal it muahahaha.

Today: "anyone have an audio interface I can borrow so my band can record, mix, master, and release this track?"

Intellectual property ignores the fact that, idk, Isaac Newton and Gottfried Wilhelm Leibniz both independently invented calculus at the same time on opposite ends of a disconnected globe. That is to say, intellectual property doesn't exist.

Ever opened a post to make a witty comment to find someone else already made the same witty comment? Yeah. It's like that.

[–] [email protected] 14 points 2 years ago* (last edited 2 years ago) (3 children)

Spoken like someone who has never had something they've worked on for years be stolen.

[–] kklusz 2 points 2 years ago

What was “stolen” from you and how?

[–] foggy 0 points 2 years ago (1 children)

Spoken like someone who is having trouble admitting they're standing on the shoulders of Giants.

I don't expect a nuanced response from you, nor will I waste time with folks who can't be bothered to respond in any form beyond attack, nor do I expect you to watch this

Intellectual property died with the advent of the internet. It's now just a way for the wealthy to remain wealthy.

[–] [email protected] 2 points 2 years ago

Here is an alternative Piped link(s): https://piped.video/PJSTFzhs1O4

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source, check me out at GitHub.

[+] FormlessMartian -2 points 2 years ago (1 children)

[deleted]

[–] [email protected] 4 points 2 years ago (1 children)

I think you said this facetiously... but it literally is.

https://www.howtogeek.com/310158/are-other-people-allowed-to-use-my-tweets/

[+] FormlessMartian -2 points 2 years ago (1 children)

[deleted]

[–] [email protected] 2 points 2 years ago (1 children)

[+] FormlessMartian 1 points 2 years ago (1 children)

[deleted]

[–] [email protected] 3 points 2 years ago (1 children)

Okay, seems you need help reading. So let me take DIRECT quotes.

As long as you haven’t made your Twitter account private, every thought you broadcast can be seen by anyone in the world. However, any words or photos you Tweet, as long as they are original, are yours and, except in specific circumstances, can’t be used without your permission.

You Retain Copyright (But That's Not the Whole Story)

Copyright law is pretty clear: the text of your Tweets is yours. There are some Fair Use arguments, such as newsworthiness or commentary, that would allow someone to copy and paste the text contents of your Tweet and post it elsewhere, but for the most part, they can’t. The ideas in your Tweets, however, aren’t covered by copyright. Only the exact wording. As the New York Times reports, a Hollywood movie studio can take your idea and turn it into a film starring Rihanna.

“You retain your rights to any Content you submit, post or display on or through the Services. What’s yours is yours — you own your Content (and your photos and videos are part of the Content).”

So what does all this mean? Well first, Twitter acknowledges your copyright: “What’s yours is yours.” They then go on to outline the terms of the license you grant them to use anything you post on Twitter.

I mean, that's literally paragraphs' worth of content telling you that YOUR CONTENT IS COVERED BY COPYRIGHT. The whole of Twitter's Terms of Service is you granting Twitter a perpetual license to YOUR CONTENT.

But right... Nothing copyrighted about comment content... and yet they mention it over and over that it is covered? Are you okay?

There is nothing copywritten about this comment.

Yes your comment is covered under copyright. Including the ones you just made.

Anyone can copy and paste this without attributing me and there is no recourse for me. If someone copy and pastes a published book, they can sue.

Covered under copyright != how easy it would be to claim damages.

If someone copy and pastes a published book, they can sue. How can they possibly be the same thing…?

It literally is.

https://law.stackexchange.com/questions/16680/are-comments-posted-on-websites-owned-by-the-website-or-the-commenter

We can even look at this from the opposite direction. Here's a full list of works that cannot be copyrighted.

https://www.dmlp.org/legal-guide/works-not-covered-copyright

Notice that "comments on the internet" isn't one of them.

[–] Faschr4023 5 points 2 years ago (2 children)

Personally speaking, I've generated some stupid images, like different cities covered in baked beans, and have had crude watermarks generated with them that were decipherable enough that I could find some of the source images used to train the AI. When it comes to photorealistic image generation, if all the AI does is mildly tweak the watermark, then it's not too hard to trace back.

[–] [email protected] 11 points 2 years ago

All but a very small few generative AI programs use completely destructive methods to create their models. There is no way to recover the training images outside of infinitesimally small random chance.

What you are seeing is the AI recognising that images of the sort you are asking for generally include watermarks, and creating one of its own.

[–] [email protected] 4 points 2 years ago (1 children)

Do you have examples? It should only happen in case of overfitting, i.e. too many identical images for the same subject.

[–] Faschr4023 1 points 2 years ago (1 children)

Here's one I generated and an image from the photographer. Prompt was Charleston SC covered in baked beans lol

[–] [email protected] 1 points 2 years ago

Out of curiosity what model did you use?

[–] over_clox 4 points 2 years ago

I'd think that given the nature of the language models and how the whole AI thing tends to work, an author can pluck a unique sentence from one of their works, ask AI to write something about that, and if AI somehow 'magically' writes out an entire paragraph or even chapter of the author's original work, well tada, AI ripped them off.
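A minimal sketch of how that check could be scored, assuming you already have the model's reply as a plain string (no AI service is called here). It finds the longest run of words that appears verbatim in both texts. The function names and the 50-word threshold are made up for illustration, and a long match is a hint rather than proof.

```python
from difflib import SequenceMatcher


def longest_shared_run(original: str, generated: str) -> str:
    """Return the longest run of words found verbatim in both texts."""
    a = original.split()
    b = generated.split()
    # autojunk=False so common words are not dropped from the comparison
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a : match.a + match.size])


def looks_memorised(original: str, generated: str, min_words: int = 50) -> bool:
    """True if the texts share a verbatim run of at least min_words words.

    The default threshold is an arbitrary illustration, not a legal test.
    """
    return len(longest_shared_run(original, generated).split()) >= min_words


# Toy strings standing in for a book passage and a model reply.
book = "the quick brown fox jumps over the lazy dog"
reply = "he said the quick brown fox jumps and left"
print(longest_shared_run(book, reply))
```

The longer the shared run compared to the sentence you fed in, the harder it is to explain as coincidence.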

[–] Mastens 2 points 2 years ago (1 children)

I think that to protect creators they either need to be transparent about all content used to train the AI (highly unlikely) or have a disclaimer of liability, wherein if original content has been used in training the AI, then the original content creator would have standing for legal action.

The only other alternative would be to ensure that the AI specifically avoids copyrighted or trademarked content going back to a certain date.

[–] ProfessorZhu 2 points 2 years ago (1 children)

Why a certain date? That feels arbitrary

[–] thallamabond 1 points 2 years ago (1 children)

At a certain age some media becomes public domain

[–] ProfessorZhu 2 points 2 years ago

Then it is no longer copyrighted

[–] [email protected] 1 points 2 years ago

They can't. All they could prove is that their work is part of a dataset that still exists.