this post was submitted on 22 Aug 2023

OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series (www.businessinsider.com)

submitted 1 year ago by L4s to c/technology

368 comments

A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

[–] [email protected] 20 points 1 year ago (4 children)

People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

[–] abbotsbury 9 points 1 year ago

"but on mega steroids with a nearly limitless capacity for information retention."

That sounds like redistributing copyrighted books

[–] hup 8 points 1 year ago* (last edited 1 year ago) (2 children)

Nope, people are just acting like ChatGPT is making commercial use of the content. Knowing a quote from a book isn't copyright infringement. Selling that quote is. Also, it doesn't need to be content stored 1:1 somewhere to be infringement. That misses the point. If you're making money off a synopsis you wrote based on imperfect memory and in your own words, it's still copyright infringement until you sign a licensing agreement with JK. Even transforming what you read into a different medium like a painting or poetry can infringe the original author's copyrights.

Now mull that over and tell us what you think about modern copyright laws.

[–] Ronath 4 points 1 year ago

Just adding that, outside of Rowling, who I believe has a different contract than most authors due to the expanded Wizarding World and Pottermore, most authors themselves cannot quote their own novels online, because that would be publishing part of the novel digitally, and that's a right they've sold to their publisher. The publisher usually ignores this as it creates hype for the work, but authors are careful not to abuse it.

[–] [email protected] -4 points 1 year ago

Lol, say that to the first (obscure) Harry Potter line I tried on ChatGPT.

https://sh.itjust.works/comment/2509413

[+] [email protected] -10 points 1 year ago (2 children)

Using copyrighted work, art for example, still influences the AI, which they make a profit from.

If they use my works, then they need to pay. That's it.

[–] coheedcollapse 29 points 1 year ago (4 children)

Still kinda blows my mind how like the most socialist people I know (fellow artists) turned super capitalist the second a tool showed like an inkling of potential to impact their bottom line.

Personally, I'm happy to have my work scraped and permutated by systems that are open to the public. My biggest enemy isn't the existence of software scraping an open internet, it's the huge companies who see it as a way to cut us out of the picture.

If we go all copyright crazy on the models for looking at stuff we've already posted openly on the internet, the only companies with access to the tools will be those who already control huge amounts of data.

I mean, for real, it's just mind-blowing seeing the entire artistic community pretty much go full-blown "Metallica with the RIAA" after decades of making the "you wouldn't download a car" joke.

[–] [email protected] 10 points 1 year ago

Fuckin preach! I feel like I'm surrounded by children that didn't live through the many other technologies that have come along and changed things. People lost their shit when Photoshop became mainstream, when music started using samples, etc. AI is here to stay. These same people are probably listening to autotuned music all day while they complain on the internet about AI looking at their art.

[–] [email protected] 6 points 1 year ago* (last edited 1 year ago) (1 children)

I feel like a lot of internet people (not even just socialists) go from seeing copyright as at best a compromise that allows the arts to have value under capitalism to treating it like a holy doctrine when the subject of LLMs comes up.

Like, people who will say "piracy is always okay" will also say "ban AI, period" (and misrepresent organizations that want regulations on its use as wanting a full ban).

Like, growing up with an internet full of technically illegal content (or grey area at best) like fangames and YouTube Poops made me a lifelong copyright skeptic. It's outright confusing to me when people take copyright as seriously as this.

[–] [email protected] 5 points 1 year ago (1 children)

I say piracy is always okay but also am a big fan of AI. I had ChatGPT write my last cover letter and got the job.

[–] alienanimals 1 points 1 year ago

Based

[–] dx1 0 points 1 year ago* (last edited 1 year ago) (4 children)

Nobody would defend copyright if it wasn't already in place, it's a sick idea. They ask us to cut the field of human knowledge for private benefit. Now they want to destroy a new technology in its name. Greed knows no bounds.

[–] Hildegarde 8 points 1 year ago (2 children)

I defend the idea of copyright. The first copyright law was in 1710, to protect authors from the printing press. Without copyright, whoever owned the printing press would sell copies of books with no obligation to pay the author. When copying art is trivial, the artist needs copyright protection in order to make a living creating art.

There are major problems with modern copyrights. Like all things in capitalism it has been subverted to benefit the rich, but the core idea behind copyright is sound.

These lawsuits are not to stop the development of generative AI. These lawsuits are to stop the unlicensed use of copyrighted works as AI training data.

There are AI models that are only trained with licensed data. This doesn't stop the development of AI.

Artists should have the right to choose whether their work is used as training data. And they should be compensated fairly for it. That will be the case if these lawsuits succeed.

[–] dx1 -3 points 1 year ago* (last edited 1 year ago) (1 children)

Ultimately it's a propertarian scheme of ownership imposed onto the realm of concepts and ideas. The first person to successfully lay claim to an idea is given a monopoly on that idea for some number of years. A book, an invention, a melody. To secure profit for that individual, the entire rest of humanity is prevented access to the idea except under his terms, and the naturally free exchange of information is curtailed by statute to accomplish this, via the imposition of punishments for anyone who goes against this scheme. I do not think that's defensible. That is to say, I don't think humanity sees a net benefit from this way of doing things. Even some hypothetical 20-30% reduction in the generation of different kinds of creative works would be well offset by the benefit humanity sees from being able to access them, and the funds that would be going to the artist still could if people saw fit.

Is this being used to stop the development of generative AI? Yes, literally the imprint on an AI of having parsed the works and understood them in some symbolic capacity, they want to curtail that. And the existing models that have already done that would likely be rendered illegal, setting the entire technology back a year or two.

[–] [email protected] 4 points 1 year ago (1 children)

In an ideal world without greed, you are right in saying that copyright is not beneficial for the human race as a whole. Unfortunately, we don't live in such a world. Look at what happened with insulin. The person who invented it priced the patent at a ludicrously low one dollar because he felt that it should be available cheaply to all who need it, and yet today in the US, insulin is a ridiculously expensive drug which many people struggle to afford. This is because while the inventor was not greedy and thought about the greater good, the pharmaceutical industry did not. They saw an opportunity to make money and are screwing people in the process.

[–] dx1 2 points 1 year ago

Insulin is ridiculously expensive because of monopoly status over methods on how to produce it (on top of any other laws restricting supply). I personally contributed to a project to create an open source method to make insulin.

[–] voluble 6 points 1 year ago (1 children)

"Nobody would defend copyright if it wasn’t already in place"

I don't know about that. Say you take a few years to write a handful of poems, and it turns out people in your neighborhood really like them. You compile the poems into a book, and sell it for $5, and it sells well. Seeing this, your neighbor buys one, copies it, and starts selling it one neighborhood over for $2, and representing themself as the author. I would think most people in that situation would want to say, 'hey, that's not fair'. I don't think that's sick or rooted in greed, copyright can be a check on greed.

[–] dx1 2 points 1 year ago (1 children)

So thanks to copyright, we're now living in a world where artists are fairly compensated and not exploited by large corporations acting as middlemen that have seized control of their creative works and used it for their own profit?

[–] BURN 3 points 1 year ago (1 children)

More so than we would be without copyright at all

Copyright needs to be extended for individuals and cut back for corporations. People should be allowed to own rights to their ip, but corps should have much higher levels of restrictions and how some knowledge must be shared.

[–] dx1 1 points 1 year ago (1 children)

"More so than we would be without copyright at all"

It's hard to imagine how it could be worse than what we have now.

"Copyright needs to be extended for individuals and cut back for corporations. People should be allowed to own rights to their ip, but corps should have much higher levels of restrictions and how some knowledge must be shared."

Well in effect that would scale back the copyright nightmare we have now, but the basic problem is still there. The argument is still for near-indefinite monopoly privilege over information to be given to its creator at the expense of humanity's ability to share and reproduce the work, I don't think that's justifiable.

[–] BURN 2 points 1 year ago (1 children)

And I do. People are entitled to own their ideas. That’s a pillar I’m not willing to budge on.

As long as art has value, then the ideas do too, and the artists should be compensated for it.

Removing copyright would essentially mean the stopping of sharing everything because everyone is going to be hiding their secrets as close as possible so nobody can come and steal them and make money off them. There’d be no return on investment for any kind of research, no incentive for any artist to share their work and I firmly believe we’ll be significantly worse off without it.

[–] dx1 1 points 1 year ago* (last edited 1 year ago)

That sounds exactly backwards - letting people share information freely means there'll be more sharing of information. The whole point is that we should have a model where information is freely available. At worst that entails a separation of verticals for "research" and "production". A society can fund research as much as it feels like.

Re: corporate secrets etc. - the same principle goes for legal agreements that bind employees from sharing them. How does it benefit humanity more for a corporation to be operating in secret, using secret chemicals or processes or whatever to create a good? That right off the bat sounds like a recipe for an environmental disaster, not even getting into the problems with discouraging the advancement of technology.

Anyway, this is exactly what I meant with my original comment. Of course I've heard all these defenses before. It's the same rehashed crap I've been hearing for decades to defend this broken institution. I said "nobody would be defending this if it wasn't already the status quo", precisely because that's when people feel like any other way of doing it is impossible. See: https://en.wikipedia.org/wiki/System_justification

"As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously" - Benjamin Franklin

[–] assassin_aragorn 3 points 1 year ago* (last edited 1 year ago)

So the people who generate and curate that knowledge don't deserve to be compensated? Are you going to be a full time wikipedia editor then? Or does your "greed know no bounds"?

[–] BURN 2 points 1 year ago (1 children)

I defend copyright. The original intent was to protect creators in order to foster more creativity. Most artists will have no incentive to create if their work can be reappropriated by a larger group to leverage it for monetary gain, which is directly being taken from the original creator.

I’m a photographer. I’ve removed all my pictures from the internet and plan to never post more. I don’t want my work being used to train AI. Right now we have no choice in that matter, so the only option is to no longer share our work.

[–] dx1 0 points 1 year ago (1 children)

I've released tons of stuff and it's under Creative Commons/public domain. I welcome people to share it or create derivative works.

[–] BURN 3 points 1 year ago (1 children)

Cool. That’s a fine stance to have and one that plenty of other people will have too. I’m fine with actual people doing it. I’m not fine with AI. The point is the artist should have a choice if they’d like to allow training.

The problem right now is we can’t control that. Everything is being used for AI training whether you want it to be or not. If I could explicitly forbid use of it for AI training (in a way that could be backed in court), I’d be more willing to post them again.

Lemmy users are not an accurate representation of artists imo. This site skews extremely far left, to the point of such anti-corporate nonsense that I believe the majority of people just want to hurt anyone with more money than them as much as possible.

[–] dx1 1 points 1 year ago

The problem with trying to restrict AI from scanning the art and making conclusions about it is that it's the same as trying to ban humans from creating art that's inspired by other art. It's the same process even. If the AI is actually producing one-for-one copies of their work, you might have a leg to stand on in terms of arguing the AI shouldn't be compensated for creating those specifically, but it's creating works that are just loosely influenced by seeing the original art.