this post was submitted on 28 Aug 2024
204 points (99.0% liked)

Programming


A judge has dismissed the majority of claims in a copyright lawsuit filed by developers against GitHub, Microsoft, and OpenAI.

The lawsuit was initiated by a group of developers in 2022 and originally made 22 claims against the companies, alleging copyright violations related to the AI-powered GitHub Copilot coding assistant.

Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.

...

Despite this significant ruling, the legal battle is not over. The remaining claims regarding breach of contract and open-source license violations are likely to continue through litigation.

all 49 comments
[–] [email protected] 101 points 3 months ago (3 children)

The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”

"Rarely" is not zero. This looks like it's opening a loophole to copying open source code with strong copyleft licenses like the GPL:

  1. Find OSS code you want to copy
  2. Set up conditions for Copilot to reproduce code
  3. Copy code into your commercial product
  4. When sued, just claim Copilot generated the code

Depending on how good your lawyers are, 2 is optional. And bingo! All the OSS code you want without those pesky restrictive licenses.

In fact, I wonder if there's a way to automate step 2. Some way to analyze an OSS GitHub repo to generate inputs for Copilot that will then regurgitate that same repo.
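
For what it's worth, "rarely emits memorised code" can at least be made measurable. Below is a minimal sketch, assuming a naive whitespace tokenizer and a 5-token window, that estimates how much of a generated snippet also appears verbatim in a known source file. The names `ngrams` and `verbatim_overlap` are hypothetical, and this is not the method used by the cited study.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated snippet's n-grams that also occur in source.

    Assumption: whitespace tokenization is good enough for a rough signal.
    Renamed identifiers or reformatted code would not be caught by this.
    """
    gen = ngrams(generated.split(), n)
    if not gen:
        # Snippet is shorter than the window, so there is nothing to compare.
        return 0.0
    src = ngrams(source.split(), n)
    return len(gen & src) / len(gen)
```

A score near 1.0 suggests the snippet is mostly a verbatim copy of the source; a score near 0.0 only says that this crude check found nothing, not that the code is independent.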

[–] [email protected] 24 points 3 months ago

With an automated refactoring step to pretend it's really not derivative work despite being extremely derivative

[–] General_Effort 6 points 3 months ago* (last edited 3 months ago)

It doesn't work like that. A copy is a copy. Only if you can credibly show that you independently produced the same code can you get away with that. Hence, clean room implementations. It's not strictly necessary, but it deters lawsuits.


Apparently there's some confusion here about what the judge ruled. This particular part is about claims under the DMCA, not copyright infringement. The relevant sections can be seen here: https://www.copyright.gov/title17/92chap12.html [edit: link fixed. The claim was that "copyright management information" was removed; prohibited under these sections.]

Here's the original text for those who want to know more (link via The Verge): https://www.documentcloud.org/documents/24796955-github-copilot-claims-dismissed

[–] [email protected] 75 points 3 months ago (5 children)

This is an aspect of the German court system that is LEAGUES more sensible than the US - they have certified subject matter experts in a ton of domains that work with courts to help meaningfully inform judicial decisions. The system isn’t perfect (no system is), but it’s a damn sight better than what the US generally does. I'm categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.

[–] [email protected] 26 points 3 months ago* (last edited 3 months ago) (2 children)

I am usually not wont to defend the dysfunction presently found in the USA federal (and state-level) judiciary, but I think this comparison to the German courts requires a bit more context. Generally speaking, the USA federal courts and US States adopt the adversarial system, originally following the English practice in both common law and equity. This means the judge takes on a referee role, and a plaintiff and a defendant will make their best, most convincing arguments.

I should clarify that "common law" in this context refers to the criminal matters (akin to public law), and "equity" refers to person-versus-person disputes (akin to private law), such as contracts.

For the adversarial system to work, the plaintiff and defendant need to be sufficiently motivated (and nowadays, well-monied) to put on good arguments, or else they're just wasting the court's time. Hence, there is a requirement (known as "standing") where -- grossly oversimplifying -- the plaintiff must be the person with the most to gain, and the defendant must be the person with the most to lose. They are interested parties who will argue vigorously.

Of course, that's legal fiction, because oftentimes, a defendant might be unable to afford excellent legal counsel. Or plaintiffs will half-ass or drag out a lawsuit, so that it's more an annoyance to the opposite party.

In an adversarial system, it is each party's responsibility to obtain subject-matter experts and their opinions to present to the court. The judge is just there to listen and evaluate the evidence -- exception: criminal trials leave the evaluation of evidence to the jury.

Why is the USA like this? For the USA federal courts, it's because it's part of our constitution, in the Case or Controversy Clause. One of the key driving forces for drafters of the USA Constitution was to restrict the powers of government officials and bureaucrats, after seeing the abuses committed during the Colonial Era. The Clause above is meant to constrain the unelected judiciary -- which otherwise has awe-inducing powers such as jailing people, undoing legislation, and assigning wardship or custody of children -- from doing anything unless some controversy actually needed addressing.

With all that history in mind, if the judiciary kept their own in-house subject-matter experts, then that could be viewed as more unelected officials trying to tip the scale in matters of science, medicine, computer science, or any other field. Suddenly, landing a position as the judiciary's go-to expert could have broad reaching impacts, despite no one in the federal judiciary being elected.

In a sense, because of the fear of officials potentially running amok, the USA essentially "privatizes" subject matter experts, to be paid by the plaintiff or defendant, rather than employed by the judiciary. The adversarial system is thus an intentional value judgement, rather than a "whoopsie" type of thing that we walked into.

Small note: the federal executive (the US President and all the agencies) does keep subject matter experts, for the limited purpose of implementing regulations (aka secondary legislation). But at least they all report indirectly to the US President, who is term-limited and only stays 4 years at a time.

This system isn't perfect, but it's also not totally insane.

[–] [email protected] 15 points 3 months ago (1 children)

I mean I get what you’re saying on a theoretical level, but all of that breaks down once you fill the judiciary with rank incompetents and political hacks.

[–] [email protected] 13 points 3 months ago (2 children)

You are absolutely correct: this fragile experiment called democracy will not survive if the citizenry becomes ambivalent about its institutions, allowing corrupt officials and other enablers of authoritarianism to take root.

If you are an American and that prospect disturbs you, then you need to help strengthen and guard the institutions that protect the core American values. Nobody owes you a democracy.

For some ideas of what to do, this post by Teri Kanefield has a list of concrete actions that you can take: https://terikanefield.com/things-to-do/

[–] [email protected] 3 points 3 months ago

I saw the Chad no-self-upvote move, so here’s mine 🍻

[–] [email protected] 2 points 3 months ago

For some ideas of what to do, this post by Teri Kanefield has a list of concrete actions that you can take: https://terikanefield.com/things-to-do/

Very much appreciated.

[–] General_Effort 2 points 3 months ago (1 children)

You should emphasize more that the difference between the adversarial and inquisitorial systems exists in criminal law only. In civil/private matters - e.g. copyright disputes like in this instance - continental Europe handles matters much the same.

[–] [email protected] 3 points 3 months ago* (last edited 3 months ago) (1 children)

I will admit that my familiarity with private law outside the USA is almost non-existent, except for what I skimmed from the Wikipedia article for the Inquisitorial system. So I had assumed that private law in European jurisdictions would follow the same judge-intensive approach. Rereading the article more closely, I do see that it really only talks about criminal proceedings.

But I did some more web searching, and found this -- honestly, extremely convenient -- article comparing civil litigation procedure in Germany and California (the jurisdiction I'm most familiar with; IANAL). The three most substantial differences I could identify were the judge's involvement in: serving papers, discovery, and depositions.

Serving legal notice is the least consequential difference between California and Germany, but it seems that the former allows any qualified adult to chase down the respondent (i.e. the person being sued) and deliver the notice of a lawsuit on behalf of the complainant (the person who filed the lawsuit) -- hence the trope of yelling "you have been served" and then throwing a stack of papers at someone's porch. German courts, by contrast, take on the role of notifying the respondent themselves. Small difference, but notable.

In Germany, the court, and not the plaintiff, is required to serve the complaint on the defendant without undue delay, which is usually immediately after it has been filed with the court.

Next, discovery and pleadings in Germany appear to be different from the California custom. It seems that German courts require parties to thoroughly plead their positions first, and only afterwards will discovery begin, with the court deciding what topics can be investigated. Whereas California allows parties to make broad assertions that can later be proven or disproven during discovery. This is akin to throwing spaghetti at the wall and seeing what sticks, and a big reason this is done is because any argument that isn't raised during trial cannot be reargued during a later appeal.

I believe that discovery in California and other US States can get rather invasive, as each party's lawyers are on a fact-finding mission where the truth will out. The general limitation on the pleadings in California is that they still must be germane to the complaint and at least be colorable. This obviously leads to a lot of pre-trial motions, as the targeted party will naturally want to resist a fishing expedition during discovery.

Lastly, depositions in Germany involve the judge(s) a lot more than they would in California. Here, depositions are off-site from the court and conducted by the deposing party, usually video-taped and with all attorneys present, plus a privately hired stenographer, with the deposing attorney asking questions. Basically, after a deposition order is granted by the judge, the judge isn't involved unless during the deposition, the process is interrupted in a way that would violate the judge's order. But the solution to that is to simply phone the judge and ask for clarification or a new order to force the deposition to continue.

Whereas that article describes the German deposition process as always occurring in court, during trial, and with questions asked by the judge(s). The parties may suggest certain questions by way of constructing arguments which require the judge(s) to probe in a particular direction. But it's not clear that the lawyers get to dictate the exact questions asked.

In contrast, depositions in Germany are conducted by the judge or the panel of judges and only during trial.

I grant you that this is just an examination of the German court proceedings for private law. And perhaps Germany may be an outlier, with other European counterparts adopting civil law but with a more adversarial flavor for private law. But I would say that for Germany, these differences indicate that their private law is more inquisitorial overall, in stark contrast to the California or USA adversarial procedure for private litigation.

[–] General_Effort 3 points 3 months ago

Wow, long take. I didn't want "much the same" to bear a lot of meaning. In the German inquisitorial system, in a criminal case, the judge takes over the (police) investigation from the prosecution. When the police become aware of a possible crime, they inform the bureau of the state attorney. A state attorney is responsible for the investigation and for uncovering the truth. But once the case goes to court, the responsibility goes to the judge.

In a civil suit, the parties are basically in charge and not the judge. It's true that the judge has a more active role in German civil procedure. While the court is not supposed to run its own investigation, it can request additional evidence if it's necessary to judge the arguments of either side. I am not clear on the details. Where matters of fact must be determined by an expert, either party can request the court to provide one. But they can also make their own arrangements. The court can also solicit an expert opinion on its own, if necessary. Typically, the expert's opinion is given as a written statement. An oral deposition may happen when questions remain. Afaik, it's unusual to depose an expert without having first requested a written statement. Either party or the court may question the witness.

[–] [email protected] 10 points 3 months ago (1 children)

The internet is just a series of pipes!

[–] [email protected] 16 points 3 months ago* (last edited 3 months ago) (1 children)

Tubes. It’s TUBES, you philistine!

And you can’t just dump something on it; it’s not a big truck.

[–] [email protected] 7 points 3 months ago

Look I'm not a tube expert jeesh.

No one wants to buy my pipes

[–] [email protected] 6 points 3 months ago (2 children)

Judge William Alsup. Um, now ask me to name another.

Biden or Harris could do the US a favor and name, say, Shayon Ghosh to the federal bench. He's not quite as qualified as Alsup: whilst he's also from Jackson, MS, he strangely chose to go to Carnegie Mellon over Alsup's choice of Mississippi State.

[–] [email protected] 4 points 3 months ago (1 children)

I mean sure, you can cherry-pick examples that are outstanding justices in that regard. But that’s never going to hold a candle to implementing a systemic norm that essentially says “a judge ruling on a case primarily concerned with a given field can tap a pool of certified experts in that field to make the most informed decision possible”. An enhancement to that would be “the pool of experts may also flag decisions made by justices that a majority of said experts deem inappropriate”.

I’m not saying this hypothetical system would be perfect, or that it wouldn’t need further tweaking and iteration, but specifically including feedback mechanisms like that would probably (hopefully) steer things towards a reasonably decent trajectory.

[–] [email protected] 3 points 3 months ago (1 children)

...

I think you misread the tone of my comment. I can name one. And point out one more potential candidate. I'd say that supports your position.

Also, I'm not sure how that constitutes cherry-picking, as for me that particular word choice implies a lack of good-faith reasoning. Regardless, I greatly appreciate your tone and consideration as well as your thoughtful points. Good discussion!

[–] [email protected] 2 points 3 months ago* (last edited 3 months ago)

Fair point. Didn’t mean to come off stabby, or to imply bad faith. I appreciate the discussion as well! Cheers, friend! 🍻

[–] General_Effort 1 points 3 months ago (1 children)

Judge William Alsup.

Now I remember that guy. He decided Oracle v. Google. I can't imagine he has many fans here.

[–] [email protected] 1 points 3 months ago (1 children)

I'd imagine the opposite. I'd be astonished if many programmers who use Lemmy would disagree with Alsup's ruling that "So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical."
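
To illustrate the distinction in that quote, here is a minimal sketch in Python rather than Java, with made-up class names `LibA` and `LibB`. Both declare a `max` method with an identical header, but the code implementing it is different. This is only an illustration of the idea, not an excerpt from the case.

```python
import inspect


class LibA:
    """Hypothetical original library."""

    @staticmethod
    def max(x: int, y: int) -> int:
        # Straightforward comparison.
        return x if x > y else y


class LibB:
    """Hypothetical independent reimplementation of the same method header."""

    @staticmethod
    def max(x: int, y: int) -> int:
        # Arithmetic identity: x + y + |x - y| equals twice the larger value.
        return (x + y + abs(x - y)) // 2


# The declarations match exactly, even though the bodies do not.
same_header = inspect.signature(LibA.max) == inspect.signature(LibB.max)
```

Callers can swap one class for the other without changing their own code, which is the whole point of reimplementing an API.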

[–] General_Effort 2 points 3 months ago (1 children)

Yes, I know what you mean. But looking at the comments here, Fair Use is not a popular concept. I remember that Alsup specifically quoted the copyright clause in his ruling. I can't imagine any argument that would make him rule, on the whole, for the plaintiffs in a case such as this.

[–] [email protected] 1 points 3 months ago (1 children)

Huh. Thanks for explaining. I certainly find that surprising, but I definitely don't have enough experience with this community to know the shape of its members' feelings on copyright or fair use.

Thanks.

[–] General_Effort 2 points 3 months ago (1 children)

Don't listen to me on that. I have no idea how the community feels on copyright or fair use. Whenever AI comes up, the most dogmatic copyright maximalism dominates. On other subjects, the debate is more nuanced. I don't know how that fits together at all. But I guarantee you, if Alsup ruled on a case like this/OP, they would... Well, most comments would not like the ruling or him.

[–] [email protected] 1 points 3 months ago (1 children)

Really good point about the AI context. I really hadn't considered how it would leak over into potentially corroding support for fair use.

[–] General_Effort 1 points 3 months ago

Come to think of it. That DMCA argument would really wreck fair use.

It's illegal to remove "copyright management information" (CMI), in this case meaning the FOSS license. The argument was that when Copilot spits out verbatim snippets of source code without the license, this constitutes removal of the CMI. The point of the argument was that fair use is not a defense under the DMCA. These verbatim snippets are pretty obvious fair use to me, so countering that defense is important if they hope to get anywhere with their suit.

By the same argument, any meme image is illegal. They are taken from somewhere without the original license or attribution. Yikes.

[–] General_Effort 4 points 3 months ago (1 children)

I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.

Can you name one in Germany? Just asking.


Anyway, at this stage of the trial only legal experts are involved. The judge examines if the legal arguments are sound, assuming the allegations are true. Whether the allegations are actually true will only be determined in the future. That's also when Fair Use comes in. At that point, you need outside experts to advise on the non-legal aspects.

[–] [email protected] 2 points 3 months ago (1 children)

Not a specific one, but I was kind of citing the German judicial system writ large as a model that appeared meaningfully more effective than the model the US uses.

[–] General_Effort 2 points 3 months ago* (last edited 3 months ago)

Hmm. In what way is the German system more effective? I know of some hair-raising cases. Me, I blame the law-makers and not the judges, but others see it differently. I can't think of a single related case, where I'd say that the judgement served everyone's interests.

ETA: Bad question. You explained how the German system is more effective. I'm wondering about cases where I can see this in action. IE: "well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields."

[–] [email protected] 1 points 3 months ago

Consistently? Not that I can think of either, but there was that one judge in the Oracle v. Google Java case who I believe learned enough programming to call BS on Oracle's claims.

[–] [email protected] 37 points 3 months ago (2 children)

Sounds like it's time to start training code-writing models on leaked Microsoft source code. Don't worry, it's not like it'll "emit memorized code".

[–] grue 18 points 3 months ago

The only trouble is that, at this point, Microsoft leaked code is so inferior nobody wants it anyway.

[–] devfuuu 6 points 3 months ago

Ahh, all that sweet source-available Windows OS code that universities can use for studying and whatever, fed through a pipe like this. Would be fun seeing them defend it then.

[–] [email protected] 20 points 3 months ago

Very curious what the final output of this will be... if they can finally just train their models on everything with no repercussions, I wonder what kind of loopholes that will open for say music. "I didn't share your music, I shared a model that happens to output music trained from your input. Yes, it happens to be byte for byte".

Anti Commercial-AI license

[–] [email protected] 19 points 3 months ago (1 children)

Well. Aren't those two exactly what open source licensing is about?

Either you follow the license, or you are in violation of copyright.

[–] Crackhappy 2 points 3 months ago (1 children)

Hmmm is it copyright or breach of contract? It's a valid point.

[–] [email protected] 13 points 3 months ago* (last edited 3 months ago) (1 children)

This article is over a month old, with a variation of it published by the Verge on July 9th.

[–] [email protected] 5 points 3 months ago (1 children)

Oh. I'm sorry if this was discussed previously... I only returned to lemmy a few weeks ago and didn't see the story covered yet.

[–] [email protected] 9 points 3 months ago

I think this community accepts posts that are weeks, months, or even years old. I think it also accepts repeat articles. It's just that the format of the article makes it seem like it was published today, not months ago. It's an ongoing legal case, and it has progressed further since the Verge's reporting of the order from 06/24/2024.

[–] paf0 6 points 3 months ago

Eventually someone is going to train an AI on Microsoft's business practices and beat them at their own game.