• 1 Post
  • 1.55K Comments
Joined 1 year ago
cake
Cake day: February 10th, 2025

help-circle








  • It’s extremist to take the fact that you CAN get plagiaristic output and to conclude that all other output is somehow tainted.

    You personally CAN quote copyrighted music and screenplays. If you’re an artist then you also CAN produce copyright violating works. None of these facts taint any of the other things that you produce that are not copyright or plagiarized.

    In this situation, and in the current legal environment, the responsibility to not produce illegal and unlicensed code is on the human. The fact that the tool that they use has the capability to break the law does not mean that everything generated by it is tainted.

    Photoshop can be used to plagiarize and violate copyright too. It would be just as absurd to declare all images created with Photoshop are somehow suspect or unusable because of the capability of the tool to violate copyright laws.

    The fact that AI can, when specifically prompted, produce memorized segments of the training data has essentially no legal weight in any of the cases where it has been argued. It is a fact that is of interest to scientists who study how AI represent knowledge internally and not any kind of foundation for a legal argument against the use of AI.


  • Given the research that you’ve done here I’m going to assume that you’re looking for an answer and not simply taking us on a gish gallop.

    Your premise, and what appears to be the primary source of confusion, is built on the idea that this is ‘stolen’ work which, from a legal point of view, is untrue. If you want to dig into why that is, look into the precedent setting case of Authors Guild, Inc. v. Google, Inc. (2015). The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law. i.e. It is legal, not stealing.

    The case you linked from Munich shows that other country’s legal systems are interpreting AI training in the same way. Training AI isn’t about memorization and plagiarism of existing work, it’s using existing work to learn the underlying patterns.

    That isn’t to say that memorization doesn’t happen, but it is more of a point of interest to AI scientists that are working on understanding how AI represents knowledge internally than a point that lands in a courtrooom.

    We all memorize copyrighted data as part of our learning. You, too, can quote Disney movies or Stephen King novels if prompted in the right way. This doesn’t make any work you create automatically become plagarism, it just means that you have viewed copyrighted work as part of your learning process. In the same way, artists have the capability to create works which violate the copyright of others and they consumed copyrighted works as part of their learning process. These facts don’t taint all of their work, either morally or legally… only the output that literally violates copyright laws.

    The pragmatism here is recognizing that these tools exist and that people use them. The current legal landscape is such that the output of these tools is as if they were the output of the users. If an image generator generates a copyrighted image then the rightsholder can sue the person, not the software. If a code generator generates licensed code then the tool user is responsible.

    This is much like how we don’t restrict the usage of Photoshop despite the fact that it can be used to violate copyright. We, instead, put the burden on the person who operates the tool

    That’s what is happening here. Linus isn’t using his position to promote/enforce/encourage LLM use, nor is he using his position to prevent/restrict/disallow any AI use at all. He is recognizing that this is a tool that exists in the world in 2026 and that his project needs to have procedures that acknowledge this while also ensuring that a human is the one responsible for their submissions.

    This is the definition of pragmatism (def: action or policy dictated by consideration of the immediate practical consequences rather than by theory or dogma).

    e: precedent, not president (I’m blaming the AI/autocorrect on this one)










  • CFD = Computational Fluid Dynamics.

    It is kind of what they said, you’re right. I was more pointing how how it could be that they could ‘sense the vibes’ of a CFD result to determine if it is accurate or if the model decided to do something weird. Since it’s a chaotic process and also an artificial one, the starting conditions can yield results that are impossible/not based on reality.

    If you look at enough of them you start to notice the kinds of things that go wrong. They would also have a pretty good idea about how their design should perform and if the simulation shows different they’d first want to troubleshoot the simulation before attempting to re-design whatever system they’re creating.