I have been following the strange intersection of AI and copyright, and I find that it pushes the current intellectual property regime ever closer to its breaking point. A recent article in IEEE Spectrum illustrated a collision of two problems that, combined algorithmically, create a third and much more dire one.
The catastrophic problem is that AI can generate images or text so similar to the original copyrighted works that the output would itself be considered a violation of the original work's copyright. The AI changes the works only in insignificant ways, while the output stays so close to the original in all the ways that matter that any court would find the material infringing.
The authors of the article discuss how there is no good technical solution to the problem, but they don't dwell long on why. The most obvious solution, generating works that differ from the original in enough points of similarity that they are no longer infringing, doesn't work. The problem is that what is significant for copyright purposes cannot be clearly defined algorithmically.
Take the above image as an example. When I came across it, I would have sworn it was a promotional image from some popular Japanese animated series. However, it uses the coloring and adornments that depict the Hindu god Krishna, who rarely appears in Japanese media (although gods from around the world are in no short supply there). A search on the image returns only other depictions of Krishna, which makes the blue skin tone more "sticky" than any other feature of the image, including the face itself. Which features matter depends heavily on why the material is important to the viewer and on external factors not inherent in the image.
(As a side note, this type of infringement would likely not be pursued in Japan, whose copyright regime differs significantly from that of the U.S.)
Interestingly enough, AI itself seems like the best solution to the problem. If engines included a post-filter AI trained to compare their output against the training data and detect meaningful similarities, potentially infringing output could be discarded. Building such a filter would be an interesting and difficult undertaking, but I don't think it would be impossible.
All is not lost for AI, but a lot needs to be done. AI offerings must both adequately license the entire corpus of training data, so that creators benefit from their works, and train an additional layer of AI that limits outputs to prevent anything truly infringing, or otherwise find a solution that works.
Copyrighted Material for Training
As the article points out, it is evident that both Midjourney and OpenAI trained their AI on copyrighted material that was not licensed. The argument against licensing that material is that the engine, like a human, was merely experiencing the material rather than substantively copying it. The sheer volume of material the AI “consumes” would collectively put a large price tag on licensing it, but to me that just sounds like a negotiating point.
I think we can put to rest the idea that AI engines are “merely experiencing” anything. Traditional copyright licensing does not contemplate AI training as any type of use, and arguably, any use not contemplated in a license agreement is not licensed. The New York Times case cited in the article will give us better insight once it moves through the courts.
Recorded video licensing, for example, does take the audience into account: videotape recordings are licensed differently for private and public viewing. Although the public vs. private distinction is somewhat orthogonal to this issue, it seems entirely reasonable to license copyrighted material according to how it will be used, whether by an individual natural person or for machine training.
What Rights Remain on the Web?
It is often argued that anything put on the web is available for all to observe, and that AI training is merely observation. Without getting into the technical weeds of how images are reproduced on a remote computer, it is universally accepted that posting a work does not forfeit any copyright-based rights. The argument that posting something on your own website abandons your rights is entirely unsupported.
Copyrighting Generated Material
Now, whether generated material is itself copyrightable is an entirely different question. Hand-made copies (such as those an artist makes while learning the trade) are obviously not copyrightable, and no one would argue they are. Photographs and recordings of works, however, are subject to copyright in their own right.
To date, the U.S. Copyright Office, part of the Library of Congress, has been unwilling to grant registrations for words or images produced by AI. An arrangement of multiple works, for example images selected to illustrate a literary work, has, however, been considered copyrightable.
Prompt engineering, the art of crafting language to feed into an engine so that it produces useful output, has been seen by many as a creative act. Many compare it to the staging and selection of a scene, which are recognized as the creative acts required for a photograph to be copyrightable. Several pending registration attempts turn on this question, but a work must be submitted to the Copyright Office three times before it is even eligible to be heard in court, so it is a drawn-out process whose outcome we are still waiting to see.