AI Broadcast Series #3: IP & copyright challenges for ‘AI’ solutions and the future of ‘AI’ regulations

Vinay Kumar
September 13, 2023

Listen to the podcast of this fireside chat below:

Disclaimer - The information provided in this podcast is for informational purposes only and should not be construed as legal advice. The discussion of copyright and patentability of AI outputs is complex and subject to change, and it is important to consult with a qualified legal professional before making any decisions based on the information presented here. The statements made in this podcast are those of the speaker and do not necessarily reflect the views of any other organization or individual. The speaker does not warrant the accuracy, completeness, or timeliness of the information provided and disclaims any liability for any damages resulting from the use of the information herein.

The 'AI Broadcast' is a fortnightly event series where we explore the latest advancements, trends, and implications of AI. Each session focuses on a specific AI topic - Machine Learning, Generative AI, NLP, responsible AI, and much more!

In episode #3 of the broadcast series, we engage in a dialogue with Vinay Kumar, Founder and CEO, regarding the intellectual property (IP) complexities within AI solutions. He offers valuable insights into the IP challenges faced by next-gen AI applications and emphasises the crucial role of human guidance and oversight in AI technology development and utilization.

An article recently mentioned that UK legislators called on the government to limit AI's unrestricted use of copyrighted content, ensuring creative protections.

The patent and copyright laws we have today were created by humans, for humans - AI-crafted inventions and AI-generated works of art do not fall under their purview. The existing framework of intellectual property law does not recognize nonhuman creators. This is a problem we face increasingly often, especially as generative AI models continue to gain adoption.

Can we copyright the output from the generative AI tool? 

If you look at the output of generative AI tools, it typically involves minimal to no manual intervention. Prompting, for example, is a minimal intervention that nevertheless produces distinct and unique creations, differentiating one output from another. Is this a copyright-worthy asset, and can the user own the output generated by the system? This has created a lot of debate, particularly over the definition of copyright and what makes something copyright-worthy, and with that confusion comes a lot of noise. This is not limited to text or image generation but extends to voice, video, or a mix of all of these, which becomes really challenging as we go forward. For example, there was a recent case where an artist going by "Ghostwriter" uploaded a track, 'Heart on My Sleeve', that sounded like it was sung by two of the world's biggest stars, Drake and The Weeknd. In fact, it was someone who had used an AI tool to make his voice sound like theirs. Although it was quickly removed, it gained over 9 million views, inspired thousands of derivative TikToks, and earned hundreds of thousands of plays on Spotify and YouTube over a single weekend. In this case, it was removed because it violated the copyrights of the actual artists. So, this is an example where the market took action because of feedback and comments received from users.

Let's take another example: the graphic novel Zarya of the Dawn, created by Kris Kashtanova. Kris used Midjourney to generate images for the graphic novel and was issued a copyright in September 2022. Can it be assumed, then, that any art generated by generative models can be copyrighted? There are still conflicting views on this. The US Copyright Office, in March 2023, drew the following conclusions:

- A work is not copyrightable where AI technology generates a work autonomously without human involvement.

- A work is not copyrightable where a “complex written, visual, or musical work[s]” is generated by AI technology through a user prompt. This is because the traditional elements of authorship “are determined and executed by the technology—not the human user.” As the Office Guidance explains, users of generative AI “do not exercise ultimate creative control over how such systems interpret prompts and generate material.” Instead, the Copyright Office analogizes prompts to instructions given to a commissioned artist, where the technology determines how those instructions are implemented.

They have offered certain guidelines, albeit at a high level, regarding what can and cannot be considered copyrightable. In general, content generated solely by machines is typically not deemed worthy of copyright protection, and the act of prompting alone does not constitute a substantial enough input to create copyrighted material. This is rough guidance, but each geography acts differently when it comes to taking a final call on copyright issuance - United Kingdom copyright law theoretically allows for protecting computer-generated works; European copyright law does not. In the US, there is some confusion because copyright was issued in some cases and not in others. In short, there is no clear framework or set of guidelines for when a work does and does not qualify for copyright.

Can generative ‘AI’ startups like OpenAI, Anthropic, and Stability AI, which provide foundation models, claim ownership of the innovation or copyright?

Let us consider a scenario where the output generated by generative AI models - whether LLMs, vision models, or audio and music generation - has reached a stage where it is copyrightable. Then, who owns the copyright? Does it rest with the user who interacts with these models and generates the copyright-worthy content, or does it vest with the provider of the AI infrastructure that enables the creation of such content? Two parallel debates are going on today. Typically, these foundation models are nothing but tools for users to create an output; they do not create the output on their own. Without any prompt or query, these models are simply a database of knowledge. In such a case, if an individual initiates an interaction with the AI model and generates an output, who owns that output? Legal experts offer varying viewpoints, and the matter is still not settled. Some argue that the copyright or ownership of the output should be shared not just with the foundation model providers but also with the enablers of these foundation models.

Let’s look at this with an example. Imagine a private entity possesses valuable protein structure information related to certain medications, and it licenses this data for building an LLM of sorts, perhaps a synthetic AI model. Say this model, trained on the entity's proprietary protein structure data, produces a truly innovative medicine. Who owns the copyright? Does the entity have control over the resulting output? One side of the debate argues that unless specific prompts are used, or the methodology to create the new structure is known, the model itself has no intrinsic value - it is just a tool that enabled the user to create the output. The other side argues that without the tool, the invention or output would never have existed; by this logic, even the owners of the foundation model should own the IP of the invention. Consider implementing that second scenario - it would be really cumbersome and chaotic. Anybody who contributed to building the LLM, even with something as minor as a single statement posted on a website that subsequently became part of the model's training data, could claim a degree of ownership over the end output, which becomes quite challenging. One perspective draws a parallel with traditional artistic tools like a pen or a brush: the brush's design may not significantly shape the painting, and the brush is nothing more than a simple tool in creating it. Hence, it's crucial to acknowledge that conflicting ideas persist in this complex debate.

Now imagine we go a few steps further in the coming years, and there is an AGI or a pseudo-AGI that comes up with a revolutionary formula as important as E=mc^2. Sure enough, the builders of this model would be strongly motivated to assert ownership rights over the resultant output. Their argument might revolve around the notion that the invention wouldn't have happened without the model in the first place.

So, as mentioned earlier, nothing is clear currently. It all depends on the recommendations, solutions, and conflicts we are going to see, and on how large a role each jurisdiction plays in creating a framework for the final outcome.

What are the risks of building the IP on top of these foundation models? 

At least on paper, certain foundation model providers do not define ownership of the output. OpenAI does not address the question of copyright authorship; instead, it treats the matter as one of ownership via contract law. In particular, OpenAI states that it “will not claim copyright over content generated by the [ChatGPT] API for you or your end users,” and it allows commercial use of material output by ChatGPT. But there is an intrinsic problem. Say I employ a specific foundation model to craft a groundbreaking solution, one with the potential to revolutionize its field. Would the foundation model provider still be motivated not to claim copyright over the output? There will always be challenges regarding IP created using an LLM. Unless the role of every party involved is clarified in a well-defined legal framework, the question of who owns the copyright will remain. If the user agreement does not mention it, does that mean the foundation model provider could come back later and claim copyright over the output? There is no way to know for sure. The motivations of the involved parties can swing in either direction, and that constitutes a substantial risk.

In today's landscape, where conventional tools are employed for IP creation, ownership is typically straightforward. However, in the context of LLMs and foundation models, the issue of ownership remains inherently uncertain. This is one risk. 

Another risk is the lack of control over how these foundation models have been built. For example, there are allegations that ChatGPT was trained on copyrighted material - it has been claimed, for instance, that its training data included the contents of the Harry Potter books. Say someone used copyrighted data in building a model without proper licenses, and a user then creates IP with that model, IP which in turn draws on the copyrighted content. Instead of receiving a copyright, the user could be sued for copyright violation. Artists are expressing concerns about AI tools being trained on their artwork and replicating their style. Getty Images sued the AI company Stability AI for training on millions of its pictures without consent, and artists and illustrators have raised similar concerns against the same company. This is becoming quite challenging.

We are at a stage where data and data privacy are supercritical. It is going to be quite tough to trust an LLM completely unless its provider clearly discloses either its sources or its copyright and commercial terms.

Another issue is web crawling. Web crawling is generally allowed, but website owners can opt out of it; if their information is used anyway, it is a copyright violation. These are among the biggest concerns with LLMs. If an enterprise uses these LLMs for an image, music, or audio, and the output resembles copyrighted material, the absence of a clear contract between the foundation model provider and the copyrighted content's owner would create a huge mess for the enterprise. This lack of validation makes it hard to trust the model output completely.
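The opt-out mechanism referred to here is typically the Robots Exclusion Protocol: a site publishes a robots.txt file, and well-behaved crawlers are expected to honor it (OpenAI, for instance, documents a `GPTBot` user agent that site owners can block). As a minimal sketch - the rules and user agents below are illustrative assumptions, not taken from the discussion above - Python's standard library can check whether a given crawler is permitted to fetch a URL:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: it opts the AI crawler "GPTBot" out of the
# entire site, while other crawlers may fetch everything except /private/.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# The AI crawler is blocked everywhere on the site.
print(parser.can_fetch("GPTBot", "https://example.com/article"))      # False
# A generic crawler may fetch public pages, but not the /private/ path.
print(parser.can_fetch("OtherBot", "https://example.com/article"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))  # False
```

The protocol is advisory: nothing technically prevents a crawler from ignoring these rules, which is exactly why ignoring an explicit opt-out becomes a legal question rather than a technical one.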

Additionally, regulatory entities that conduct due diligence on such outputs face several challenges as well. Take, for instance, the scenario of submitting a music track for copyright consideration. The auditors responsible for assessing these submissions must possess the requisite expertise and resources to conduct thorough copyright examinations, yet it remains unclear whether such capacity and capability are readily available for this purpose.

So, anything that is built on these LLMs may be questionable, and there could be chances of copyright violation unless there is clear transparency from the point of data collection. Ideally, the accountability of the model provider should go back to the data collection process as well. Unless there is end-to-end clarity, the copyright violation risk will always be there.

As for copyright contribution and claims of ownership over LLM outputs, time will probably solve that problem, at least in terms of getting clarity.

Given that there is a growing influence of generative AI, is this the beginning of the end of ‘Original’ art? 

Probably not. This could actually be the beginning of a new kind of original art. With the plethora of tools available today, everybody can become an artist. An engineer who has never held a paintbrush, or a school-going kid, can become a great artist by writing good prompts and producing high-quality images, videos, or music.

If that's the case, how do you stand out from the clutter? Only by being creative enough. This is definitely an era of noise, and it is not going to stop anytime soon. Now, does that creativity have to be 100% human-driven? I would say no. Artists may use unique combinations of these tools in a manner that only they can discover. It will be quite tough to create original art, but humanity has always appreciated originality. Even if the market fills with resembling or copied artworks, there will always be a big play for original artists. This applies to coding as well: if the LLM itself is a great coder, how can one copyright code? The challenge intensifies as the effort required to achieve originality grows over time. In creating any original art - a painting, music, or even code - your process and your combinations matter; n combinations give n different ways to do the same job, and that itself is originality. Going forward, creating original art will be tough, but not impossible.

Can ‘AI’ be a co-author of a patent? 

This is one of the most important questions. If we recognize AI as a co-author, it's going to change a lot of things, including the importance of what we do versus what AI does. It will have a tremendous impact on all things IP - copyright, patents, and more. There have been cases where people have listed AI as an inventor. In the notable case of Thaler v. Vidal, the situation emerged from two patent applications filed in 2019 by Stephen Thaler. These applications identified an AI system he calls "DABUS" (short for "Device for the Autonomous Bootstrapping of Unified Sentience") as the inventor. Thaler has pursued patents naming DABUS as the inventor in several jurisdictions, including the United States, where he has encountered numerous obstacles. Courts and patent offices in the United Kingdom, the European Patent Office, the Federal Court of Australia, and the German Federal Patent Court have all declined to grant the patent. One notable exception occurred in South Africa, where the Companies and Intellectual Property Commission (CIPC) granted a patent in July 2021, recognizing DABUS as the inventor.

From a patentability perspective, what is becoming very clear is that you cannot list AI as a co-author because, as mentioned earlier, it is simply an ‘enabler’ of your patent. At the moment, it is not recognized as a co-author. But the question remains: when does intelligence become worthy of being recognized as a consciousness? Can it have individuality? Does that individuality mean an equal part in society, like a human? If this chain of recognitions happens, then clearly we could call an AI a co-author. But at the moment it is not recognized as an author, and the criteria for recognizing AI as an author or co-author are also unclear.

Can AI ever become an ‘Author’ of a patent or a copyright?

One could debate this as well. For instance, as the creator of an AI, I build a tool that can generate IP. Being the owner of the tool, can I own the IP as well? Or can I make the AI own that IP? Without a well-defined set of criteria establishing when AI can be recognized as a co-author, this issue remains shrouded in ambiguity. There has been significant advancement in machine intelligence, especially within the past decade. In the early part of the decade, from roughly 2012 to 2018, the field focused on narrow problems. Then we saw a transition from narrow focus to early generalization through small-scale LLMs. Now, we are witnessing the emergence of sufficiently large models that generalize across multiple problems, and this trend is not limited to textual data but extends across various data types, which presents a substantial challenge. As we continue to scale up the deployment of this ‘intelligence’, we are creating many opportunities and solving problems, but this expansion also brings a chain of significant challenges that need to be addressed.

Today we are discussing IP and its ownership - but what about regulations? For humans, there are clear regulations across different scenarios. For AI to be recognized as an author, the regulations would probably need to be adjusted to address the scale of AI models. The issue is that these are challenges we have not seen before. These tools and systems are democratizing and becoming applicable across use cases and user profiles, so the regulations need to be thought through, if not clearly defined. There is no clear way to categorize these cases as a simple ‘yes’ or ‘no’. As in the example above, copyright was issued in one case and not another. It will depend on a case-by-case basis, at least for the next few years, before we see standardization in the definitions.

Surely there will be a lot of confusion, and that’s why any patent related to software will take more time - it is quite tough to prove and validate originality. Examiners will take more time to validate, which means anything you want to patent or copyright will probably take longer, particularly if you used AI as part of the patent, as the asset, or as an enabler to deliver that asset. Probably in the next 5-10 years, we will have clear guidelines.

Personally, I feel that the moment you recognize AI as a citizen or as an author, we humans will not be able to compete with the deliverables of such systems. Does it then make sense to create a separate class of regulations for them? That is one of the recommendations: why not create AI-specific patent laws, covering patents or copyrights created purely by AI, where AI becomes the main author and humans become co-authors? We may recognize AI as an author or a co-author, but we would recognize it the way we might recognize an alien - if aliens came to Earth, they would have separate rights and regulations. AI and machine intelligence operate at a different scale and in a different framework, and they will require a separate class of regulations.

They could still be regulated, but there is a benefit of the doubt to be given. For example, saying that no output from AI is patentable or copyrightable is not ideal either. There are artists who use these tools to create innovative works of art. If such a work cannot be recognized as unique, somebody else can quickly copy and replicate it. It could be art, or it could be software or code generated along with AI. If creators are not given copyrights simply because they used AI, what stops anybody from copying their work?

Everybody is trying to address this problem in their own way through customized frameworks. I'm looking forward to more such judgments, because case law can provide a lot of guidance on these debates and their outcomes. That could tell us where we are heading and what kind of frameworks we are heading towards.

In short, AI can probably be recognized as an author, but not within the same framework as humans. It could have its own framework, and perhaps then AI can be recognized as an author. When is that going to happen? I'm not sure, but I look forward to seeing what happens in the market.

