AI and Copyright

Copyright Issues – Everything You Need to Know

AI-based models and systems raise a number of new questions from a copyright perspective. The spread of AI presents particular challenges for legislators, AI users, and authors whose works may be used by these systems.

At the outset, it should be emphasized that most copyright questions are raised by AI systems trained on large bodies of data, and by generative AI in particular. ChatGPT, the generative AI best known to and most widely used by the general public, produced the following definition of the concept for this article: “Generative artificial intelligence (AI) is a technology capable of creating new and original content, such as text, images, music, or video. This technology learns from prior data and creatively combines acquired knowledge to generate new patterns and forms. Generative AI differs from traditional AI, which is generally optimized to perform a specific task, in that generative models are also capable of producing creative content. However, its use raises ethical questions, particularly in the areas of copyright and data usage.”

The above definition reveals not only that AI appears to have a degree of self-criticism, but also what the main problem with its use may be. Generative AI can produce its output only because it first ingests and processes extraordinarily large quantities of data, and that data may very easily include works protected by copyright. Because the training data of an AI system becomes embedded in the final product in a way that cannot be fully reverse-engineered (a result of the “black-box” effect mentioned in earlier articles), there is a significant risk of copyright infringement.

The Hungarian Copyright Act (Szjt.) already addresses the system’s learning process: text and data mining carried out for the purpose of training AI systems qualifies as free use, but the rights-holder may object to it. It is not clear, however, how a rights-holder would learn of such use, or to what extent their objection could be guaranteed to prevent the infringement in time. Since the precise operation of AI systems is a significant trade secret, it does not seem realistic that the training data would be made public during the development or training phase.

The problem is not merely theoretical: with generative AI systems, it does sometimes come to light that copyright-protected data was included in the training set. In some cases, current systems can even be prompted to reproduce parts of their training data. In other words, it is quite possible that various techniques could be used to extract works protected by intellectual property rights, works to which the given user would have no right of access or use. It was for this reason that, for example, the New York Times sued OpenAI and Microsoft. According to the New York Times, its articles were used to make ChatGPT more intelligent without first obtaining separate permission from the authors. As a result, ChatGPT could generate what amounted to “verbatim excerpts” of New York Times articles in response to queries, making them available to people who had not paid for the content.

Who Owns the Output of an AI System?

The other major question concerning generative AI systems is the legal status of their output. It seems relatively clear that a product generated by artificial intelligence would qualify as a work eligible for copyright protection if it had been produced by a human author. The currently prevailing position in legal scholarship, however, is that only a human can be an author, so a work generated by AI cannot become a copyrighted work and cannot benefit from copyright protection.

While no such judicial decision is yet known from Hungary, foreign courts have already issued rulings that support this position. This is of particular interest to AI users, because without copyright protection their output can be freely used by anyone. The outcome may differ if a person genuinely and substantially participates in the creative process or substantially reworks the primary product generated by the AI, although there is no established case law yet even for that scenario.

What can be established, in any case, is this: the fact that an AI system does not acquire copyright in what it generates does not mean that the output is free of copyright protection. As the foregoing makes clear, the “generated” result may in fact be someone else’s protected work (e.g., an image, a brand, etc.). For those wishing to use the output of generative AI, we therefore recommend first and foremost verifying whether the result is protected. Such verification is currently indispensable in most cases, not least because, given the current state of the technology, there is no guarantee that the result is error-free. It is worth searching online to check whether the output already appears somewhere exactly as the AI generated it, and making modifications until the result is unique.
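The check recommended above can be partly automated when a specific text is already suspected as the source. The following is a minimal Python sketch, not a legal test: it measures how many word sequences of a generated text also occur word for word in a reference text. The function names and the default sequence length of eight words are illustrative assumptions, and the sketch compares against a single known text only, so it does not replace an online search or a legal review.

```python
import re


def word_ngrams(text, n):
    """Return the set of all n-word sequences in the text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(generated, reference, n=8):
    """Share of the generated text's n-word sequences that also occur in the reference.

    Hypothetical helper: n=8 is an arbitrary illustrative choice, not a legal threshold.
    Returns 0.0 when the generated text is shorter than n words.
    """
    generated_ngrams = word_ngrams(generated, n)
    if not generated_ngrams:
        return 0.0
    reference_ngrams = word_ngrams(reference, n)
    return len(generated_ngrams & reference_ngrams) / len(generated_ngrams)
```

A value close to 1.0 suggests that the output largely repeats the reference text and should be reworked or licensed before use. A value of 0.0 only shows that no long word-for-word match was found in that one text; it says nothing about other sources or about non-literal copying.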

It is not only the use of AI-based systems that raises questions, but also their creation. The software itself certainly qualifies as a copyrighted work, and the Hungarian Copyright Act contains special rules applicable to computer programs.

Copyright issues deserve particular attention during the creation process. As noted above in connection with the “training” of generative AI, the nature of the data used to train the software, and the manner in which it is used, are far from irrelevant. Software development is in any case a specialist field, and code and software components may also constitute significant trade secrets. The AI Act itself states in its Preamble that the AI system or model being developed and operated must comply with EU and other copyright law. Such EU and international rules do exist: there is a legislative framework for the protection of traditional (i.e., non-AI) software, backed by decades of judicial practice. Since there are currently no significant cases that turn on AI-specific copyright legislation, courts and other law-applying bodies are expected to apply these existing rules by analogy in the disputes that arise in the near future.

Professional mentors: dr. András Bencsik (ELTE Faculty of Law), dr. Bernadett Bocsi (Retail Client Tribe IT Chapter), and dr. Laura Bikki Kovácsné (Innovation Tribe)
