Generative AI Copyright: Where Does Copyright Risk Arise?

Generative AI Copyright

Copyright law can be said to be in a ‘transitional phase’, as long-established doctrines on reproduction and originality are tested against the outputs of generative AI systems. While the legal framework remains formally stable, its practical application and interpretation are increasingly being shaped by the growing use of generative AI.

To date, the central tension lies in the application of reproduction rights, the concept of originality with its differing thresholds, and text and data mining (“TDM”), the automated process by which large volumes of text, images or other data are analysed to extract information used for training AI models. None of these ties in neatly with traditional copyright doctrine.

The debate is therefore less about one single act of infringement, and more about where copyright risk is located within the AI lifecycle.

Dataset ingestion as reproduction

Getty Images v. Stability AI reflects an input-centred theory of infringement. The dispute was initially significant because it raised the question of whether the copying and ingestion of copyrighted works into training datasets constitutes reproduction under UK copyright law, irrespective of any outputs.

Getty’s position treated dataset construction as the relevant act of copying, arguing that infringement arises at the point of ingestion. Stability, by contrast, characterised training as non-expressive statistical processing in which works are not stored in a meaningful expressive form but transformed into mathematical representations.

However, the judgment did not ultimately resolve the legality of AI training. Getty’s training and development claim was abandoned because there was insufficient evidence that the relevant training had taken place in the United Kingdom.

Conditional legality under TDM

Article 3 of the Directive on Copyright in the Digital Single Market, on the other hand, provides an exception for TDM carried out for scientific research. In Robert Kneschke v LAION e.V., the Hamburg Regional Court treated dataset construction as potentially lawful because it was claimed to have been carried out for scientific research purposes. In this context, “scientific research” was interpreted broadly, so as to extend to the preparatory stages necessary for the development of AI models.

The court accepted that copying for dataset creation may fall within the Article 3 TDM exception, provided the statutory conditions are met. Importantly, it distinguished the legality of dataset construction from the later deployment or commercial use of trained AI models. The decision was later upheld on appeal, with the appellate court also underlining the practical importance of effective, machine-readable rights reservations.

However, even following the appeal, this reasoning should not be read as a general permission for commercial AI training. The court confirmed that LAION could rely on the TDM exceptions in the circumstances of that case, but the decision remained tied to dataset creation, scientific-research purposes, and the validity of rights reservations in a machine-readable form. It therefore leaves unresolved whether, and to what extent, the same reasoning can apply to training carried out for the development and deployment of commercial AI systems.

Memorisation and model structure

A more significant development emerges in GEMA v. OpenAI, where the focus shifts from dataset ingestion to the inner workings of the model itself. The court’s reasoning treated “memorisation” within model parameters as potentially constituting reproduction. This is particularly salient because the copyright analysis was not limited to whether protected works appeared in the final output. Rather, the court considered that infringement may arise where protected expression is retained within the model and can later be reproduced in recognisable form when prompted.

This represents a notable departure from traditional understandings of reproduction, which focused on more identifiable acts of copying or on the output itself. It suggests that copyright risk may arise within the model itself. It also raises the question of whether model parameters can embody protected expression for the purposes of the reproduction right.

At a broader level, this reasoning may narrow the practical scope for relying on TDM exceptions where the model demonstrably retains and can reproduce protected expression. However, the judgment should still be treated with some caution, as it is a national decision and does not yet represent settled EU-wide law.

Fragmentation and implications

The emerging case law demonstrates a lack of convergence on where reproduction occurs within AI systems. Three distinct approaches can be identified: Getty raises the issue of copying at the point of dataset ingestion and training; LAION addresses dataset construction through the lens of TDM exceptions; and GEMA shifts attention to memorisation within the model itself.

In the absence of clarity on the precise point at which reproduction occurs, AI developers face legal uncertainty across the AI lifecycle. Copyright risk may, at this stage, arise at any point in that lifecycle.

As a result, AI developers are increasingly expected to implement internal copyright governance frameworks which cover, inter alia, dataset sourcing, rights reservations, licensing and output monitoring. This is also consistent with the direction of travel under the EU AI Act, which requires providers of general-purpose AI models (“GPAI”) to maintain copyright compliance policies and publish sufficiently detailed summaries of the content used for training. For GPAI models with systemic risk, additional obligations arise, including risk assessment and mitigation measures.

For now, the position remains unsettled. What is clear, however, is that copyright risk in generative AI can no longer be treated as arising only at the point of output, as was once understood. Rather, it may arise at various stages of the model lifecycle, including within the model itself. Until a clearer path emerges, this uncertainty should be treated as a governance issue and certainly not as a gap to be exploited.

AI-related disputes? Intellectual property needs? Do not hesitate to contact us at info@gtg.com.mt for more information.

Article written by: Dr Terence Cassar, Dr J.J. Galea & Dr Mattea Pullicino

Disclaimer: This article is not intended to impart legal advice and readers are asked to seek verification of statements made before acting on them.
