Do Probabilistic Parameters in AI Models Amount to “Reproduction” or “Storage” Under Copyright Law?

Sneha Nagaraj FCIArb
Nov 18, 2025

When an Artificial Intelligence (AI) model is trained on copyrighted material and converts that material into mathematical form for processing, does this amount to “reproduction” or “storage” of the underlying work that infringes the author’s copyright? The inquiry is not technical but legal. Copyright law protects expression against unauthorised copying and fixation, and machine-learning systems operate by consuming large volumes of text and converting it into numerical patterns. It is therefore increasingly important to determine where the line lies, if there is one, between statistical learning and unauthorised copying. Indian copyright law contains tools for assessing whether the internal representations of a machine-learning model constitute infringement. The crux of the question is whether such representations, where they can recreate copyrighted works, fall within the statutory definition of reproduction under Section 14 of the Copyright Act, 1957.

The Munich Court’s Approach to Parameter-Level Memorisation

The Regional Court of Munich (LG München I, 42 O 14139/24) considered whether the lyrics of nine German songs had been memorised by OpenAI’s models. GEMA, Germany’s music collecting society, argued that the models could reproduce the lyrics when prompted, demonstrating that the copyrighted works had been absorbed into the models’ internal structure. The court accepted this position. The German Copyright Act defines reproduction broadly, covering direct or indirect, temporary or permanent fixation of a work by any means. The court held that the parameters amounted to fixation because they allowed the protected lyrics to be retrieved. It rejected OpenAI’s reliance on Germany’s Text and Data Mining (TDM) exception, noting that the exception protects temporary copies made for analytical use; once the model’s architecture internalises the work in a way that enables output, the copying ceases to be temporary or analytical. The court found that the lyrics were reproducibly present inside the model and that their presence was not merely a statistical influence but a form of embedded textual content. It characterised this as memorisation and treated the model’s parameters as containing the protected works themselves.[1]

The court drew a firm doctrinal line: training and output are separate acts. Training that results in retrievable copies is infringing in itself, and output triggered by user prompts is a further infringement. This interpretation directly affects AI companies operating in Germany, and potentially elsewhere in the EU, because it constrains training practices even when the resulting outputs are moderated.[1]

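Although the judgment is appealable, it aligns with the direction of European regulatory frameworks, including the DSM Directive (Directive (EU) 2019/790), whose Articles 3 and 4 permit reproductions for the purpose of text and data mining and, under Article 4, allow rightsholders to reserve their works from commercial mining. In this way the Directive distinguishes analytical use from exploitation that undermines rightsholders’ control.[1][9]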

Indian Copyright Act and the Definition of Reproduction

In India, the owner of a literary work has the exclusive right to reproduce the work “in any material form, including the storing of it in any medium by electronic means.”[2] The statute adopts a broad approach to fixation: storage need not be literal, and on this reading it includes any form that allows the work to be brought back into perceptible form. The Supreme Court in R.G. Anand v. Delux Films [3] emphasised that, to be actionable, “the copy must be a substantial and material one”. It further held that “violation of the copyright in such cases is confined to the form, manner and arrangement and expression of the idea by the author of the copyrighted work.”[3] In Eastern Book Company v. D.B. Modak [4], the Court confirmed that digital storage of a work can constitute reproduction if the original expression is capable of being recognised in substance.

Applying these principles, if an artificial intelligence model trained on copyrighted content in India encodes that content into parameters that allow it to be recreated, so that the output is recognisable in substance, retains the essential expression of the work, or reproduces the original expression in a discernible manner, this would likely meet the definition of reproduction. The question courts are likely to consider is whether the protected work can be recovered. The form of storage, whether numerical values, embeddings or probabilistic weights, should not change the legal analysis.
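To make the point about the form of storage concrete, here is a deliberately tiny word-level n-gram model in Python. It is an illustrative assumption, not a description of how large language models work: it keeps nothing but conditional next-word probabilities, yet greedy decoding from the opening two words recovers the training sentence verbatim. The function names `train_ngram` and `generate`, the two-word context and the sample sentence are all hypothetical choices made for this sketch.

```python
from collections import Counter, defaultdict


def train_ngram(words, order=2):
    """Learn next-word probabilities for every `order`-word context.

    Only probabilities are kept; the training text itself is discarded.
    """
    counts = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        counts[context][words[i + order]] += 1
    # Normalise each context's counts into a probability distribution:
    # these numbers are the toy model's only "parameters".
    return {
        ctx: {w: n / sum(c.values()) for w, n in c.items()}
        for ctx, c in counts.items()
    }


def generate(model, prompt, order=2, max_words=50):
    """Greedy decoding: repeatedly append the most probable next word."""
    out = list(prompt)
    for _ in range(max_words):
        dist = model.get(tuple(out[-order:]))
        if not dist:  # unseen context: nothing more can be recovered
            break
        out.append(max(dist, key=dist.get))
    return out


if __name__ == "__main__":
    text = ("a lantern hums beside the harbour wall while gulls circle "
            "slowly over nets drying in the evening light")
    words = text.split()
    model = train_ngram(words, order=2)
    recovered = " ".join(generate(model, words[:2], order=2))
    print(recovered == text)  # True: the sentence is recoverable from the weights
```

Real models spread what they learn across billions of shared parameters and usually do not reproduce training text this cleanly. The sketch shows only that holding text as probabilities rather than as characters does not, by itself, prevent recovery, which is why the retrievability question rather than the storage format is likely to be decisive.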

The Absence of a Text and Data Mining Exception in India

The European Union recognises explicit TDM exceptions under Directive (EU) 2019/790 on Copyright in the Digital Single Market (DSM Directive) [9]. India has no statutory exception corresponding to these provisions. Section 52 of the Copyright Act recognises fair dealing for private or personal use including research, for criticism and review, and for certain educational uses. A fair reading of that provision suggests that it does not extend to large-scale commercial training of artificial intelligence systems, and it does not permit outputs that recreate copyrighted content. As a result, an Indian court examining training on copyrighted material would likely treat both the internal copying and the output of protected expression as acts requiring authorisation.

Communication to the Public and the Location of Infringement

Section 51 addresses infringement not only through reproduction but also through communication to the public. Indian courts have taken a broad view of digital communication. In Super Cassettes v. MySpace [5], the Delhi High Court held that the availability of protected content within India triggers liability even when the server infrastructure is located elsewhere. The same reasoning would arguably apply to outputs generated by artificial intelligence systems: if a model trained on copyrighted works makes those works available to users in India, the place of training does not limit the scope of infringement.

Internal Copies and the Fixation Analysis

Indian law contains no explicit provision on intermediate technical copies created during training. However, courts have considered whether copies made within an institution amount to infringement in circumstances involving photocopying and digital storage. In The Chancellor, Masters and Scholars of the University of Oxford v. Rameshwari Photocopy Services [6], the Delhi High Court considered whether photocopying substantial parts of textbooks into course packs for a university constituted reproduction. It treated such copying, even for instructional purposes within an institution, as reproduction under copyright law, and then asked whether a statutory exception applied; on the facts, it found the copying covered by the exception in Section 52(1)(i) for reproduction in the course of instruction. The act of copying itself therefore meets the threshold of reproduction, and it infringes unless a statutory exception permits it [6].

These principles suggest how internal parameter storage in artificial intelligence models may be examined. If the storage is transient and cannot lead to retrieval, it may fall outside the scope of infringement. If, however, the protected work persists in the system and can be output, the storage is likely to be treated as a reproduction.

Regulatory and Policy Context in India

India has issued AI guidelines through the AI Advisory Committee. These guidelines rely on existing statutes, including the Copyright Act and the Information Technology Act, 2000 [7]. India does not have an AI-specific statute. There is no regulatory instrument addressing text and data mining. As a result, Indian developers and rights holders operate within a statutory framework that sets broad rules on reproduction but offers no guidance on permissible training practices.

This creates a practical gap. Large-scale machine learning requires ingesting significant volumes of data, and without a specific exception or safe harbour, developers face uncertainty about intermediate copying steps. Rights holders may rely on Sections 14 and 51 to challenge both training and output. The absence of a dedicated TDM framework therefore narrows the room for copying without authorisation at the training stage.

Practical Reflection

For legal teams advising developers, publishers or platforms, the central task is evidentiary. Courts will ask whether a specific copyrighted work, or a substantial part of it, can be retrieved from the model. The technical explanation that the system uses numbers, embeddings or probabilistic weights does not by itself resolve the issue. What matters is whether the system can, through ordinary inputs or controlled testing, reproduce recognisable parts of the protected work.

In advisory and transactional work, lawyers must examine how training data is gathered, whether licences have been obtained and how internal copies are stored. Documentation becomes critical. If a dispute arises, the model’s behaviour will be scrutinised through targeted prompts designed to reveal memorisation. This approach mirrors the method used in the Munich case, where simple prompts demonstrated that the model's parameters contained the copyrighted lyrics.

For rights holders, assessing whether a work has been reproduced requires comparing outputs with the original text. Indian courts have developed the substantiality test through cases such as R.G. Anand and Najma Heptulla v. Orient Longman [8]. Even partial reproduction may be actionable if the portion reproduced is important to the work.
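For rights holders and litigators, a first-pass comparison of a model output with the original text can be automated. The Python sketch below is an illustrative assumption, not a test recognised by Indian courts: it measures the longest run of consecutive words that an output shares with the original and the share of the original’s five-word sequences that reappear in the output. The function names, the five-word window and the sample texts are hypothetical. Because substantiality is qualitative as well as quantitative, a low score does not rule out infringement where the copied portion is important to the work, and a high score still has to be weighed by the court.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def longest_verbatim_run(original: str, output: str) -> list[str]:
    """Longest run of consecutive words that the output shares with the original."""
    a, b = words(original), words(output)
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def ngram_coverage(original: str, output: str, n: int = 5) -> float:
    """Share of the original's n-word sequences that reappear in the output."""
    def grams(tokens: list[str]) -> set[tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    source = grams(words(original))
    if not source:
        return 0.0
    return len(source & grams(words(output))) / len(source)


if __name__ == "__main__":
    original = ("the tide returns each night to the quiet harbour "
                "and the lanterns burn until dawn")
    output = "As the song goes, the tide returns each night to the quiet harbour, then fades"
    print(len(longest_verbatim_run(original, output)))  # 9
    print(round(ngram_coverage(original, output), 2))   # 0.45
```

In practice such scores would be gathered across many targeted prompts and documented, so that the comparison can be reproduced if the model’s behaviour is later scrutinised.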

For developers, the absence of a statutory TDM exception increases exposure. Internal memorisation cannot be dismissed as an engineering detail. If protected expression is embedded and recoverable as an output, the system may fall within the definition of reproduction. Legal teams working with artificial intelligence systems must therefore combine technical understanding with statutory interpretation and established case law.

Conclusion

The legal analysis turns on a single question: whether protected expression can be retrieved from within a model. When the answer is yes, the internal parameters that hold that expression are likely to be treated as reproductions under Indian copyright law. The statutory framework already provides the tools for this assessment, and courts will continue to apply these principles as artificial intelligence models evolve.

[1] GEMA v. OpenAI, No. 42 O 14139/24 (LG München I [Ger.] Nov. 11, 2025), https://www.justiz.bayern.de/gerichte-und-behoerden/landgericht/muenchen-1/presse/2025/11.php

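[2] Copyright Act, 1957, § 14(a)(i) (India).

[3] R.G. Anand v. Delux Films, (1978) 4 SCC 118 (India).

[4] Eastern Book Co. v. D.B. Modak, (2008) 1 SCC 1 (India).

[5] MySpace Inc. v. Super Cassettes Indus. Ltd., (2017) 236 DLT 478 (DB) (India).

[6] The Chancellor, Masters & Scholars of the Univ. of Oxford v. Rameshwari Photocopy Servs., (2016) 234 DLT 279 (India).

[7] India AI Governance Guidelines: Enabling Safe and Trusted AI Innovation, Press Information Bureau, Government of India (Nov. 5, 2025), https://static.pib.gov.in/WriteReadData/specificdocs/documents/2025/nov/doc2025115685601.pdf

[8] Najma Heptulla v. Orient Longman Ltd., AIR 1989 Delhi 63 (India).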

[9] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC, O.J. L 130/92 (17 May 2019).
