Otto, Ri-nova 2021, 21-32

3D Protein Structure Prediction Using AI System AlphaFold

and its implications for science and law, especially for the IP rights system

Claudia Otto

A. Introduction

The possibilities of technology increase with its rapid advancement and continue to inspire us, especially in the field of artificial intelligence. For example, the half-century-old protein folding problem is said to have been solved by an AI system called AlphaFold.[1] AlphaFold is used to calculate and thus predict the structure of proteins that have not yet been fully researched. It was developed by DeepMind Technologies Limited (hereafter DeepMind), a privately held company based in the United Kingdom and a subsidiary of Alphabet Inc. AlphaFold’s source code and model parameters have been published on GitHub[2]. The protein structure predictions made so far using AlphaFold have also been published in the AlphaFold Protein Structure Database (hereafter AlphaFold DB) for anyone to access. The EMBL-EBI[3], the European Bioinformatics Institute (EBI) as part of the European Molecular Biology Laboratory (EMBL), which is located in Heidelberg, Germany, has contributed to this project.[4]

After an introduction to protein structure and protein structure prediction, this paper addresses the question of the extent to which an AI system such as AlphaFold and related databases are protectable under German and European law, respectively, and what rights exist to protein structure predictions generated by means of AI systems such as AlphaFold. Finally, using AlphaFold as an example, it answers the question of whether the German and European IP rights system needs to be extended by AI-specific regulations.


B. Introduction to protein structure and protein structure prediction

I. The protein structure

1. Function follows form: importance of protein structure

Proteins are molecules that go by many names and perform important tasks in the body of living beings. Such tasks include protecting the body (through antibodies), enabling chemical reactions in the body (through enzymes), transporting substances such as oxygen from the lungs to the rest of the body (through hemoglobin), or regulating blood sugar levels (through hormones, in this case insulin). The human proteome[5] alone comprises tens of thousands of proteins.[6] Proteins come in different sizes and shapes. They are formed from 20 different, linked amino acids[7], the so-called canonical amino acids. The number of amino acids, their respective type and their sequence (the so-called amino acid sequence) essentially determine the specific properties of the protein molecule.[8] They also determine the specific three-dimensional shape it adopts by folding, the so-called conformation. The finally adopted three-dimensional structure determines the specific function of the protein.[9]

Theoretically, many alternative folds of a protein are possible. Their number is especially large for proteins with very long amino acid chains, because the number of possible combinations grows exponentially to astronomical heights as the number of amino acids increases.[10] However, only one, the lowest-energy one, is considered to be the correct form that performs the specific task of the protein.[11] It is called the native conformation.[12]
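The exponential growth described above can be made tangible with a small calculation. The following Python sketch is purely illustrative: the number of conformations per amino acid and the sampling rate are assumptions chosen for the example, not values taken from this paper or its sources.

```python
# Illustrative only: both constants are assumptions, not values from the paper.
STATES_PER_RESIDUE = 3          # assumed number of conformations per amino acid
SAMPLES_PER_SECOND = 10 ** 13   # assumed rate at which folds could be tried
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def conformation_count(chain_length: int) -> int:
    """Number of candidate folds if every residue chooses its state independently."""
    return STATES_PER_RESIDUE ** chain_length


def years_to_enumerate(chain_length: int) -> float:
    """Years needed to try every candidate fold once at the assumed sampling rate."""
    return conformation_count(chain_length) / (SAMPLES_PER_SECOND * SECONDS_PER_YEAR)


if __name__ == "__main__":
    for length in (10, 50, 100, 150):
        print(f"{length:>4} residues: {conformation_count(length):.3e} folds, "
              f"{years_to_enumerate(length):.3e} years to try them all")
```

Under these assumptions, every additional amino acid triples the number of candidate folds, which is why an exhaustive search quickly becomes impractical for long chains.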

When a protein is formed in the course of protein biosynthesis, its string-of-pearls chain is not folded by trial and error, trying out numerous options like a person doing research. That would take a period of time that exceeds the lifetime of the protein itself, and that of a human being, many times over, and would make life impossible.[13] Rather, natural processes ensure that a protein assumes its native form within fractions of a second to minutes, while protein biosynthesis is still underway.[14] For example, certain amino acids attract each other electromagnetically and form pairs, i.e. contact points. However, the involvement of other proteins such as enzymes may also be required for the final three-dimensional structure. Disorders in the formation of a functional fold are called protein folding diseases. These include, for example, certain forms of cancer, Alzheimer’s disease and type 2 diabetes mellitus.[15]

The shape-determining natural processes have not yet been fully elucidated. We know the components of proteins and can determine the amino acid sequence, but we do not yet fully understand the natural rules by which proteins (self-)fold. This is the so-called protein folding problem, which has so far remained unsolved. The spatial structure of a protein molecule has therefore always had to be determined in elaborate experimental procedures, e.g. crystal structure analysis[16]. As long as the spatial structure of a protein eludes our knowledge, so does its specific function or even malfunction. If we understand the natural rules of protein folding and thus also deviations from them, we can, for example, better understand diseases and find treatment options.


2. The representation of the protein structure

Biochemistry and bioinformatics represent the protein structure according to certain specifications.[17] For example, it is divided into different structural levels. The hierarchical representation in primary structure (amino acid sequence), secondary structure (the relative arrangement of amino acids to each other), tertiary structure (the spatial arrangement of secondary structure elements) and quaternary structure (the arrangement of individual proteins in larger complexes) goes back to Kaj Ulrik Linderstrøm-Lang. These protein conformations, graphically represented according to specifications, are deposited in the Protein Data Bank (PDB), among others. Its goal was and is to maintain a single archive of macromolecular structural data that is freely and publicly available to the world community.[18]


3. The protein structure prediction methods

Protein structure prediction is one of the main goals of bioinformatics and theoretical chemistry. It includes many different methods, e.g. evolutionary, physical and geometric approaches:[19]

These methods include prediction from evolutionary information. Proteins can be related to other proteins, have similar amino acid sequences and thus similar three-dimensional structures. In the course of evolution, changes can also occur here: If a change occurs in an amino acid, for example, due to mutation, the protein can become unstable. Stability can then be restored by another, compensatory mutation. This is called coevolution. The understanding of coevolution allows a better understanding of the (reasons for the formation of a) three-dimensional protein structure.
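How such compensatory changes can show up in sequence data may be sketched in a few lines of Python. The example assumes a toy alignment of related sequences and uses mutual information between two alignment columns as the measure of co-variation; both the toy data and the choice of measure are assumptions made for illustration and are not taken from the sources cited here.

```python
import math
from collections import Counter

# Hypothetical toy alignment: each string is one related protein sequence.
# Columns 1 and 3 swap K/E together, columns 0 and 2 vary independently of them.
TOY_ALIGNMENT = ["AKLE", "AKVE", "AELK", "AEVK"]


def column(alignment, index):
    """Return the amino acids found at one position across all sequences."""
    return [sequence[index] for sequence in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    total = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    result = 0.0
    for (a, b), joint in count_ij.items():
        p_ab = joint / total
        p_a, p_b = count_i[a] / total, count_j[b] / total
        result += p_ab * math.log2(p_ab / (p_a * p_b))
    return result


if __name__ == "__main__":
    print("columns 1 and 3:", mutual_information(TOY_ALIGNMENT, 1, 3))
    print("columns 1 and 2:", mutual_information(TOY_ALIGNMENT, 1, 2))
```

A high value for a pair of columns indicates that the two positions change together across related proteins, which is the kind of signal that can hint at spatial proximity in the folded structure.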

Amino acid side chains play an important role in protein folding. They essentially determine how an amino acid “behaves“. For example, it can be neutral, charged, acidic, basic, hydrophobic (water-repellent) or hydrophilic (water-attracting). These properties determine the interactions among pairs of amino acids. The resulting geometry is considered in side-chain prediction.

So-called ab-initio predictions are based on physical knowledge. This knowledge allows the secondary and tertiary structure to be inferred from the primary structure. As mentioned, certain amino acids, for example, attract each other and form pairs, i.e. contact points. On this basis, the folding process can be simulated. Other methods calculate possible protein structures to determine the most energetically favorable one. But these methods require great computing power, as the above-mentioned numerous possible combinations suggest.

Known structures determined by physical measurement can serve as a starting point for determining the structure of related proteins in comparative prediction methods. A so-called contact map, from which the amino acid contact pairs are derived, can be helpful here. It allows conclusions to be drawn about the tertiary structure of a protein.
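What a contact map is can be illustrated with a minimal Python sketch. It assumes that one coordinate per amino acid is known, that two amino acids count as "in contact" when they are at most 8 angstroms apart, and that neighbors along the chain are ignored; the coordinates, the threshold and the minimum separation are hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy coordinates (in angstroms), one point per amino acid.
TOY_COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def contact_map(coords, threshold=8.0):
    """Boolean matrix: entry [i][j] is True if residues i and j are within threshold."""
    size = len(coords)
    return [
        [math.dist(coords[i], coords[j]) <= threshold for j in range(size)]
        for i in range(size)
    ]


def contact_pairs(cmap, min_separation=3):
    """List contact pairs (i, j), skipping residues that are close along the chain."""
    size = len(cmap)
    return [
        (i, j)
        for i in range(size)
        for j in range(i + min_separation, size)
        if cmap[i][j]
    ]


if __name__ == "__main__":
    print(contact_pairs(contact_map(TOY_COORDS)))
```

In comparative prediction the direction is reversed: contact pairs are estimated first, and the spatial arrangement is then inferred from them.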

Bringing together several methods, and thus evolutionary, physical and geometric knowledge about protein structures, seems to make sense in order to be able to solve the protein folding problem. This might have happened at least partially with AlphaFold. It is only partial because, for example, multidomain proteins or protein complexes are not predicted.[20] Also, the involvement of other proteins in structure formation is not taken into account.


II. Has AlphaFold solved the protein folding problem?

AlphaFold is an AI system (further)[21] developed by DeepMind, which consists of artificial neural network structures at its core. It uses training methods based on existing knowledge about evolutionary, physical and geometric constraints of protein structures.[22] AlphaFold builds – obviously – on the work of numerous researchers from around the world. Specifically, AlphaFold draws on existing protein structure data such as that in the PDB[23].[24] Based on this data, AlphaFold can predict the shapes of many proteins.[25] The accuracy is comparable to that of expensive and time-consuming laboratory experiments.[26]

In the CASP14 science competition[27], protein structure predictions made using AlphaFold scored 90 out of 100 points for about two-thirds of the given 100 amino acid sequences.[28] Above the threshold of 90 points, remaining differences to results of experimental structure determinations and alternative conformations of lower energy are considered small.[29]

AlphaFold has essentially identified correlations in the numerous, existing protein structure data that are not apparent to humans due to sheer mass:[30] As stated above, the folding possibilities of a protein are numerous and only one fold is considered the correct one. Identifying the ideal fold was always promising when additional information was found to narrow down the search area.[31] AlphaFold, by combining knowledge of different methods, automatically captured and considered such information. However, it has remained unknown which information this is.[32] AlphaFold has remained unable to predict misfolding or protein structural changes in the dynamically changing proteome.[33]

As a result, AlphaFold has not solved the protein folding problem, but it has accelerated the path to a solution.


C. Legal significance

The legal issues surrounding the development and use of AI systems are manifold. This paper is limited to those concerning the legal protection of AlphaFold and the AlphaFold DB themselves, as well as the legal protection of the deposited protein structure predictions generated with AlphaFold.

It is important to distinguish: There is the AlphaFold research software, whose source code is published[34], and there is the (published) AlphaFold model, which results from the totality of the training parameters. In addition, there is (training) data obtained by AlphaFold from third-party databases, the EMBL-EBI protein structure database AlphaFold DB, which is accessible to everyone and in which protein structure predictions generated by AlphaFold can be retrieved, and, last but not least, the protein structure predictions themselves. The high complexity of AlphaFold can only be covered in outline here. Also, specific legal issues resulting from the fact that the subject matter crosses European borders cannot be covered. Therefore, the following is largely abstracted and focuses on German and, as far as directly relevant, European law.


I. IP rights relating to AlphaFold

DeepMind has filed several patent applications related to AlphaFold.[35] Decisions are not yet available. In addition, copyrights and related rights could play a role. Especially the latter may also be rights of the research institutions involved in the development and provision of AlphaFold, AlphaFold DB and AlphaFold protein structure predictions.


1. Patentability of AlphaFold

According to Sec. 1 (1) German Patent Act (PatG) and Art. 52 (1) European Patent Convention (EPC), patent protection requires an invention in a field of technology, novelty, inventive step and industrial applicability.

According to Sec. 1 (3) PatG and Art. 52 (2) EPC, respectively, the following – in particular – are not considered inventions:

  • discoveries, scientific theories and mathematical methods;
  • aesthetic creations of form;
  • plans, rules and procedures for mental activities, for games or for business activities, and programs for data processing equipment;
  • the reproduction of information.

With AlphaFold, DeepMind has, at least at first glance, found mathematical methods and computational procedures for efficiently analyzing existing protein structure knowledge, discovering correlations in it, applying the correlation information to new protein structures and reproducing them according to established scientific standards. AlphaFold also appears at first glance to be a program for data processing equipment. However, Art. 52 (2) EPC precludes the patentability of the subject matter or activities mentioned therein only to the extent that the patent application relates to such subject matter or activities as such, cf. Art. 52 (3) EPC. Thus, a more detailed examination of the respective application is required.

DeepMind’s most recent patent application is “Protein Structure Prediction From Amino Acid Sequences Using Self-Attention Neural Networks”[36]. The abstract highlights the following features:

“Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a predicted structure of a protein that is specified by an amino acid sequence. In one aspect, a method comprises: obtaining a multiple sequence alignment for the protein; determining, from the multiple sequence alignment[37] and for each pair of amino acids in the amino acid sequence of the protein, a respective initial embedding of the pair of amino acids; processing the initial embeddings of the pairs of amino acids using a pair embedding neural network comprising a plurality of self-attention neural network layers to generate a final embedding of each pair of amino acids; and determining the predicted structure of the protein based on the final embedding of each pair of amino acids.”[38]
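For readers without a machine learning background, the sequence of steps named in the abstract can be sketched in Python. This is a strongly simplified illustration and not DeepMind's implementation: the toy pair features, the single attention head without learned weights, the residual addition and all function names are assumptions, and the last step (deriving a structure from the final embeddings) is omitted entirely.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to one."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention without learned weights."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    output = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        output.append(
            [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
        )
    return output


def initial_pair_embeddings(alignment):
    """Hypothetical toy features for each pair (i, j), derived from the alignment."""
    depth, length = len(alignment), len(alignment[0])
    reference = alignment[0]

    def conserved(col):
        return sum(seq[col] == reference[col] for seq in alignment) / depth

    pairs = {}
    for i in range(length):
        for j in range(length):
            both = sum(
                seq[i] == reference[i] and seq[j] == reference[j] for seq in alignment
            ) / depth
            pairs[(i, j)] = [conserved(i), conserved(j), both]
    return pairs


def final_pair_embeddings(alignment, num_layers=2):
    """Apply several self-attention layers (with residual addition) to all pairs."""
    pairs = initial_pair_embeddings(alignment)
    keys = sorted(pairs)
    vectors = [pairs[key] for key in keys]
    for _ in range(num_layers):
        attended = self_attention(vectors)
        vectors = [[a + b for a, b in zip(old, new)] for old, new in zip(vectors, attended)]
    return dict(zip(keys, vectors))


if __name__ == "__main__":
    embeddings = final_pair_embeddings(["ACDE", "ACDF", "GCDE"])
    print(len(embeddings), "pair embeddings of size", len(embeddings[(0, 0)]))
```

The point of the sketch is structural: each pair of amino acids receives a numerical description, and each self-attention layer lets every pair update its description in light of all other pairs.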


a) Is AlphaFold an invention in a field of technology?

AlphaFold would have to have a “technical character“ or “teaching for technical action“ as its subject matter, i.e. an instruction directed to a skilled person to solve a specific technical task with specific technical means.[39]

The assumption of a technical character seems to be initially contradicted by the fact that the AlphaFold source code and the AlphaFold model parameters can be downloaded from GitHub to and used on any computer with sufficient memory.[40] This counts in favor of a computer program, which would not constitute a patentable invention under Sec. 1 (3) no. 3 PatG or Art. 52 (2) lit. c EPC. The storage of the computer program on a storage medium alone does not lead to its patentability. Moreover, artificial neural networks are considered mathematical-abstract computational models without technical character and thus are not regarded as an invention in a field of technology within the meaning of European or German patent law.[41] This applies regardless of whether they are “capable of learning“ or can be “trained“ using training data.[42] In this respect, even the AlphaFold parameters being made available for download would not change the provisional classification as non-patentable software.

However, patent application number WO2021110730A1 indicates that physical computer storage media are included that store instructions that, when executed by the one or more computers, cause the computer or computers to perform the respective operations of the method described.[43] DeepMind thus claims a computer-implemented invention. The claim formulation is permissible.[44]

In case of a computer-implemented invention, the examination usually starts with the process claim. If the subject matter of the process claim is found to be novel and inventive, the subject matter of the remaining claims from a set of claims structured according to the above formulations is usually also to be regarded novel and inventive, provided that these claims comprise the features corresponding to all those which ensure the patentability of the process.[45]


b) Is AlphaFold new?

An invention is new if it does not form part of the state of the art, Sec. 3 (1), sentence 1 PatG, Art. 54 (1) EPC. According to Sec. 3 (1), sentence 2 PatG, Art. 54 (2) EPC, the state of the art comprises all knowledge which has been made available to the public by written or oral description, by use or in any other way before the date relevant for the priority (i.e. the date of filing) of the application.

The above-mentioned patent application was filed on December 2, 2020,[46] two days after the results of CASP14 were announced on November 30, 2020.[47] The novelty is already supported by the fact that other software solutions did not achieve comparable results to AlphaFold in CASP14.

AlphaFold was not, at the time of filing, the first and only software developed and used to solve the protein folding problem.[48] However, it combines, apparently for the first time, different methods of protein structure prediction using artificial neural networks. The application of so-called self-attention algorithms[49] also appears new at first glance.

Hence, there could be a decisive difference to the state of the art and novelty could be assumed.


c) Is AlphaFold based on an inventive step?

An invention is deemed to involve an inventive step if it is not obvious to a skilled person (Sec. 4, sentence 1, PatG, Art. 56, sentence 1, EPC).

AlphaFold’s protein structure predictions are based on existing knowledge, which may have been combined in a novel way. The undoubtedly outstanding potential of AlphaFold lies in the effective combination of different protein structure prediction methods and the speed in finding (yet unknown) correlations in globally known protein structures. For experts, however, the protein structures proposed by AlphaFold are obvious precisely because they are based on the accumulated knowledge of the expert community. The experts simply lacked, and still lack, the time to arrive at the same result, or at the only correct one, by conventional experimental means.

With respect to an earlier patent application, DeepMind received a written assessment that there was no inventive step.[50] This also appears to be the case with regard to the current patent application.


d) Interim conclusion

The AlphaFold AI system does not appear to be patentable. Although novelty and industrial applicability can be assumed quite reliably, the most important prerequisite, the inventive step, is probably lacking. According to a preliminary view, AlphaFold merely establishes correlations in existing information on protein structures and reproduces them graphically for as yet unknown protein structures in accordance with established scientific standards. Instead of an invention, DeepMind has probably made a discovery or made such discovery possible. If a new property of a known material is discovered, this cannot be patented. As such, it has no technical effect and is therefore not an invention within the meaning of Sec. 1 (1) PatG or Art. 52 (1) EPC.


2. Anglo-American copyright vs. European copyright

DeepMind claims that it owns the copyright to the AlphaFold documentation published on GitHub as well as the data in the AlphaFold DB.[51] It should be noted that Anglo-American copyright law differs from continental European copyright law including German copyright law. The term “copy-right“ indicates that both refer to the right to copy, i.e., the right of use of the work in question, which is granted by the holder of the rights. In this respect, there is a similarity. However, Anglo-American copyright aims at protecting the interests in the economic exploitation, whereas European and German copyright aims at protecting the interests of the author. While a legal person can be the owner of an Anglo-American copyright, according to Sec. 7 of the German Copyright Act (UrhG) only a natural person who is capable of intellectual creation within the meaning of Sec. 2 (2) UrhG can be the author and copyright owner. However, the author may grant a right of use to a legal person.

A copyright notice in connection with named natural persons as here on GitHub[52] facilitates the proof of authorship pursuant to Sec. 10 (1) UrhG. According to this, whoever is designated as the author in the usual manner on the reproductions of a published work or on the original of a work of fine arts is deemed to be the author of the work until proven otherwise; this also applies to a designation known as the author‘s alias or artist‘s mark.


3. German copyright[53]

Copyright is inseparably linked to the author capable of intellectual creation. It cannot be transferred between living persons, see Sec. 29 (1) UrhG.

An AI system like AlphaFold is primarily software. Software includes all “soft“ components of a computer such as computer programs and data(bases).


a) (German) copyright in computer programs

Computer programs may be protected by copyright as linguistic works (cf. Sec. 2 (1) no. 1 UrhG) if they constitute individual works in the sense that they are the result of their author’s own intellectual creation (Sec. 69a (3) sentence 1 UrhG). Because the AlphaFold software outperformed other protein structure prediction software in CASP14, the necessary level of intellectual creation is assumed here. If several people have jointly created a work without their shares being separately exploitable, they are co-authors of the work (Sec. 8 UrhG). In this respect, the natural persons named on GitHub[54] can be regarded as co-authors. For protection to begin, a computer program only has to be created; it does not have to be completed. A registration or application, for example with the German Patent and Trademark Office, is not required.

According to Sec. 69a (1) UrhG, computer programs are programs in any form, including design material. A program is a set of individual instructions compiled for the purpose of performing a specific task.[55] To be more precise, a computer program is an algorithm written in programming language. An algorithm is a unique processing instruction that can be executed by a mechanically or electronically operating device (or even by a human being).[56] The linguistic representation of the algorithm must therefore be precise, i.e. the sequence of the individual processing steps must be clear from the algorithm.[57] If multiple options are available to choose from, the decision set according to which one of the options is to be chosen must additionally be specified exactly.[58] Furthermore, the algorithm must comprise a finite number of processing steps and thus be able to come to an end.[59]
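The properties named above (a precise sequence of steps, an exact decision rule, a finite number of steps) can be illustrated with a classic textbook algorithm written in a programming language. The choice of Euclid's algorithm and of Python is made here purely for illustration and has no connection to the AlphaFold source code.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm for two non-negative integers."""
    if a < 0 or b < 0:
        raise ValueError("both inputs must be non-negative")
    # Decision rule: continue as long as the second number is not zero.
    while b != 0:
        # Exactly one processing step is prescribed for each pass.
        # b strictly decreases, so the loop ends after finitely many passes.
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(greatest_common_divisor(48, 18))
```

The idea of repeatedly replacing the larger number by a remainder is the unprotected mathematical concept; only a concrete written form such as the lines above can be the linguistic expression to which copyright protection attaches.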

The source code in particular is covered by German copyright protection. However, the ideas and mathematical concepts on which it is based are not protected by German copyright (cf. Sec. 69a (2) sentence 2 UrhG). According to Art. 2 of the Copyright Treaty of the World Intellectual Property Organization (WIPO), copyright protection extends only to forms of expression and not to thoughts, processes, methods or mathematical concepts as such. This does not mean, however, that the contribution to AlphaFold in Nature[60] explaining thoughts, processes, methods and mathematical concepts is unprotected. Rather, this text represents a separate result of intellectual creation and is protected as a linguistic work within the meaning of Sec. 2 (1) no. 1 UrhG.

The author of source code as well as co-authors (Sec. 8 (1) UrhG) such as those of AlphaFold are entitled to an exclusive right of economic exploitation pursuant to Sec. 69c UrhG, for example through use or granting of rights of use thereto, upon commencement of copyright protection pursuant to Sec. 69a UrhG. If a computer program is created by an employee performing his employee duties or according to the instructions of his employer, the employer is exclusively entitled to exercise all proprietary rights in the computer program, unless otherwise agreed, Sec. 69b UrhG. However, the author has the right to recognition of his authorship, Sec. 13 UrhG. He can determine whether the work is to be provided with an author designation and which designation is to be used.

If another party performs acts of exploitation without the author‘s permission, the author has, for example, a right to removal and injunctive relief against the infringer (Sec. 97 (1) UrhG). If the infringer acts culpably, the author has a claim for damages (Sec. 97 (2) UrhG).

Research software, i.e. software specifically for use in research, can be made available under various free content licenses. An overview can be found on the website of the Institute for Free and Open Source Software Legal Issues.[61] The AlphaFold source code is also provided under the Apache Licence (2.0)[62] [63]. This is a license without the so-called copyleft effect. This means that there are no restrictions on modifications, further development and recombination. The licensee can redistribute modified versions of the software under any license conditions and can also convert them into proprietary software.[64]


b) (German) copyright regarding (training) data

DeepMind provides the AlphaFold parameters under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.[65] These parameters comprise all those values and thus information that enable AlphaFold to make the “high-precision“ protein structure predictions after training with the aforementioned research data. The totality of the “learned“ parameters is called a model. If DeepMind did not provide the parameters, users of the AlphaFold research software would have to train it extensively again.

As indicated, AlphaFold as a data-based research software needs training data. These are explicitly obtained from the following research databases:







The PDB research data, for example, is explicitly in the public domain. The collection is distributed under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.[67] Furthermore, the general freedom[68] of data as such, and thus content, facts and information, is a necessary prerequisite for free science and research.[69]

According to Sec. 2 (2) UrhG, copyright protection therefore extends only to intellectual-creative forms of expression, i.e. not to their information content, such as scientific findings, training or measurement values. Copyright protection in research materials can therefore exist in particular in the concrete linguistic (Sec. 2 (1) no. 1 UrhG), photographic (Sec. 2 (1) no. 5 UrhG) and scientific representation, for example in drawings, sketches, tables and plastic representations (cf. Sec. 2 (1) No. 7 UrhG). The use of technical aids, such as an AI system, does not in principle prevent copyright protection from arising.[70] However, if these representations are created exclusively by machine, there is no personal intellectual creation and thus no protectability.

Where no third-party copyrights can exist to (training) data, there are no copyright restrictions on its use. However, this does not affect the legitimate interest in naming the discovering person(s). This is safeguarded, for example, by the principles of “good scientific practice“, doctoral regulations or even unwritten social norms.[71] The discovering persons are merely not entitled to create an information monopoly[72] that curtails the basis of free science and research.


c) (German) copyright in a database work

A database can be protected by German copyright law. This is the case with so-called database works (Sec. 4 (2) UrhG). A database work within the meaning of the UrhG is a collective work whose independent elements are arranged systematically or methodically and are individually accessible by electronic means or otherwise. The selection or arrangement of the data or other elements requires a personal intellectual creation according to Sec. 2 (2) UrhG, cf. Sec. 4 (1) in conjunction with Sec. 4 (2) UrhG. Here, too, the object of protection is not the content, but only the structure in which the personal intellectual creation is expressed.

With regard to the collection of the AlphaFold model parameters, the parameters as elements do not appear independent in a way that they can be separated from each other and accessed individually. There is no indication that the selection or arrangement of the model parameters is based on personal intellectual creation. AlphaFold’s model parameters therefore do not seem to be protected under (German) copyright law. The script[73] – named scripts/ – involved in the parameters’ downloading process, however, may be protected as a linguistic work within the meaning of Sec. 2 (1) no. 1 UrhG (see above).

In contrast, the AlphaFold DB is to be regarded as a database work within the meaning of Sec. 4 (2) UrhG. In it, all individual and thus independent protein structure predictions calculated so far by means of AlphaFold are arranged systematically and methodically with their respective associated information. The protein structure predictions in the AlphaFold DB are individually accessible by electronic means (e.g. via the search function[74]). In this case, however, it is not DeepMind that provides the AlphaFold DB, but the EMBL-EBI.[75] DeepMind provides the protein structure predictions generated with AlphaFold and thus content. AlphaFold was not used to create the database.[76] In this case, the editors at EMBL-EBI would be the authors and thus the original holders of the exploitation rights within the meaning of Sec. 15 et seq. UrhG. However, the activities of the intergovernmental EMBL, which is based in Heidelberg, Germany, are of a non-private, official nature,[77] so that AlphaFold DB, as an official (database) work published in the official interest for general information, might not be subject to copyright protection according to Sec. 5 (2) in conjunction with Sec. 5 (1) UrhG.[78]

Correspondingly, the following reference can be found on the AlphaFold DB protein structure pages (translated):

“EMBL-EBI expects its online services, databases and software to be referred to (e.g. in publications, services or products) in accordance with good scientific practice.”[79]


4. Ancillary (German) copyright to databases

With regard to databases without creative character, from which information for the use of AlphaFold is obtained, a related IP right, the so-called ancillary copyright of the database producer, may also be considered.

According to Sec. 87a (1) sentence 1 UrhG, such a database is a collection of works, data or other independent elements that are systematically or methodically arranged and individually accessible by electronic means or otherwise, and the acquisition, verification or presentation of which requires an investment that is substantial in nature or extent. Substantial means that the investment must not be insignificant. According to Sec. 87a (2) UrhG, the database producer is the party who has made this investment. Pursuant to Sec. 87b (1) UrhG, the database producer alone has the exclusive right to reproduce, distribute and publicly display the database as a whole or a substantial part of the database in terms of its nature or scope. This in turn means that the database producer is protected against the reproduction, distribution and public communication of the database as a whole or of a substantial part by third parties. EMBL-EBI could also benefit from this protection because Sec. 5 (2) UrhG applies only to works, not to databases as defined in Sec. 87a (1) UrhG. Nothing to the contrary results from the European directives underlying German law.[80]

The database producer‘s right does not protect the individual data or the investment in their production. The extraction and use of individual data, insofar as they do not constitute a substantial part in terms of their nature or scope, is therefore permissible without the granting of rights. In addition, Sec. 87c (1) nos. 2 and 5 UrhG permit the reproduction of a substantial part of the database in terms of type and scope for purposes of scientific research pursuant to Sec. 60c UrhG and for so-called text and data mining for purposes of scientific research pursuant to Sec. 60d UrhG.


5. (German) know-how protection

An AI model, i.e. the totality of the parameters “learned” during training, is not patentable due to its mathematical-abstract properties. Because these are merely training values that do not constitute a personal intellectual creation, the parameters are not in themselves protected by copyright. If the parameters are part of a database that is protected by copyright as described above, the result may be different (but this is not apparent here). However, the parameters may be protected as know-how under trade secret law.

According to Sec. 2 no. 1 of the German Act on the Protection of Trade Secrets (GeschGehG)[81], a trade secret is information that (a) is of economic value because it is secret, i.e. it is not generally known or readily accessible, either in its entirety or in the precise arrangement and composition of its components, to persons within the circles that normally handle this type of information; (b) is the subject of secrecy measures by its rightful owner that are reasonable under the circumstances; and (c) for which there is a justified interest in keeping it secret.

The totality of the initially unpublished AlphaFold parameters (i.e., the AlphaFold AI model) could be considered information of economic value because it was not generally known or readily accessible, either as a whole or in the precise arrangement and composition of its components, to persons within the circles that typically handle this type of information. After publication on GitHub, of course, it was no longer the subject of secrecy measures. In principle, however, trade secret law offers a possibility of protection for AI models.

In the case of AlphaFold, given DeepMind’s cooperation with the (state-owned) EMBL-EBI, contractual reasons could speak against keeping the parameters secret for the purpose of commercialization. In addition, a balancing of interests could be in favor of third parties who claim a legitimate interest in obtaining and using the AlphaFold parameters, see Sec. 5 GeschGehG. However, the question also arises as to whether keeping the AlphaFold parameters secret would make sense per se. Secrecy would make it difficult to verify the published AlphaFold protein structure predictions by repetition and thus to validate them. Transfer of the predictions to other protein structures would remain uncertain. Time-consuming and costly experimental procedures would remain fully necessary. The purpose of AlphaFold, to accelerate protein structure prediction in a cost-saving manner, would thus be thwarted.


II. Rights regarding AlphaFold protein structure predictions

1. What does an AlphaFold protein structure prediction include?

An AlphaFold protein structure prediction in the AlphaFold DB first includes basic information about the represented protein, such as the amino acid sequence. In addition, a structure overview contains three AlphaFold outputs:

  1. the three-dimensional representation of the protein, including the side chains (shown when clicking on the amino acid sequence);
  2. the confidence of the protein structure prediction, conveyed by color coding and ranging from pLDDT values of more than 90 to values of less than 50; and
  3. the “predicted aligned error” in a two-dimensional representation, which provides additional (error) information for the interpretation of the protein structure prediction.[82]
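
As a rough illustration of the second output, the color coding can be read as a mapping from per-residue pLDDT values (0 to 100) to confidence bands. The following Python sketch is not DeepMind's implementation. The intermediate band limits of 70 and 50 follow the color legend shown on AlphaFold DB structure pages rather than the text above; the function names, the sample scores and the treatment of values lying exactly on a band limit are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable

# Band names as used in the AlphaFold DB color legend.
BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to a confidence band.

    Assumption: a score exactly on a band limit (90, 70, 50) is
    assigned to the lower band; the legend leaves this open.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie between 0 and 100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_bands(scores: Iterable[float]) -> Dict[str, int]:
    """Count how many residues fall into each confidence band."""
    counts = Counter(plddt_band(s) for s in scores)
    # Report every band, including those with no residues.
    return {band: counts.get(band, 0) for band in BANDS}


if __name__ == "__main__":
    # Hypothetical per-residue scores, not taken from any real prediction.
    sample = [95.2, 82.0, 61.5, 34.0, 91.0]
    print(summarize_bands(sample))
```

The sketch only classifies numbers; it says nothing about whether a given prediction is correct, which still requires validation by experts.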


2. No rights to basic information

It has already been pointed out that data as such, and thus content, facts and information as a necessary prerequisite for science and research, are free.[83] As a basis for creation, they are in the public domain.[84] This means that basic information accompanying a protein structure prediction, such as the amino acid sequence, is free of copyright.


3. No copyright regarding the graphic representation of the protein structure prediction

In principle, a copyright may exist in the representation of a protein structure (prediction): A representation of a scientific or technical nature within the meaning of Sec. 2 (1) no. 7 UrhG must then, with the expressive means of graphic or plastic representation, serve to convey instructive or informative information about the object represented.[85] The purpose of conveying information distinguishes representations of a scientific or technical nature from works of fine art, which are primarily intended to appeal to the aesthetic sensibility and, as works of applied art, also serve a utilitarian purpose.[86] The means of expression of graphic or plastic representation distinguishes them in turn from linguistic works, whose means of expression is language.[87] Even the representation of the simplest scientific findings can be protected.[88] It is sufficient that the representation expresses an individual intellectual activity that stands out from everyday work in the field concerned, even if the degree of intellectual achievement and individual character is low.[89]

A graphically expressed three-dimensional protein structure prediction generated by AlphaFold is first of all a representation of a scientific nature, because it serves to convey instructive information about the represented object: the possible folding of the protein. The fact that the protein structure prediction, like any scientific thesis, requires scientific validation by human experts does not change this.

However, the graphical representation does not express an individual intellectual activity that stands out from the everyday work in the field of bioinformatics or theoretical chemistry. Rather, the representation of the calculated protein structure data follows scientific standards, in this case the representation in a two- or three-dimensional coordinate system. Nor can any individual intellectual activity be recognized in the color coding of the confidence of partial structures; it does not serve individual expression but provides information on the accuracy of the protein structure prediction.

There is no need to examine whether AlphaFold as an AI system can be the originator of the protein structure predictions, because the required creative effort is missing in any case.


4. Copyrighted work “predicted aligned error tutorial”

Insofar as a protein structure is described linguistically on the AlphaFold DB structure pages and this description is an expression of an intellectual creation (Sec. 2 (2) UrhG), a copyright can exist in this linguistic work pursuant to Sec. 2 (1) no. 1 UrhG. A hybrid work protected by copyright pursuant to Sec. 2 (1) no. 1 and no. 7 UrhG is assumed here for the “predicted aligned error tutorial”. The explanation in easily understandable steps, illustrated with graphic representations, reveals human thought processes and thus a personal intellectual creation.[90]


5. Practical problems

With regard to the “predicted aligned error tutorial”, a copyright exists, as in the case of the linguistic work published in Nature. Accordingly, the Creative Commons Attribution 4.0 (CC-BY 4.0) license granted[91] by DeepMind must be observed.

However, it is problematic that this license expressly covers “all of the data provided”. The data of an AlphaFold DB structure page is, as explained, predominantly neither copyrightable nor licensable. The license does contain an exception under Notices:

“You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.”[92]

But the distinction between public domain and copyrighted elements can be difficult under certain circumstances. In addition, the situation is cross-border because the licensor, DeepMind, is based in the UK and thus outside the EU, where copyright law is harmonized, at least to a large extent. In the worst case, the material is not even used due to uncertainty about the legal situation.


D. Science ethics

Researchers are bound not only by law but also by ethical principles that go beyond their institution’s own, university-wide and national guidelines.


I. DFG Code of Conduct: Guidelines for Ensuring Good Scientific Practice

Although the so-called Leitlinien zur Sicherung guter wissenschaftlicher Praxis[93] (Guidelines for Ensuring Good Scientific Practice) do not have the status of law, they bind all universities and non-university research institutions that wish to receive funding from the German Research Foundation (DFG).[94] The code provides an orientation framework for scientists and for the management of universities and non-university research institutions.[95] According to this code, the principles of good scientific practice include, in particular, working lege artis, maintaining strict honesty with regard to one’s own and third parties’ contributions, consistently challenging all results oneself, and permitting and promoting critical discourse in the scientific community.[96] Cross-phase quality assurance according to Guideline 7 is essential; it includes acknowledging the origin of data, materials, and software used and citing sources. The source code of publicly available software must be persistent, citable, and documented.[97] It must be possible to replicate or confirm research results and findings; this is the core of scientific quality assurance. According to Guideline 10, scientists take into account existing rights and obligations, especially those that may result from legal requirements, but also from contracts with third parties, and, if necessary, obtain and submit approvals and ethics votes.[98] According to Guideline 13, public access to research results, i.e. to all results for the purpose of scientific discourse, must be established as a matter of principle.[99]


II. Further guidelines of “Good Scientific Practice”

In addition to the DFG Code, there are numerous national and international codes of voluntary commitment. Examples include The European Code of Conduct for Research Integrity from 2017,[100] the Montreal Statement on Research Integrity in Cross-Boundary Research Collaborations from 2013[101] and the Singapore Statement on Research Integrity from 2010.[102]


E. Conclusion: There is no need for an extended IP rights system

AlphaFold enables some protein structure predictions within minutes, whereas experimental structure determination could take years. It foreseeably enables leaps in pharmaceutical research and development of new therapies, e.g. for people with diseases due to protein folding disorders.

However, as has been shown, there is only limited protection for AlphaFold as software or as a data- and database-based AI model, as well as for AlphaFold protein structure predictions: Patentability appears to be excluded in the absence of an inventive step. Copyright is granted on the source code, but not on the parameters themselves, which make AlphaFold the powerful AI system that it can be, as evidenced by its success at CASP14. Here, at most, trade secret protection comes into consideration. Furthermore, there are hardly any or no rights to the data that make protein structure prediction possible in the first place. The protein structure predictions themselves do not exhibit any intellectual creativity, which is why no copyright can exist in them. Only accompanying descriptions can be protected by copyright. Although AlphaFold is freely accessible to anyone and is intended to stimulate research in and between the fields of biology and chemistry, the license grants on the part of DeepMind, which are expressly intended to cover all data but cannot do so, are confusing. Last but not least, DeepMind’s patent applications appear to be in conflict with the freedom of science and research.

The freedom of science and research is recognizably opposed to an expansion of the IP rights system in favor of developers of AI systems and even in favor of AI systems themselves. The expansion also does not appear necessary: where protective rights such as patent rights and copyright are not granted, in favor of the freedom of science and research, the principles for safeguarding good scientific practice, e.g. the disclosure of sources and source codes, take effect. These principles serve the purpose of scientific discourse – contradictory or simply unclear license grants due to private economic interests, on the other hand, make it more difficult. The same or worse would probably be the consequence of a more complex system of intellectual property rights.

As impressive as the progress in protein structure prediction by calculation is compared to (time-)consuming experimental methods, it is also clear that AlphaFold protein structure predictions should not and cannot completely replace classical methods. After all, the predictions still need to be validated; structure-forming factors that have not been taken into account need to be tested in a supplementary manner. AlphaFold is therefore a tool that supports complex research work. Thus, discussions about AI as an inventor or about new regulation of rights to AI products are unnecessary.

Last but not least, the AlphaFold case shows that significant developmental leaps trigger a desire for attribution. DeepMind and its employees wish to be named in every context of use. DeepMind even seeks to patent the AI system AlphaFold. This raises the question whether there can be any interest at all in the designation of an AI system without legal capacity as inventor,[103] let alone in the creation of an AI legal personality. Rather, this idea seems to be only an expression of legal pioneering spirit.



[1]  Menn, “What took years in the lab now becomes available in minutes,” July 28, 2021, WirtschaftsWoche (last accessed September 5, 2021). Often this assumption is found (only) in titles, such as Heller, “Artificial Intelligence Solves Protein Puzzle,” December 18, 2020, Deutschlandfunk (last accessed September 5, 2021).

[2] (last accessed September 5, 2021).

[3] (last accessed September 5, 2021).

[4]  Cf. art. I, para. 2, of the Convention Establishing a European Molecular Biology Laboratory, May 10, 1973, (last accessed Sept. 5, 2021).

[5]  Proteome refers to the totality of all proteins in an organism, cell or tissue.

[6]  CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionize’ medical research,” (last accessed September 5, 2021).

[7] Renneberg/Süßbier/Berkling/Loroch, Biotechnology for beginners, 5th ed., Springer-Verlag GmbH Germany 2018, p. 73. However, the genetic code is missing for the 21st proteinogenic amino acid, selenocysteine, i.e. the DNA sequence contains no building instructions of its own for this amino acid.

[8]  Christian B. Anfinsen received the Nobel Prize in 1972 for his work on ribonuclease, in particular the link between amino acid sequence and biologically active conformation. When people talk about the “protein folding problem that has been unsolved for 50 years,” Anfinsen’s work is the point of reference.

[9]  Cf. protein folding, Wikipedia, (last accessed September 5, 2021); see also Renneberg/Süßbier/Berkling/Loroch, Biotechnology for beginners, 5th ed.

[10]  Friedrich, The Levinthal Paradox, 2017, (last accessed September  5, 2021).

[11] Steigele, Protein Structure and Modeling, BioInf University of Leipzig, 2008, p. 290, see also (last accessed September 5, 2021).

[12]  Cf. Protein folding, Wikipedia, (last accessed Sept. 5, 2021).

[13]  The so-called Levinthal paradox is thought to explain the (time) complexity of protein folding, see Friedrich, The Levinthal Paradox, 2017, and (last accessed September 5, 2021).

[14]  Cf. Protein folding, Wikipedia, (last accessed Sept. 5, 2021).

[15] (last accessed September 5, 2021).

[16]  Crystal structure analysis is the determination of the atomic structure of a crystal by diffraction of suitable radiation on the crystal lattice, cf. (last accessed September 5, 2021). Probably the best-known example is “Photo 51,” which was crucial to the discovery of the double helix structure of DNA, (last accessed September 5, 2021).

[17] Steigele, Protein Structure and Modeling, BioInf, Uni Leipzig, 2008, (last accessed September 5, 2021).

[18]  Berman et al., Announcing the worldwide Protein Data Bank, December 1, 2003, available at (last accessed September 5, 2021).

[19]  See in detail (last accessed September 5, 2021).

[20] (last accessed September 5, 2021).

[21]  This is why there is often talk of AlphaFold 2, which is a further development of AlphaFold 1: Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), available at (last accessed September 5, 2021).

[22] Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), available at (last accessed September 5, 2021).

[23]  Worldwide Protein Data Bank, (last accessed September 5, 2021).

[24] Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), available at (last accessed September 5, 2021); (last accessed September 5, 2021).

[25]  CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionize’ medical research,” (last accessed September 5, 2021).

[26]  CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionize’ medical research,” (last accessed September 5, 2021).

[27]  14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP), which is held every two years to assess the state of the art. Participants are presented with amino acid sequences whose structure has been solved but not yet published and have to work the structures out “blindly.” CASP14 took place in 2020.

[28]  CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionize’ medical research,” (last accessed September 5, 2021).

[29]  CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research,” (last accessed September 5, 2021).

[30]  Same result Menn, WiWo, Interview with Andrei Lupas, Max Planck Institute for Developmental Biology in Tübingen, “What took years in the lab is now available in minutes,” July 28, 2021, (last accessed September 5, 2021).

[31] Schroeder, Statement on CASP14 Outcome, (last accessed September 5, 2021).

[32] Schroeder, Statement on CASP14 Outcome, (last accessed September 5, 2021).

[33] Schroeder, Statement on CASP14 Outcome, (last accessed September 5, 2021).

[34] (last accessed September 5, 2021).

[35]  Cf. Ethics declaration in Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), available at (last accessed September 5, 2021).

[36]  Ref: US2021166779A1; WO2021110730A1.

[37]  Multiple sequence alignment (MSA) here refers to the methodical comparison of multiple amino acid sequences in linear order, e.g., to identify similar functions as a result of similarities in structures. Read more: (last accessed September 5, 2021).

[38] (last accessed September 5, 2021).

[39]  EPO, Case Law of the Boards of Appeal, I.D.9.1.1, T 154/04, OJ 2008, 46.

[40] (last accessed September 5, 2021).

[41]  Wiebe in Leupold/Wiebe/Glossner, IT-Recht, 4th ed. 2021, C.H. Beck, p. 1028, para. 4.

[42]  Cf. EPO, Guidelines for Examination, G-II 3.3.1; Wiebe in Leupold/Wiebe/Glossner, IT-Recht, 4th ed. 2021, C.H. Beck, p. 1028, para. 4.

[43]  See claims 10 and 11 (last accessed September 5, 2021).

[44]  EPO, Guidelines for Examination, Cases Where All Procedural Steps Can Be Performed Entirely by General Data Processing Means (3.9.1), (last accessed September 5, 2021).

[45]  EPO, Guidelines for Examination, Cases Where All Procedural Steps Can Be Performed Entirely by General Data Processing Means (3.9.1), (last accessed September 5, 2021).

[46] (last accessed September 5, 2021).

[47]  See CASP14 Press Release, November 30, 2020, “Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research,” (last accessed September 5, 2021).

[48]  The Folding@Home (Stanford University), Human Proteome Folding Project (New York University) and Rosetta@Home (BakerLab, University of Washington) projects should also be mentioned here, for example.

[49]  For more detail, Karim, Illustrated: Self-Attention, November 18, 2019, in Towards Data Science, (last accessed September 5, 2021).

[50]  WO2020058176 – Machine Learning For Determining Protein Structures, Written Opinion of the International Searching Authority under PCT Rule 43 bis. 1, dated March 26, 2020, (last accessed September 5, 2021).

[51] (each last accessed September 5, 2021).

[52] (last accessed September 5, 2021).

[53]  In the following, the international scope of application of the German Copyright Act is not examined or its applicability is assumed.

[54] (last accessed September 5, 2021).

[55]  Herold/Lurz/Wohlrab/Hopf, Fundamentals of Computer Science, 3rd ed.

[56] Herold/Lurz/Wohlrab/Hopf, Fundamentals of Computer Science, 3rd ed.

[57]  Herold/Lurz/Wohlrab/Hopf, Fundamentals of Computer Science, 3rd ed.

[58]  Herold/Lurz/Wohlrab/Hopf, Fundamentals of Computer Science, 3rd ed.

[59] Herold/Lurz/Wohlrab/Hopf, Fundamentals of Computer Science, 3rd ed.

[60]  Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), available at (last accessed September 5, 2021).

[61] (last accessed September 5, 2021).

[62] (last accessed September 5, 2021).

[63] (last accessed September 5, 2021).

[64] (last accessed September 5, 2021).

[65] (last accessed September 5, 2021).

[66] (last accessed September 5, 2021).

[67] (last accessed September 5, 2021).

[68]  This applies in particular to the freedom of property rights to data.

[69]  Cf. Kreutzer/Lahmann, Legal Issues in Open Science, 2nd ed., Verlag Hamburg University Press, Verlag der Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, Hamburg (Germany), 2021, pp. 31, 51.

[70] Scheufen in Leupold/Wiebe/Glossner, IT-Recht, 4th ed. 2021, C.H. Beck, p. 1034, marginal no. 6; Dreier/Schulze, UrhG, § 2 marginal no. 8.

[71] Kreutzer/Lahmann, Legal Issues in Open Science, 2nd ed., Verlag Hamburg University Press, Verlag der Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, Hamburg (Germany), 2021, p. 52.

[72]  Cf. Kreutzer/Lahmann, Legal Issues in Open Science, 2nd ed., Verlag Hamburg University Press, Verlag der Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, Hamburg (Germany), 2021, p. 51.

[73] (last accessed December 30, 2021).

[74] (last accessed September 5, 2021).

[75]  See, e.g., “Use of the AlphaFold Protein Structure Database is subject to the EMBL-EBI Terms of Use.” (last accessed September 5, 2021).

[76]  AlphaFold is used to predict protein structures, not to create databases. References to a creation of the AlphaFold DB by an AI system are missing.

[77]  Cf. Art. 9 Headquarters Agreement between the Government of the Federal Republic of Germany and the European Molecular Biology Laboratory of December 10, 1974, BGBl. 1975, Part II, 933 ff. and Convention on the Establishment of a European Molecular Biology Laboratory of May 10, 1973, (last accessed September 5, 2021).

[78] However, Section 5 (2) UrhG stipulates that provisions on the prohibition of alteration and the indication of the source are to be applied accordingly.

[79]  E.g., (last accessed September 5, 2021).

[80]  In its decision of September 28, 2006 (Case No. I ZR 261/03), the BGH referred the following question to the ECJ for a preliminary ruling: “Do Articles 7(1) and (5), Article 9 of Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases preclude a rule in a Member State under which an official database published in the official interest for general information (here: a systematic and complete collection of all tender documents from a German federal state) does not enjoy sui generis protection within the meaning of the Directive?”, (last accessed September 5, 2021). A subsequent decision is unknown.

[81]  Which implements the so-called Trade Secrets Directive, Directive (EU) 2016/943 of the European Parliament and of the Council of 8 June 2016 on the protection of undisclosed know-how and business information (trade secrets) against their unlawful acquisition, use and disclosure.

[82]  E.g., (last accessed Sept. 5, 2021), compared to (last accessed September 5, 2021).

[83]  Cf. Kreutzer/Lahmann, Legal Issues in Open Science, 2nd ed., Verlag Hamburg University Press, Verlag der Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, Hamburg (Germany), 2021, pp. 31, 51.

[84]  Cf. Kreutzer/Lahmann, Legal Issues in Open Science, 2nd ed., Verlag Hamburg University Press, Verlag der Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, Hamburg (Germany), 2021, p. 31.

[85]  BGH, judgment of June 1, 2011, Case No. I ZR 140/09 – Learning games, with reference to OLG München, GRUR 1992, 510; KG, GRUR-RR 2002, 91, 92.

[86]  BGH, judgment of June 1, 2011, Case No. I ZR 140/09.

[87]  BGH, judgment of June 1, 2011, Case No. I ZR 140/09, with reference to Loewenheim in Schricker/Loewenheim, Urheberrecht, 4th ed.

[88]  BGH, judgment of June 1, 2011, Case No. I ZR 140/09, with reference to Wandtke/Bullinger, Urheberrecht, 3rd ed., § 2 UrhG marginal no. 132; Loewenheim in Schricker/Loewenheim, loc. cit., § 2 UrhG marginal no. 197.

[89]  BGH, judgment of June 1, 2011, Case No. I ZR 140/09, with reference to established case law: BGH, judgment of November 20, 1986, Case No. I ZR 160/84, GRUR 1987, 360, 361 – Advertising plans; judgment of February 28, 1991, Case No. I ZR 88/89, GRUR 1991, 529, 530 – Explosion drawings; further references included.

[90]  See, e.g., at (last accessed September 5, 2021).

[91]  See, e.g., at (last accessed September 5, 2021).

[92] (last accessed September 5, 2021).

[93] (last accessed September 5, 2021).

[94] (last accessed September 5, 2021).

[95] (last accessed September 5, 2021).

[96]  Explanatory Note to Guideline 1, (last accessed Sept. 5, 2021).

[97]  Explanatory Note to Guideline 7, (last accessed Sept. 5, 2021).

[98]  Excerpt Guideline 10, (last accessed September 5, 2021).

[99]  Excerpt Guideline 13, (last accessed September 5, 2021).

[100]  Available at (last accessed September 5, 2021).

[101]  Available at (last accessed September 5, 2021).

[102]  Available at (last accessed September 5, 2021).

[103]  Cf. the DABUS case: in 2020, the European Patent Office rejected two patent applications in which an AI system was named as inventor, (last accessed Sept. 9, 2021). The Federal Court of Australia, on the other hand, in its decision in Thaler v Commissioner of Patents [2021] FCA 879, considered it possible – in the absence of rules such as those of the European Patent Convention (EPC) on inventorship – that the AI system DABUS could be an inventor, (last accessed September 9, 2021).


Cover image: © Christoph Burgstedt, via Adobe Stock, #349211887
