Wikilegal/Copyright Status of Wikipedia Page Histories



Previous versions of articles on Wikipedia may be viewed in page histories by clicking the current article’s “View history” tab.  These historical versions of articles may contain copyrighted content, and retaining them may therefore constitute copyright infringement.  Although the Wikipedia community does not have a policy of always deleting copyright problems from page histories, the Wikimedia Foundation may remove violating page versions if it receives a valid DMCA takedown notice.[1]

Section 108 of the Copyright Act[2] allows libraries and archives to reproduce and distribute copyrighted works under certain circumstances. However, this statute likely does not apply to archives with only a digital presence, such as Wikipedia page histories, and therefore probably does not exempt the use of copyrighted material on Wikipedia from infringement liability. Fair use may be invoked as a defense in the event of copyright infringement in Wikipedia page histories, but the judicial outcome of asserting fair use is generally unpredictable.  

17 U.S.C. § 108 & Wikipedia Page Histories


The library exception of section 108 of the Copyright Act states:

“[I]t is not an infringement of copyright for a library or archives...to reproduce no more than one copy...of a work...or to distribute such copy...if the reproduction or distribution is made without any purpose of direct or indirect commercial advantage; the collections of the library or archives are open to the public...; and the reproduction or distribution of the work includes a notice of copyright that appears on the copy...that is reproduced under the provisions of this section, or includes a legend stating that the work may be protected by copyright if no such notice can be found on the copy or phonorecord that is reproduced under the provisions of this section.”[3]

Digital archives, such as Wikipedia page histories, likely do not qualify for section 108’s library exception.  Legislative history indicates that Congress did not intend for the exception to apply to archives existing entirely on the Internet, without physical premises.[4]  In Senate Report No. 105-190, the Senate Judiciary Committee stated, “[a]lthough online interactive digital networks have since given birth to online digital ‘libraries’ and ‘archives’ that exist only in the virtual (rather than physical) sense on websites...across the Internet, it is not the Committee’s intent that [17 U.S.C. § 108] apply to such collections of information.”[5]  The report goes on to explain that applying section 108 to archives existing solely on the Internet would lead to a slippery slope of copyright infringement, whereby anyone with a website could freely violate the copyright owner’s exclusive rights of reproduction, distribution, and display and then claim protection under the section 108 exception.[6]

Therefore, although Wikipedia embodies Congress’ intent with 17 U.S.C. § 108 in the sense that it facilitates public access to knowledge without the purpose of commercial advantage (as a traditional library or archive does), it is likely not covered by section 108’s library exception, because Congress explicitly intended to limit the exception to libraries and archives with physical premises in order to preclude its abuse.

Fair Use as a Potential Defense


“Plaintiffs must satisfy two requirements to present a prima facie case of direct infringement: (1) they must show ownership of the allegedly infringed material, and (2) they must demonstrate that the alleged infringers violate at least one exclusive right granted to copyright holders under 17 U.S.C. § 106.”[7]  A plaintiff may therefore have a prima facie case for direct copyright infringement if it shows valid copyright ownership in content retained in a Wikipedia page history and establishes that the Wikimedia Foundation, or the Wikipedia users who originally uploaded the material, infringed its exclusive rights to reproduce the copyrighted work, distribute it to the public, or display it publicly.[8]  Archiving page histories that contain copyright-infringing materials may violate each of these exclusive rights.  First, archiving by definition requires making a copy of content to keep in a history, and therefore may violate the copyright owner's exclusive right to reproduce the work.  In addition, archiving may violate the exclusive rights of public distribution and display because it makes the copyrighted content readily available on the web for the public to view and use.

Fair use is a defense through which copyright infringement may be excused even when there is no doubt that infringement occurred.  Section 107 of the Copyright Act allows fair use to be invoked where the use was made for purposes “such as criticism, comment, news reporting, teaching, scholarship, or research.”[9]  The statute includes a nonexhaustive list of fair use factors to be considered, including “the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; the nature of the copyrighted work; the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and the effect of the use upon the potential market for or value of the copyrighted work.”  In Field v. Google, a district court case involving Google’s “Cached” links, through which searchers can view a previous version of a Web page if the current version is unavailable, the court added consideration of good faith to its fair use analysis.[10]

Generally speaking, though, a use will be considered fair if it promotes the overarching purposes of copyright law and does not undermine the copyright owner’s market opportunities with the work.[11]  However, because fair use analysis is a fact-specific determination, its outcome in any particular set of circumstances is unpredictable.[12]

The purpose and character of the use


In determining the purpose and character of the defendant’s use, courts may consider (1) whether the purpose of the use is, or is similar to, a fair use purpose enumerated in 17 U.S.C. § 107; (2) whether the use is commercial or noncommercial; and (3) whether the use is transformative.  

First, 17 U.S.C. § 107 states that uses for “purposes such as criticism, comment, news reporting, teaching, scholarship, or research” may be fair.[13]  Page histories may serve a broad educational purpose as records of the past, illustrating the development of an article, or as useful documentation for research or scholarship on Wikipedia’s history.  In addition, depending on the subject of the archive containing the copyrighted content, such use may be in the public interest.[14]

Secondly, in determining whether the use is commercial or noncommercial, a court may focus the inquiry on “whether the original was copied in good faith to benefit the public or primarily for the commercial interests of the infringer.”[15]  A noncommercial use generally weighs in favor of a finding of fair use.  Because the Wikimedia Foundation is a non-profit organization and Wikipedia is a free encyclopedia, a court is likely to find that the use of copyrighted content in page histories is noncommercial, as there is no profit gained from Wikipedia page histories.   

Thirdly, courts will analyze whether the use is transformative, meaning it “adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message…”.[16]  In A.V. ex rel. Vanderhye v. iParadigms, LLC, the Fourth Circuit found that a digital archive of high school student papers used to detect plagiarism did not infringe students’ copyright by storing their papers without permission.[17]  The court stated, “[t]he use of a copyrighted work need not alter or augment the work to be transformative in nature.  Rather, it can be transformative in function or purpose without altering or actually adding to the original work.”[18]  Despite the fact that there was no substantive alteration to the stored works, the court held that archiving papers to compare and detect plagiarism was a transformative use, as it is distinct from the works’ intended expressive purpose.[19]  Depending on the nature of the copyrighted content and its intended purpose, a court may hold that retaining it in Wikipedia page histories constitutes a transformative repurposing of the content.  First, copyrighted content may be retained in a Wikipedia page history to preserve the development of an article.  Second, retaining copyrighted works in page histories serves a preventive function, in that doing so informs contributors of potential copyright infringement and precludes the restoration of such copyrighted content in the current version of the article.  Because these two purposes are distinct from a copyrighted work’s original expressive purpose, the retention of copyrighted content in Wikipedia page histories may be considered a transformative use.

The nature of the copyrighted work


Inquiry into the nature of the copyrighted work requires a case-by-case analysis, which takes into consideration whether the original, allegedly infringed work is published or unpublished and whether the work is creative or factual.[20]  In most cases, the copyrighted materials contained in Wikipedia page histories were published prior to their appearance on Wikipedia, as contributors to Wikipedia generally use secondary sources that are readily accessible.  Whether the work is creative or factual is largely dependent on the specific content, as both are widely used in Wikipedia articles to illustrate concepts within an article’s topic.

The amount and substantiality of the portion used in relation to the copyrighted work as a whole


The factor weighing the amount and substantiality of the portion used in relation to the copyrighted work as a whole will depend on the purpose and character of the use at issue and is thus a case-by-case determination.[21]  Here, the analysis will be contingent on what percentage of the copyrighted work is reproduced in the page history, the importance of the portion reproduced as compared to the entire work, and whether any less of the work could have been used to fulfill the need for the use.  For example, a finding that the retention of the entire copyrighted work in the page history is necessary to preserve the record of how that article was developed would weigh in favor of fair use.  On the other hand, if it would have been sufficient to describe the copyrighted image that was removed to preserve the history of its inclusion in an older version of the article, but the actual copyrighted image was kept in the page history instead, this factor may weigh against a finding of fair use.

The effect of the use upon the potential market for or value of the copyrighted work


In determining the effect of the use upon the potential market for or value of the copyrighted work, courts home in on whether the use inhibits the author’s incentive to create.  In Sony Corp. of America v. Universal City Studios, Inc., the Supreme Court stated that “[a] use that has no demonstrable effect upon the potential market for, or the value of, the copyrighted work need not be prohibited in order to protect the author’s incentive to create...The prohibition of such noncommercial uses would merely inhibit access to ideas without any countervailing benefit.”[22]  Here, the author’s incentive to create may be inhibited if the copyrighted content in Wikipedia page histories is readily downloadable, as such availability may displace the market for these works.  This factor may favor a plaintiff able to show actual damages such as lost sales, or that individuals searching the Web for the plaintiff’s content are more likely to view it via a Wikipedia page history than at its original location.  However, the fact that the copyrighted content has been taken out of the main Wikipedia article and is only retained in an archive may mitigate the effect on the potential market for or value of the work.

Additional consideration: Whether the defendant was acting in good faith (Field v. Google)


Section 107 of the Copyright Act authorizes courts to consider other factors that they deem relevant to fair use analysis.  Citing the Ninth Circuit case Fisher v. Dees,[23] the court in Field v. Google considered whether Google “acted in good faith in providing ‘Cached’ links to Web pages…”.[24]  The court held that Google had acted in good faith because it allowed web page owners to opt out of caching.[25]  While this would again be a case-by-case determination, the Wikimedia Foundation may be able to make a showing of good faith by highlighting the fact that the retention of copyrighted content in Wikipedia page histories is for historical purposes and functions as a measure for preventing future copyright infringement.  In addition, the fact that the Wikimedia Foundation complies with all valid DMCA takedown notices may also demonstrate good faith, as it shows that rightsholders have a legitimate, feasible option for removing copyrighted content from page histories.

Potential Risk of Liability for Original Posters of Allegedly Copyrighted Materials


The fair use defense, while available, does not guarantee a positive outcome in the case of web history archives. If the defense fails, the party violating the copyright owner's exclusive rights to reproduction, distribution, or display may be liable for direct copyright infringement.[26] In the case of Wikipedia page histories, the potentially liable party would be the user who uploaded the copyrighted work, as that initial upload is what allegedly violates the copyright owner's exclusive rights. Additionally, downstream re-users of the work may risk liability if their own use of the work is not permissible under fair use or another defense.

Even if direct infringement is found, however, it is unlikely that an editor or administrator who flags and removes copyright issues in the current article, but not necessarily in previous versions of the article, will be held liable for contributory infringement. Under the doctrine of contributory infringement, one who (1) with knowledge, or reason to know, of the infringing activity (2) induces, causes, or materially contributes to (3) the infringing conduct of another may be held liable as a contributory infringer, even without committing or participating in the infringing act directly.[27] An editor or administrator flagging and removing a copyright issue in a Wikipedia article is not likely to be held responsible for contributory infringement, because flagging or removing content in the general course of editing likely does not induce, cause, or materially contribute to the infringement. Rather, in this capacity, the editor or administrator is working to take down instances of possible infringement and is therefore not actually contributing to the infringing activity.

In the context of the MediaWiki software, it may be significant that each version of a page has a unique identifier, so that someone removing a "copyvio" is not "creating an archive" as suggested above, but is actually creating a new page version (the current version at that time) intended to be free of copyright problems.

Associated questions


Note: "copyvio" here refers to material which is considered to violate a set of rules: these could be the community's (generally stricter) rules, the U.S. Code, the laws of other relevant jurisdictions, with or without appropriate case law, or a hypothetical set of rules covering the outcome of putative cases.

  1. The question of copyright violation when a "copyvio" on a talk page is archived.
  2. The question of copyright violation when a mirror or fork is made.
  3. The question of copyright violation when a back-up or dump, or a copy of a back-up or dump, is made.

Note that a key question relevant to these matters is the question of financial loss to the aggrieved party. It would seem unlikely that any of these actions, with the possible exception of creating a mirror or fork, would be capable of sustaining a claim of financial loss.


  1. See Wikimedia Foundation's Terms of Use, Section 8.
  2. 17 U.S.C. § 108.
  3. 17 U.S.C. § 108.
  4. See S. Rep. No. 105-190 at 62, which expressly limits the section 108 exemption to “institutions that operate through a physical premise.”
  5. See S. Rep. No. 105-190 at 62.
  6. See S. Rep. No. 105-190 at 62.
  7. A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004, 1013 (9th Cir. 2001).
  8. 17 U.S.C. § 106.
  9. 17 U.S.C. § 107.
  10. Field v. Google, 412 F. Supp. 2d 1106, 1122 (D. Nev. 2006).
  11. Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 568 (1985).
  12. Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. at 560.
  13. 17 U.S.C. § 107.
  14. See Online Policy Group v. Diebold, 337 F. Supp. 2d 1195, 1203 (N.D. Cal. 2004), where the district court observed that posting an email archive about security problems with electronic voting software was fair use because the subject of the discussion was in the public interest.
  15. American Geophysical Union v. Texaco Inc., 60 F.3d 913, 922 (2d Cir. 1994).
  16. Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994).
  17. A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 634 (4th Cir. 2009).
  18. Id. at 639.
  19. Id.
  20. Harper & Row, Publishers, Inc., 471 U.S. at 563-64.
  21. See Campbell, 510 U.S. at 586-87 (“...the enquiry will harken back to the first of the statutory factors, for, as in prior cases, we recognize that the extent of permissible copying varies with the purpose and character of the use.”)
  22. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417, 450 (1984).
  23. See Fisher v. Dees, 794 F.2d 432, 436-37 (9th Cir. 1986) (“Because ‘fair use’ presupposes ‘good faith’ and ‘fair dealing,’ courts may weigh the ‘propriety of the defendant's conduct’ in the equitable balance of a fair use determination.”) (internal citations omitted).
  24. Field, 412 F. Supp. 2d at 1122.
  25. Id.
  26. 17 U.S.C. § 501.
  27. Gershwin Pub. Corp. v. Columbia Artists Management, Inc., 443 F.2d 1159, 1162 (2d Cir. 1971).