‘‘Web scraping,’’ also known as ‘‘web data extraction’’ or ‘‘web harvesting,’’ is the process of extracting data from websites using automated software solutions, known as ‘‘bots’’ or ‘‘spiders.’’ According to Distil Networks’ Economics of Web Scraping Report, web scraping is a prevalent practice, generating up to 46% of all web traffic. Approximately 38% of web scrapers use this technology to obtain content, primarily targeting websites directed to real estate, digital publishing, travel, online directories, e-commerce, marketplace, and classifieds.
The value of web scraping is based on an informal quid pro quo between website owners, web scrapers, and web users. Aggregation websites, such as hotel booking and ticket-selling websites, offer their users the ability to leverage the disparate resources available on the internet. By employing web scraping techniques, aggregation websites extract information from various websites (including government websites) and consolidate that information into a single place for their patrons’ ease-of-use. This collection of information drives traffic to the aggregator, potentially increasing its advertising revenue, brand recognition, and user-generated fees. In exchange for the data, aggregation websites often send traffic to the scraped website itself, thereby increasing that website’s audience and potential revenue.
Despite benefits such as increased traffic and revenue, some website owners find web scraping ultimately harmful to their carefully crafted internet presence. Web scrapers may infringe a website owner’s copyrights or trademarks, which can spur legal challenges and damage the website’s brand. Web scraping can also slow down a website owner’s servers and increase webpage load times, negatively impacting user experience and the website’s revenue stream. Consequently, it is common for website owners to prohibit scraping in their terms of service and to sue web scrapers for, among other claims, violations of the Federal Computer Fraud and Abuse Act (CFAA) and analogous state laws. In response, web scrapers typically counterclaim, alleging violations of antitrust and unfair competition laws.
To address the growing problem of computer hacking, in 1984 Congress passed the Computer Fraud and Abuse Act, creating criminal and civil liability for a party who accesses a computer without authorization or in a manner exceeding their authorization. To prevail on a civil CFAA claim, a plaintiff must demonstrate that a defendant ‘‘intentionally accesse[d] a computer without authorization or exceed[ed] authorized access, and thereby obtain[ed] . . . information from any protected computer;’’ or that the defendant ‘‘knowingly cause[d] the transmission of a program . . . and . . . cause[d] damage without authorization to a protected computer.’’ 18 U.S.C. §§ 1030(a)(2)(C), 1030(a)(5)(A) (2008). To proceed on a civil claim under the CFAA, a plaintiff must also allege, as a threshold matter, that the defendant’s unauthorized access caused at least $5,000 in loss or damage during a one-year period. 18 U.S.C. § 1030(c)(4)(A)(i)(I) (2008). While courts have typically applied the CFAA in a manner that broadly protects a website’s publicly available data against third-party web scrapers, courts have also articulated various standards to determine whether a web scraper accessed a website without authorization or exceeded authorized access in violation of the CFAA.
In 2017, however, the District Court for the Northern District of California, in a ruling favorable to web scrapers, addressed whether the CFAA applies to the scraping of publicly available information, adding further uncertainty to website owners’ ability to seek recourse under the CFAA against web scrapers. In hiQ Labs, Inc. v. LinkedIn Corp., the Northern District of California granted hiQ’s motion for a preliminary injunction prohibiting LinkedIn from using electronic blocking techniques to prevent hiQ from scraping information from public LinkedIn profiles. See hiQ Labs, Inc. v. LinkedIn Corp., 273 F. Supp. 3d 1099 (N.D. Cal. 2017). The court ruled that the injunction favoring hiQ was proper because, among other considerations, hiQ ‘‘raised serious questions as to applicability of the CFAA to its [web scraping of LinkedIn’s public profile information],’’ namely that the CFAA was not enacted to prevent access to publicly viewable data not protected by an authentication gateway. Id. at 1113–1114. The court reasoned that its ruling was true to the legislative intent of the CFAA, stating that the application of the CFAA to publicly available website content ‘‘would have sweeping consequences well beyond anything Congress could have contemplated; it would ‘expand its scope well beyond computer hacking.’ ’’ Id. at 1110. In distinguishing hiQ’s web scraping activities from those at issue in the Facebook case, the court noted that the Facebook defendants scraped private data protected by authorization techniques (e.g., password protection or a paywall), whereas hiQ accessed and scraped only public data that was left unprotected. Id. at 1109.
The court therefore ruled that LinkedIn could not limit hiQ’s access to LinkedIn’s public profiles or any content open to the public under the CFAA because hiQ did not ‘‘access [LinkedIn’s servers] ‘without authorization’, even in the face of technical countermeasures, when the data it accesses is otherwise open to the public.’’ Id. at 1113. LinkedIn has since appealed the ruling to the U.S. Court of Appeals for the Ninth Circuit.
If the Ninth Circuit affirms the decision in hiQ Labs v. LinkedIn Corp., website owners may be precluded from bringing claims under the CFAA against web scrapers that mine publicly available data on the internet. Until courts resolve these legal issues, website owners should instead consider relying on more stringent authorization standards and defensive technology to hamper web scraping activities.
Proponents of web scraping often express concern that decisions permitting website owners to prohibit and seek remedies for web scraping under the CFAA are anti-competitive. For example, in response to the Craigslist v. 3Taps decision, Professor Eric Goldman of the Santa Clara University School of Law stated that the ‘‘ruling is not only bad for consumers, but is bad for Internet Law—in the sense that Craigslist is creating legal precedent that other websites can use in the future for anticompetitive/anti-consumer purposes.’’ See Eric Goldman, Craigslist Anti-Consumer Lawsuit Threatens to Break Internet Law, FORBES (May 23, 2013, 11:50 AM). Yet earlier courts had consistently rejected web scrapers’ antitrust and unfair competition claims. Parting from its predecessors, the hiQ court issued a ruling favorable to web scrapers and their proponents on this issue. HiQ argued that LinkedIn blocked its access to member data in order to monetize the data for itself with a competing product, constituting ‘‘unfair’’ competition under California’s Unfair Competition Law (‘‘UCL’’), Cal. Bus. & Prof. Code § 17200 et seq. The court agreed with hiQ’s contention that LinkedIn’s conduct ‘‘violate[d] the spirit of the antitrust laws’’ (and was therefore anti-competitive) in two ways: first, LinkedIn was leveraging its dominance in online professional networking to gain an unfair competitive advantage over hiQ in the data analytics market; second, LinkedIn’s conduct violated the ‘‘essential facilities’’ doctrine by precluding access to its member data, which is the lifeblood of hiQ’s business. hiQ Labs, Inc., 273 F. Supp. 3d at 1117. Moreover, the district court in hiQ was not persuaded by LinkedIn’s argument that it acted primarily out of concern for member privacy rather than a desire for exclusive control over the data collected from its members, finding that LinkedIn’s practice of making user data available to other third parties undermined this argument. Id. at 1118.
In so holding, the hiQ court illuminated another path by which web scrapers may challenge website owners’ data access restrictions. This issue is also before the Ninth Circuit.
Recent decisions could signal a shift in web scrapers’ potential liability under the CFAA. Providing notice and blocking IP addresses may no longer be effective ways for a website owner to restrict web scrapers’ access to publicly available information. Moreover, claims brought against web scrapers under the CFAA may expose website owners to liability under unfair competition laws. This risk may be particularly acute for websites that collect and maintain troves of data on a large user base. Web scraping companies should nonetheless tread cautiously, treating recent caselaw as a potential roadmap for shielding themselves from CFAA liability. While these decisions are pending on appeal, they inject uncertainty into the current legal landscape and leave the symbiosis between website owners and web scrapers in limbo.