The Internet’s Leading Archiving Tool Faces Threats

This month, USA Today published a report showing how US Immigration and Customs Enforcement delayed disclosing crucial information about the effects of its detention policies. The authors used the Internet Archive’s Wayback Machine to gather and analyze ICE detention data, tracking how the agency changed during the Trump administration. The story exemplifies the Wayback Machine’s role in preserving information for the public good, a fact that Wayback Machine director Mark Graham finds “a little ironic.”
USA Today Co., the publishing conglomerate previously known as Gannett, which operates its flagship publication along with over 200 other media outlets, prevents the Wayback Machine from archiving its content. “They’re able to compile their story research thanks to the Wayback Machine’s existence. Yet, at the same time, they are blocking access,” Graham notes.
Several major journalism organizations, including The New York Times, have also recently moved to stop the Wayback Machine from archiving their stories. According to an analysis by the AI-detection startup Originality AI, 23 major news sites currently block ia_archiverbot, the web crawler typically used by the Internet Archive for the Wayback project. The social platform Reddit is among them. Other outlets, like The Guardian, limit the project differently: they do not block the crawler but exclude their content from the Internet Archive API and filter articles out of the Wayback Machine interface, making access harder for everyday users.
A spokesperson for USA Today Co., Lark-Marie Anton, emphasized that “this effort is not specifically aimed at blocking the Internet Archive” but is rather part of the company’s broader strategy to restrict all scraping bots. Robert Hahn, the Guardian’s director of business affairs and licensing, cited ongoing discussions with the Archive about “concerns about the potential misuse of content sets crawled for preservation purposes” by AI companies.
Individual reporters are pushing back against this trend. This week, advocacy groups including the Electronic Frontier Foundation and Fight for the Future mobilized journalists in support of the Wayback Machine. The coalition gathered over 100 signatures from working journalists who recognize the tool’s importance and presented a letter of support to the Internet Archive. Signatories include prominent figures like television host Rachel Maddow and independent journalists such as Kat Tenbarge from Spitfire News and Taylor Lorenz from User Mag. “In earlier times, journalists would refer to the physical archives of local newspapers or public libraries to access historical reporting and trace current events back to their origins,” the letter states. “With many newspapers shuttered and no clear path for local libraries to maintain digital-only reporting, the responsibility of preserving journalism’s history increasingly rests with the Internet Archive.”
Laura Flynn, a signatory and supervising podcast producer at The Intercept, said the Internet Archive has been an “essential tool” throughout her career, playing a crucial role in fact-checking and surfacing audio clips. Another signatory, Micco Caporale, a writer for the Chicago Reader, said the Wayback Machine is particularly helpful for researching older bands and cultural figures, providing access to fan sites that would otherwise be lost.
Caporale added that the tool has also been useful in their work as a union organizer. “I’ve been using the Wayback Machine extensively in my union organizing efforts to find old job postings, allowing us to compare what the company claimed they were hiring for against the actual duties assigned, and to track how positions have evolved over time,” Caporale explained. “These archived posts help us monitor pay variations across the organization over the years.”
