Konbini - All Web Pages Are Stored at the BNF
The French National Library (BNF) has been archiving the French web since the early days of the internet to preserve digital heritage. This includes 60 billion web pages, early forums, blogs, and social media sites like Facebook and Twitter. The BNF's mission is to maintain a record of what was shared, read, and heard online, ensuring that even seemingly trivial content is preserved for future research. The library conducts both broad and targeted collections, such as archiving content related to significant events like the trial of Nicolas Sarkozy. The archived content is stored securely to withstand disasters and is accessible only at the BNF or certain regional libraries. This initiative ensures that future generations can access and study the digital history of France, including obsolete technologies like Flash.
Key Points:
- The BNF archives 60 billion web pages to preserve France's digital heritage.
- Content includes early blogs, forums, and social media, ensuring a comprehensive digital record.
- The library conducts broad annual collections and targeted collections for significant events.
- Archived content is stored securely and accessible only at the BNF or specific libraries.
- The initiative preserves even obsolete technologies, ensuring future access to digital history.
Details:
1. 📚 Skyblogs and the Quest for Digital Heritage
- A total of 12.6 million Skyblogs have been archived, making the collection a key resource for understanding early internet culture.
- The archive as a whole holds 60 billion web pages, including early forums and blogs, and documents how online communication has evolved.
- This extensive archive preserves digital history and provides a foundation for future research into digital communication trends.
- Archiving Skyblogs safeguards significant cultural and historical data for the study of digital heritage.
2. 🌐 BNF's Mission to Archive the French Web
- The Bibliothèque nationale de France (BNF) has been committed to archiving content from the French web since the advent of the internet, capturing a wide range of digital media.
- BNF's archival efforts include early websites, forums, blogs, and media sites, ensuring that diverse digital interactions are preserved for future research and reference.
- The scope of the BNF's digital preservation includes popular platforms such as Facebook, Twitter, and TikTok, highlighting their comprehensive approach to capturing contemporary digital culture.
- BNF employs sophisticated web-crawling technology to systematically archive public digital content, overcoming challenges like the vast and dynamic nature of the internet.
- The significance of this endeavor lies in preserving a digital cultural heritage that can inform future generations about societal changes and trends.
- BNF faces challenges such as the ever-increasing volume of data and evolving digital platforms, requiring continuous adaptation and innovation in their archiving techniques.
- The impact of BNF's efforts is substantial, providing researchers, historians, and the public with access to a rich repository of digital history.
- Future plans include expanding their digital archive capabilities to accommodate emerging technologies and new forms of digital communication.
3. 🕰️ The BNF's Internet Archives: A Digital Time Machine
- The BNF's internet archives serve as a digital time machine, preserving web history from 1996 to the present as part of the library's legal deposit mission.
- Early web pages captured by the Archive were primarily informational, designed to communicate company activities.
- A notable example is the first capture of konbini.com on September 26, 2009, which shows little more than a blinking logo GIF.
- By December 1, 2014, konbini.com featured 'Konbini All Pop Everything', a marked expansion in content that mirrors broader trends in how web content evolved.
- The Archive's mission is crucial for digital preservation, offering strategic insights into the historical progression of web design and content strategy.
4. 💾 Preserving Skyblogs for Posterity
- The Skyblog team reported that 12,600,000 blogs were scheduled for deletion in the summer of 2023.
- To preserve these blogs, the BNF (Bibliothèque nationale de France) decided to archive the entire collection as part of its mission to preserve web content.
- This effort is in line with the legal deposit mission of the BNF, which involves capturing and preserving web content that has been transmitted, disseminated, read, or listened to at a given time.
5. 🔗 The Value of Past Web Content for Future Research
- Past web content, including personal blogs and social media posts, holds potential value for future research, as it may interest researchers in 10, 20, or even 50 years.
- Even seemingly frivolous or anecdotal content, such as a 12-year-old's blog or a TikTok video, can provide insights into cultural and societal norms of the past.
- The preservation of such content is likened to archiving significant manuscripts and is considered valuable by institutions like the BNF (Bibliothèque nationale de France).
- The distinction between academic content and personal or seemingly trivial content is blurred when considering long-term research value.
6. 📜 Diverse Web Content and Its Place in Archives
- Web archives collect a wide range of content, including marginalized segments like pornographic sites, reflecting the diversity of the web ecosystem.
- Two main types of collections are conducted: broad collections covering a wide spectrum of French domain names annually, and targeted collections documenting specific events like the trial of Nicolas Sarkozy related to the Libyan dossier.
- These targeted collections ensure real-time updates are archived, preserving significant contemporary events.
- The archiving process employs a robot (web crawler) named Heritrix, which simulates user behavior by visiting specified URLs and following links up to a set depth.
- Challenges in archiving include capturing dynamic content and ensuring comprehensive coverage of rapidly changing web pages.
- The inclusion of marginal content ensures a holistic representation of cultural and societal trends in web archives.
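The crawl behavior described above (start from specified URLs, follow links up to a certain depth) can be sketched as a depth-limited breadth-first traversal. This is only an illustrative sketch, not Heritrix's actual implementation: the `crawl` function, the `TOY_WEB` link graph, and the `example.fr` URLs are all hypothetical, and the sketch works on an in-memory graph rather than fetching real pages.

```python
from collections import deque

def crawl(seeds, get_links, max_depth):
    """Visit pages breadth-first from the seed URLs, following links
    at most max_depth hops away. Returns (url, depth) pairs in visit order."""
    seen = set()
    queue = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Hypothetical link graph standing in for real web pages (assumption).
TOY_WEB = {
    "https://example.fr/": ["https://example.fr/a", "https://example.fr/b"],
    "https://example.fr/a": ["https://example.fr/a/1", "https://example.fr/"],
    "https://example.fr/b": [],
    "https://example.fr/a/1": ["https://example.fr/a/1/deep"],
}

def toy_links(url):
    """Return the outgoing links of a page in the toy graph."""
    return TOY_WEB.get(url, [])

captured = crawl(["https://example.fr/"], toy_links, max_depth=2)
for url, depth in captured:
    print(depth, url)
```

With `max_depth=2`, the page `https://example.fr/a/1/deep` is never visited because it sits three links away from the seed. A real crawler would also need politeness delays, scope rules, and storage of the fetched content, all of which are omitted here.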
7. 🤖 Collecting Data: The Role of Robots in Archiving
- The data center at BNF is highly restricted, with limited access for personnel, ensuring data security and controlled operations.
- Robots are tasked with collecting millions of URLs, indicating an automated and large-scale approach to data archiving.
- Data collection by robots is conducted at variable frequencies, highlighting flexibility and adaptability in data acquisition processes.
- Once launched, the robot runs continuously, collecting data without interruption.
8. 🔒 Safeguarding Digital Heritage for the Long Term
- A collection of 60 billion URLs and web pages is being preserved, representing a vast digital archive of the French web from 1996 to the present.
- The archive is stored on 2,400 disks of 4 terabytes each, which together occupy about the space of a wardrobe.
- The initiative ensures that the digital heritage is safeguarded against various threats such as flooding, fire, and deterioration to maintain its integrity over the long term.
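As a rough back-of-the-envelope check on the disk figures above, the total raw capacity can be computed directly. The per-URL average below is only an upper bound under stated assumptions: decimal terabytes (10^12 bytes), disks counted once with no replication, and all capacity treated as used. None of these are confirmed by the source.

```python
# Figures taken from the summary above.
DISK_COUNT = 2_400
DISK_SIZE_TB = 4
ARCHIVED_URLS = 60_000_000_000

# Assumption: decimal units, 1 TB = 10**12 bytes, 1 PB = 1,000 TB.
BYTES_PER_TB = 10**12

total_tb = DISK_COUNT * DISK_SIZE_TB
total_pb = total_tb / 1_000

# Assumption: every disk is full and holds a single copy of the data,
# so this is an upper bound on the average space per archived URL.
avg_bytes_per_url = total_tb * BYTES_PER_TB / ARCHIVED_URLS

print(f"Total raw capacity: {total_tb} TB ({total_pb} PB)")
print(f"Upper bound per archived URL: {avg_bytes_per_url / 1_000:.0f} kB")
```

Under these assumptions the disks provide 9,600 TB (9.6 PB) of raw capacity, or at most about 160 kB per archived URL.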
9. 🕹️ Interactive Content and Restricted Access in Archives
- To ensure long-term access to information, technologies used at the time of collection are also preserved, enabling replay of content as it was originally, even decades later.
- Example: A Flash-based video game captured in 2011, which has disappeared from the internet, is still accessible in their archives.
- The web archives are not freely accessible online; they must be consulted in person on dedicated workstations at the BNF or at certain regional libraries.
- The mission of collecting and archiving French digital content is a public service aimed at preserving national heritage for future generations.