Sam Brenton, Bridging the Digital Gap Trainee, writes about his work in Special Collections and the Theatre Collection.
Hello, my name’s Sam and I’m the Digital Archives trainee on the Bridging the Digital Gap programme from The National Archives. This scheme aims to place people with technical skills within archives around the country to help preserve the increasing number of digital items they collect. Over the past fifteen months I’ve been at the University of Bristol, working on a number of digital archiving projects with Special Collections and the Theatre Collection. One of the things I’ve been working on is expanding the quantity of web pages in the University’s web archive.
So what is a web archive? And how is it different from the website itself? A web archive is a collection of web pages preserved offline, totally independent of the source website. This means that should the original web pages become unavailable, or be altered in any way, there is still a perfect copy of the original. The pages are stored as WARC (Web ARChive) files, a format designed specifically for preserving web pages: it acts as a container for all the elements that make up a page, such as text and images.
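To give a rough feel for what "container" means here, the sketch below builds two simplified WARC-style records by hand, one for a page and one for an image, and joins them into a single archive. It is only an illustration under some assumptions: the function names `build_warc_record` and `build_archive` and the example.org addresses are made up for this post, only a handful of header fields are included, and real crawling tools write far richer records (and usually compress them). It is not a description of how the University's crawls are produced.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Wrap one captured HTTP response in a simplified WARC/1.0-style record.

    Hypothetical helper for illustration; real records carry more fields.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line separates the headers from the captured content,
    # and two line breaks mark the end of the record.
    return head.encode("utf-8") + b"\r\n" + http_payload + b"\r\n\r\n"


def build_archive(captures) -> bytes:
    """Join several (uri, payload) captures into one archive, record after record."""
    return b"".join(build_warc_record(uri, payload) for uri, payload in captures)


# Made-up captures: the page itself and an image it uses sit side by side.
captures = [
    ("https://example.org/",
     b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html><img src='logo.png'></html>"),
    ("https://example.org/logo.png",
     b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\n\r\n\x89PNG..."),
]
archive = build_archive(captures)
```

Because every record carries the address it was captured from and the time of capture, the text and images of a page can be stored together and later matched back up when the page is replayed.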
When we’re concerned with long-term preservation, we can’t guarantee that the pages will still be hosted by their original source. Sometimes this is simply because an organisation likes to refresh its content regularly, for example a manufacturer listing its current products. But even something that appears to be more permanent, such as an online encyclopedia or other information resource, may be altered, or older content might be removed without warning. It’s important that an archive is aware of any websites that fall within the scope of its collection policy, particularly any that might be at risk.
In the future I’d like to look into adding relevant external web pages to the collections. In due course, we also hope to be able to catalogue the web crawls and make them accessible. In the longer term I would like to look into archiving social media profiles. These are far more challenging to preserve due to log-in requirements and the large number of interactive elements, but they are arguably just as important as standalone web pages. The posts are far more ephemeral than web pages and we are reliant on the platform to maintain them. They are also a key way that the University communicates with the public.