Last November, the NDA embarked on a new collaborative project with the Digital Preservation Coalition (DPC) to lay the foundations for tackling digital preservation. This project was introduced in an NDA blog post in June 2018. I’ve been working on this project with colleagues at the NDA for a year now and World Digital Preservation Day today (7 November) seems like an excellent opportunity to share some information about what we’re doing and why.
What is digital preservation?
Digital preservation is all about maintaining authentic digital records for as long as required, and in a form that can be accessed by users (both now and into the future).
Digital preservation is not a one-off activity but requires a series of actions and interventions over the lifecycle of the digital records.
There are huge challenges associated with maintaining usable and authentic digital content as the technological landscape rapidly evolves around us - add in social and organisational change and the challenge grows.
Digital preservation is not about keeping everything forever. There are costs associated with digital preservation, so it is important to ensure that only valuable content is kept (and only for as long as necessary). Deciding what to keep, what not to keep and for how long is very much a part of digital preservation. Attempting to preserve too much digital information reduces the resources available to look after the really important content.
Why is digital preservation so important for the NDA?
The NDA has responsibility for records of key national and international importance, including those relating to the disposal of radioactive waste.
These records are increasingly being created in digital form.
These records must be both accurate and authentic and need to remain in a usable form for several hundred years.
The NDA is already responsible for large quantities of legacy data that is reliant on obsolete hardware, software or operating systems. Meanwhile, new digital information is being produced at a rapid pace, both ‘born digital’ and through digitisation workflows.
Why is it so hard?
The long-term preservation of physical documents, such as paper reports, can largely be considered to be a solved problem. Once physical documents have been secured by an archive, checked, documented and transferred to a stable and controlled environment (for example, a well-designed archival strongroom facility), they can remain in a good condition for many years, especially if they are left alone and not frequently accessed. Of course we can prove this - in the archives world there are examples of documents that have survived for hundreds of years.
Digital records are different.
One of the worst things we can do with digital records is put them in a box in a strongroom and leave them alone. Imagine if you had been doing just this for the last 40 years - you would now be faced with a selection of obsolete floppy disks of different sizes, perhaps some Zip disks, many CDs and DVDs and, of course, a selection of USB memory sticks and portable hard drives.
For some kinds of media, it would be hard to find any hardware that can successfully read them, and if you did, you probably wouldn’t have the right software to open the files. For others, there would already be a certain amount of data loss as disks or drives had already corrupted and become unreadable. Perhaps you’d find they’d been password protected or encrypted, and the information to open them and access the content was no longer available.
Digital records need continuous active management to keep them alive. They should be periodically checked to ensure that they haven’t been accidentally or deliberately damaged. Multiple copies need to be kept to ensure we can recover from errors or anomalies that are encountered when we carry out these checks. We need an understanding of the file format, and must monitor and understand the risks associated with it over time.
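As a rough illustration of what these periodic checks can look like in practice, here is a minimal Python sketch of a 'fixity' check. It assumes that a SHA-256 checksum was recorded for each record when it was first received, and that several copies of the record are held in different locations. The function names (sha256_of, check_copies) are hypothetical and chosen only for this example; they are not taken from any NDA or DPC tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_digest: str, copies: list[Path]) -> dict[Path, str]:
    """Classify each stored copy as 'ok', 'damaged' or 'missing'.

    expected_digest is the checksum recorded when the record was received
    (an assumption of this sketch).
    """
    report = {}
    for copy in copies:
        if not copy.is_file():
            report[copy] = "missing"
        elif sha256_of(copy) == expected_digest:
            report[copy] = "ok"
        else:
            report[copy] = "damaged"
    return report
```

If one copy is reported as damaged or missing, it can be replaced from a copy that still matches the recorded checksum, which is why keeping multiple copies matters.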
Before a file format reaches a point where you can no longer find the right software to read it, a decision needs to be made about what action to take to enable continued access to the record. This may mean migrating the record to a newer or current file format or using an emulation environment to access the record in its original format. Both of these approaches can be complex and time-consuming.
What has the NDA project achieved so far?
Much of the first year of the project has been spent understanding the specific needs of the NDA with regard to digital preservation. Having spent my digital preservation career until this point in academic libraries and archives, I have found the context of the NDA quite different in many respects. Fortunately there are more parallels than differences regarding the needs of the digital information to be preserved. The information-gathering that has taken place in the first year of this project has led to an initial draft of a digital preservation policy for the NDA which will be refined over the next year before wider release.
Another important piece of work for this project has been the production of a new digital preservation maturity model - the Digital Preservation Coalition Rapid Assessment Model (DPC RAM). This will be used as a self-assessment tool to measure both the NDA and the other 94 members of the Digital Preservation Coalition and allow benchmarking of digital preservation maturity both now and in the future.
The model was officially launched at an international conference in Amsterdam this September and has received praise from the wider digital preservation community. The NDA’s support of this piece of work really helps to raise the NDA’s profile in the digital preservation world, moving it towards the goal of becoming a trusted leader in the field of long-term preservation.
At-risk digital materials
The theme of World Digital Preservation Day this year is ‘At-Risk Digital Materials’ to mark the publication of a revision of the ‘Bit List’ of Digitally Endangered Species. The 'Bit List' is a crowd-sourcing exercise to discover which digital materials the digital preservation community thinks are most at risk. By compiling and maintaining this list over the coming years, the DPC aims to celebrate great digital preservation endeavours as entries become less of a ‘concern,’ whilst still highlighting the need for efforts to safeguard those still considered ‘critically endangered’.
Several of the themes that have emerged in the ‘Bit List’ this year are of relevance to the NDA and have been highlighted as areas for focus as the project progresses. These include:
- Sound and vision - the NDA holds large quantities of audiovisual material, with much of it still on analogue carrier formats (for example VHS, reel-to-reel) and new content being recorded in digital form. As it gets harder and harder to source and repair viewers/players for analogue audiovisual content, it is understood that the best method of preserving this content is through a digitisation and digital preservation strategy. ‘Born digital’ audiovisual content should be created with long-term preservation in mind if the content is to be kept for the medium to long term.
- Engineering formats - the NDA creates detailed 3D digital models of buildings prior to construction. These models are hugely complex and typically reliant on specific software to access and interrogate them. As the software landscape rapidly evolves, it will become harder to maintain access to the models and underlying datasets for future users.
- Portable media - there have been many different types of portable digital media in use over the last 40 years as digital technologies have evolved. The NDA is not unusual in having examples of many of these across the estate. Valuable digital content stored on portable media is vulnerable to loss and should be retrieved and moved to a more stable preservation and storage solution.
Other types of digital content highlighted as a priority for the NDA include radioactive waste records, information in legacy database systems and records in Electronic Records Management Systems. These topics will all be investigated as this project continues.
It is anticipated that the work of the project will be incorporated into future revisions of the ‘Bit List’, highlighting successes and examples of good practice and leading to a greater understanding of preservation approaches and solutions. The latest list will be available at midday on 7 November.
- Follow the #WDPD2019 hashtag on Twitter