Why Storage is not Preservation: A Conversation, surrounded by Conservation

A blog dialogue by DRI’s Kathryn Cassidy @angrybunnie and Natalie Harrower @natalieharrower, set in Oxford University’s Museum of Natural History

Natalie Harrower & Kathryn Cassidy

Image Credit: Natalie Harrower, 2017. Licence: CC-BY-SA

While catching furtive chats between sessions at the recent PASIG conference, held at the Museum of Natural History in Oxford in September 2017, DRI’s Kathryn Cassidy (@angrybunnie) and Natalie Harrower (@natalieharrower) started a conversation about the ongoing misperception that storage – backing up files somewhere, or backing them up two or three times – is equivalent to preservation, or if not equivalent, at least sufficient to ensure the long-term survival of digital objects. This perception is not held by people working in digital preservation, who make up the bulk of PASIG attendees, but it is more broadly held, despite the efforts of digital archivists, repository managers, and preservation engineers to disabuse the world of such notions. What ensues is a conversation, caught midway, between Natalie and Kathryn:

(Sound of crockery clinking, general milling about of people at a coffee break)

Natalie: (Stirring coffee) There has been so much talk of the importance of stewarding digital objects for the long term – for providing trusted long-term digital preservation – yet it seems the idea persists that preservation is equivalent to storing files and backing them up. And that if you want to be *really good* at this, you make lots of copies and back them up more frequently. Why do you think the focus is on storage? (Takes a sip)

Kathryn: (Nibbling on a biscuit) Well, it is actually fantastic that people are thinking about backing up their data! We’ve come to a point of ‘data consciousness’ where there is, I think, a basic understanding that backing up is the responsible thing to do. We’ve all lost files and it can be disastrous. So it’s a very good development that people are now thinking of making copies of their data and storing them on external drives, etc.

Natalie: I agree! (Claps hands together excitedly). But this is just one step on the path to preservation! And when we think of the size of data – in particular born-digital data – that institutions are dealing with on a daily basis, storage is not sufficient to ensure long-term access.

Kathryn: Yes! I think that is the right way to look at it. Backups and multiple copies are a part of the preservation story, but not the whole. If we think about longer timescales, and importantly, access over that longer time, then the difference between storage and preservation becomes clearer.

(Pauses to search for a place to put empty coffee mug).

So, often we are thinking about how to keep our data safe so that we can access it again tomorrow, or next week or year. Just backing stuff up helps with that, but preservation is a process. It’s a process that allows us to keep our data usable in spite of the changes that occur over time.

For example, media can degrade or files become corrupt. This is really where backups help, but only if you have a process around your backups. Don’t just blindly trust your backup software! You should really test your backups regularly by restoring all or part of your backed-up data and checking that the restored files are the same as what was originally stored. And in order to be able to check that the files are the same, and that all of them are still there, you should think about creating manifests and generating and storing checksums for files. Maybe your backup process is actually introducing errors, or it is misconfigured so that it is ignoring some important files.
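To make the manifest-and-checksum idea concrete, here is a minimal sketch in Python. It is only one way such a check could look, not a description of any particular backup tool: the function names (sha256_of, build_manifest, compare) are invented for illustration, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(original: dict[str, str], restored: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, changed, or unexpected after a restore."""
    shared = set(original) & set(restored)
    return {
        "missing": sorted(set(original) - set(restored)),
        "changed": sorted(n for n in shared if original[n] != restored[n]),
        "unexpected": sorted(set(restored) - set(original)),
    }
```

In use, you would build a manifest of the source data when the backup is made, store it alongside the backup, and later compare it against a manifest built from a test restore. An empty "missing" list and an empty "changed" list are what you hope to see; anything else means the backup process deserves a closer look.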

Natalie: I like how you used the word “process” there. Look around us today. (Gestures to the dinosaur bones on display, and insects carefully pinned in cases.) All of these items have been carefully prepared by conservators for display in the museum, so visitors can see them over time. The initial preparation is one stage, but then they have to be cared for over time as well. Some items are behind glass cases, some are out in the open. And in order for visitors to know what these wild and wonderful things are, the museum collects and shares information about where the item was found, why it matters to us, and sometimes how it got to this museum. They might carbon-date it, conduct anthropological research into its significance, and so on. The point is that a lot of work goes into preserving artefacts over time so countless visitors can enjoy them. And curators don’t always know what might be the most important items 100 years from now (e.g. species that have gone extinct), so they have to apply the same care and stewardship to everything they want to preserve. Digital preservation requires a similar level of care, but with unique challenges, because files – digital ‘objects’ – can disappear completely. If a museum object degrades it will look poor and may need propping up, but you can still see some aspects of it. If, on the other hand, a digital file degrades, you might not be able to open it at all. Digital materials are incredibly fragile, but they are increasingly making up the bulk of what we humans produce as knowledge and culture. (Snaps a pic of a nearby display.)

Image Credit: Kathryn Cassidy, 2017. Licence: CC-BY-SA

Image Credit: Natalie Harrower, 2017. Licence: CC-BY-SA

Kathryn: Yes, but it’s not even just file degradation that puts digital materials at risk. The technological landscape changes over time too. For example, file formats change.

Natalie: You’re absolutely right. I actually tried opening an essay from my undergrad the other day, and it was tricky. I’ve been holding onto these files for a long time, backing them up, transferring them to a new external hard drive every four or five years, and so on. But I can’t get it to open! There is probably a converter plug-in out there I could use, but who knows how long that will work for? It’s like I had done everything right in terms of ‘saving it’, but it’s not enough. Now, most people may not care about opening a student essay from years ago, but this same problem can apply to any type of digital item in the public realm.

Kathryn: Absolutely! No amount of copies of a file is going to help if the software to open it doesn’t exist anymore. Preservation involves being aware of changes in software and file formats and taking actions to deal with any problems before they arise. So, for example, identifying at-risk file formats and migrating them to more-widely supported formats might form a part of a professional preservation process.
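As a rough illustration of what “identifying at-risk file formats” might look like in practice, the Python sketch below surveys a directory by file extension and flags anything on a watch list. Everything here is an assumption made for the example: the watch list is hypothetical and not an authoritative registry, the function names are invented, and a file extension is only a weak hint of the real format. A professional workflow would more likely rely on signature-based identification against a format registry.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list, for illustration only. A real list would come
# from an institution's own preservation policy or a format registry.
AT_RISK_EXTENSIONS = frozenset({".wpd", ".wps", ".lwp"})


def survey_formats(root: Path) -> Counter:
    """Count files under root by lower-cased extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def flag_at_risk(root: Path, watch_list=AT_RISK_EXTENSIONS) -> list[str]:
    """List relative paths of files whose extension is on the watch list."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in watch_list
    )
```

The survey gives a picture of what a collection actually contains, and the flagged list is a starting point for deciding what to migrate. The migration itself, and checking that the migrated copy is faithful, is a separate step that this sketch does not attempt.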

And another thing that changes over time, unfortunately, is our own memories. At least you were able to remember where you had the file saved. I know I saved copies of my college essays but I wouldn’t know where to start to look for them now! You can so easily forget details about where you’ve stored your data.

Natalie: And this problem only gets multiplied when we look at collections of public or institutional data, because the volume of materials preserved over time is much larger than what someone might have in a personal collection.

Kathryn: Yes. And something that’s quite common is that an organisation just replicates an entire existing filesystem remotely or runs a script to back it all up. They are almost certainly relying on knowledge that may not be written down, but may exist only in their staff’s heads. Staff who are familiar with the material will be able to find the data they require based on filenames and directory structure, without the aid of proper metadata or a finding aid or catalogue. But for such a resource to be useful in the longer term it must be possible for someone to find and understand the data and its structure without that experience and knowledge.

Natalie: So this is where every archivist’s favourite term comes in: Metadata!

Kathryn: Yes, we talk a lot about metadata at DRI! Metadata is a fancy word for ‘data about data’, or information that describes data.

Natalie: And metadata is important for what you said – properly locating a file – being able to search for it, etc. People talk about catalogues or finding aids. But it’s also important for providing context for the data. In the same way that you can forget where you saved a file, you might forget what it is, or what it means – without good metadata the meaning of the data can be lost. If we think of the research world, there are clear examples. If you save a dataset, for example, without providing information on who created the dataset, what research it supports, and so on, then the dataset itself might be incomprehensible. And this information needs to be provided as close to the creation of the file as possible, because memory fades, people move on…

Kathryn: On an institutional level, as staff members leave the organisation and move on, this kind of informal information is lost. As practices within your organisation change, you may find that it becomes harder and harder for new staff to find and make sense of older data. This is exactly the problem that the creation of good metadata helps to address.

By taking the time to describe the data using descriptive metadata to identify people involved, dates, subjects, topics, etc., you will make that data much easier to find, retrieve and reuse later on.
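One lightweight way to picture descriptive metadata is as a small record saved next to each file. The Python sketch below writes such a record as a JSON “sidecar” and then searches those records by subject. It is an illustrative assumption throughout: the field names, the “.metadata.json” naming convention, and the function names are invented for this example and do not describe how DRI or any particular repository stores metadata.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

SIDECAR_SUFFIX = ".metadata.json"  # hypothetical naming convention


@dataclass
class DescriptiveRecord:
    """A minimal descriptive record: who, when, and what it is about."""
    title: str
    creators: list[str]
    date: str
    subjects: list[str] = field(default_factory=list)
    description: str = ""


def write_sidecar(data_file: Path, record: DescriptiveRecord) -> Path:
    """Save the record as JSON next to the file it describes."""
    sidecar = data_file.with_name(data_file.name + SIDECAR_SUFFIX)
    sidecar.write_text(
        json.dumps(asdict(record), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar


def find_by_subject(root: Path, subject: str) -> list[str]:
    """Return relative paths of data files whose record lists the subject."""
    wanted = subject.casefold()
    matches = []
    for sidecar in sorted(root.rglob("*" + SIDECAR_SUFFIX)):
        record = json.loads(sidecar.read_text(encoding="utf-8"))
        if wanted in (s.casefold() for s in record.get("subjects", [])):
            relative = sidecar.relative_to(root).as_posix()
            matches.append(relative[: -len(SIDECAR_SUFFIX)])
    return matches
```

The point of the sketch is that the search no longer depends on anyone remembering a filename or a folder structure: a newcomer can ask for everything about a subject and get an answer from the records themselves.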

Natalie: Ok, but sometimes people shrug their shoulders and say “Why should I go to all the trouble? What if I only need my data to survive for a couple of years? Why would I go to the effort of preparing it for longer than that?”

Kathryn: (Nodding in agreement) That’s a good question. It is important to point out that not everything needs to be saved! Archivists are well aware of this, and have procedures for appraisal to determine what is necessary to preserve. That said, a lot of the potential threats to data we’ve been discussing could occur in the short term as well. Media failure, incomplete backups or staff changes can happen at any time. Also, when it comes to public data, or research data, or digital cultural heritage materials, you can’t quite predict how that data may be used in the future. So the overall point here is: if you steward it properly to begin with, you make re-use, and the value of that re-use, possible.

(Bell rings to signal the end of the coffee break)

Natalie: Yes! (waving a teaspoon emphatically). I think that is the key message about the importance of preservation. Data preservation is a continual process that requires planning from the beginning. (Stealthily putting a biscuit into her conference bag). We have to head back into the sessions, but this has been a great conversation! Next we need to tackle why people think that ‘putting things on the internet’ is preserving them!

Kathryn: Definitely a conversation for another day…

Storage is not preservation because:

What you store can be corrupted, or the storage media can fail, causing data loss
As technology and software change, data can become technologically inaccessible
Files ‘saved’ on their own may not be findable – they need metadata to allow us to find them
Any kind of file needs context to make it meaningful – dates, subjects, titles, authors, etc.
