Curating Collections as Data

This is the first blog in a series of posts related to the Cultural Heritage Image Sharing Recommendations produced by the WorldFAIR Project’s Cultural Heritage Image Sharing Working Group. Learn more about DRI’s role in the WorldFAIR Project.

A conversation with Mikala Narlock, Director of the Data Curation Network (DCN), and Beth Knazook, DRI’s Research Data Project Manager

BK: Welcome, Mikala! I’m looking forward to diving into this topic again and revisiting some of the conversations we’ve been having in different venues lately, but first I should give our readers a little background. In April 2023, we participated in Collections as Data Futures, an event hosted at Internet Archive Canada in Vancouver, BC, and organised by Thomas Padilla, Helen Scates Kettler, Stewart Varner and Yasmeen Shoreish. More than 60 representatives from Galleries, Libraries, Archives and Museums (GLAM organisations) around the world attended, as well as researchers and data curators with experience using cultural heritage collections. In advance of the event, all participants were asked to submit Position Statements exploring what GLAM ‘Collections as Data’ really means in our work and our research. You and I decided to collaborate on a joint position statement, called Building the Collections of Tomorrow, that explored the convergence of curation in the GLAM sector and the emerging field of data curation. I think a lot of optimism went into that title! Should we start there? How did we land on the idea of curation as key to building a collaborative digital data culture?

MN: I think we envisioned this future for a few reasons, which I’ll try to talk through, though they are really interrelated and overlapping. As individuals with experience and expertise in data curation, we know the importance of building relationships with communities. In the DCN, we often connect with researchers to curate their datasets and scholarly outputs so that their materials are FAIR (findable, accessible, interoperable and reusable). However, we know that during the curation process we aren’t just preparing one dataset for sharing: we are forming a relationship with a researcher, which over time will result in better datasets that are more robust from the start. In the same vein, when we were reflecting on the Santa Barbara Collections as Data Statement, which also emphasises the importance of connecting with communities, we saw substantial overlap. From there, we really started reflecting on not only what the cultural heritage sector could learn from the practices of research data curators, but also vice versa. I think it’s worth highlighting that you and I have worked in a wide range of GLAM organisations over the years, which has certainly influenced how we perceive the areas of overlap. And lastly, I can’t speak for you, Beth, but my background is in the interdisciplinary humanities, so I’m always looking for connections between topics.

I’ve been rereading our statement lately, now that I’ve had some time and distance from what we wrote, and I found myself nodding along, excited by some of the questions we raised. Have you reflected on it recently? Is there anything that stands out to you now?

BK: The need to better recognise the contributions of data curators to the academic record is definitely a theme that I want to explore more. It seems to be generally well-accepted that curation is a critical research skill in the GLAM sector—admittedly encompassing a variety of practices with different labels at different institutions (appraisal, selection, description, education, engagement, etc.)—that can ultimately exert a powerful influence on the research questions we ask of our collections, as well as the potential for those collections to answer them. Approaches to curatorial practice that are committed to critical diversity, inclusivity and accessibility in collection building are growing in the field (see, for instance, View from the Field: Equity Oriented and Anti-Racist Curatorial Practice).

Research data doesn’t necessarily have the same obligation to be equitable in what it represents. (Okay, that probably sounds bad, but bear in mind that I am talking about what data is recorded in a given dataset rather than any obligation to make a balanced argument, which might use that data in combination with existing literature or data from another source.) In fact, researchers might delve deeply into one aspect of a topic to surface some unknown or previously misunderstood story, sometimes focusing narrowly on a particular perspective, issue or idea. Data curators may do an awful lot of work to contextualise and represent project data in ways that communicate the context of the research question and link interpretive materials (articles, chapters, conference proceedings, etc.) to the dataset, in order to help mitigate misreadings of the information presented. You brought up the idea of curation as a care practice in our statement, which I think really signals the importance of this work to those working with cultural heritage data. How is the idea of care being explored at the DCN?


MN: Yes! This is something that is almost always on my mind, for a few different reasons. One of the biggest ways we discuss curation as a care practice is through the lens of the CARE Principles for Indigenous Data Governance. The principles, which complement the FAIR principles, provide a framework for data sharing and reuse that reflects the “crucial role of data in advancing Indigenous innovation and self-determination.” These are incredibly useful for our curators when reviewing research data that is about specific communities or collected on tribal or protected land. But they also help us reflect on the impact of sharing a research dataset. Think of a collection of geographic locations of plants or animals: Could this data be used, either alone or in conjunction with other data, to cause harm to that ecosystem? In the DCN, we discuss regularly how curation provides an opportunity to make data more ethical. Even if we can’t change how the data were collected, we have the power to ensure that whatever we publish, or decline to publish, is handled with a view to, at the very least, not doing further harm. I’m incredibly grateful to work with my DCN colleagues to learn more about this. It’s really empowering, I think, to hear candidly about those experiences!

Additionally, like I mentioned earlier, curation is a chance to build a relationship. The researchers we work with are, after all, humans, too. We take an empathetic approach to curation that recognizes the deadlines we are all facing, and we give as much grace as we can to researchers and to ourselves. I think we are always striking a balance. We know our curation review has immense potential to shape not just the dataset, but also the world around us. We take this responsibility seriously, but we also recognize that, at the end of the day, we might have to embrace “good enough” instead of perfection if it means we can end our day at 5pm.

In short, it’s something that we’re definitely still discussing and reflecting on in the DCN. How does the notion of care emerge at the DRI?

BK: Care is always front of mind at the DRI too! As a national infrastructure with a mandate to collect, share and preserve Ireland’s social and cultural heritage data, we are very conscious of our responsibility, alongside our member organisations, to make space for community and grassroots efforts that give a voice to socially marginalised groups. The DRI has been involved in a number of research projects where we’ve been able to actively collect and curate our own datasets (such as Archiving Reproductive Health), but you and I both know that, however enthusiastically data curators engage in advocacy and outreach, at the end of the day we can usually only collect the data we’re offered. So the DRI has been trying to ensure that more people are empowered to approach us in the first place. In 2018 we launched a Community Archive Scheme, which offered free membership and a range of benefits to an unfunded organisation to help it deposit its collections in our Repository, giving it a wider audience and the peace of mind that its work is being preserved. Since we launched the scheme, we’ve learned a lot about what it takes to really support community archives, including managing effort and expectations on all sides. We produced a Guide to Archiving Digital Records for Volunteer and Community Groups and are looking forward to hosting a Community Archive Symposium later this year. [Editor’s note: DRI Director Lisa Griffith and Digital Archivist Kevin Long are giving a talk on this subject at iPRES 2023, ‘Building a Community Archive (When You Don’t Know Where You’re Going)’.]

We’ve also been looking more closely at how we can better facilitate responsible computational reuse of cultural heritage datasets through the WorldFAIR Project, and I’m mindfully inserting the word ‘responsible’ there because I think it became clear over the course of the project that explaining curatorial decisions in a more accessible way would be one of the most important things we could do to support the use of collections as data. This is my sneaky segue to mention your participation in the WorldFAIR Project Cultural Heritage Image Sharing Working Group. It was really great to have your input on this work! Is there any takeaway from the recommendations we published that made you think about data curation differently?

MN: I love a sneaky segue! One takeaway I’m still reflecting on is the recommendation for transparency: “Information about the creation, management and preservation of files should be visible and understandable by both humans and machines.” I completely agree with this for GLAM organisations—and a colleague and I are writing an article now that echoes this point. But I struggle with this in data curation, where I personally feel there is more tension between visibility and invisibility. On the one hand, we sometimes say good curation should be invisible: it should look like the researcher brought the dataset to the repository in the same format a reuser might download it. We’ll record curation actions in a curator log, but that information is often kept private in a repository’s internal documentation. But, as we’ve talked about today and in our statement, we know the power curators have to shape datasets and collections. I don’t think I have an answer to this, but it has definitely stuck with me.

What about you, Beth? Is there a recommendation that has stuck with you?

As a side note, I remember emailing Thomas Padilla early in my library career, asking about the overlap between FAIR and Collections as Data. At the time, there wasn’t much out there, so I’m incredibly excited to see this published! Congratulations to you and the entire working group on this achievement!

BK: Thanks, but a little part of me thinks you should save your congratulations until you see how well we manage to live up to the recommendations! [laughing] I have to say that all the recommendations have stuck with me quite literally, especially now that I’m getting into the weeds with my colleague Joan Murphy on how to change the Repository to support them. We’re working with the whole DRI team on adding functionality, clarifying policy, updating training… Stay tuned for the next in our series of blogs!
