Diversity and the DPLA

By DPLA, January 17, 2014.

We at DPLA have been thinking about how we can assess various forms of diversity within DPLA’s collections and partnerships. Because DPLA is a relatively young organization, there are numerous ways in which we can and should grow in the future. Assistant Director for Content Amy Rudersdorf’s useful post, “State of the DPLA’s Collections: Tracking Our Growth,” outlines the geography of current DPLA Hubs as well as the institutional diversity of our 1,100 partners. It also illustrates our range of item types and languages. As her post reflects, we are invested in broadening DPLA’s content and partnerships by institution type, geographic distribution, item type, and language, and we feel we have fairly good methods for tracking that growth.

But in this post, I wanted to get at diversity within DPLA’s collections by thinking about the group histories we currently represent, a kind of multiplicity we are also keen to expand. We want to pursue a focus on diversity not just for its own sake, but as a way to promote community, include historically marginalized voices in our collections, and represent a full range of American experiences. We have started an ongoing diversity project that explores our representation of a particular set of historically underrepresented populations in a U.S. context:

  • historically non-white racial and ethnic groups
  • cultural/religious minority groups (Jews, Muslims, Hindus, etc.)
  • women
  • LGBTQ communities
  • disabled communities (including the physically, sensorily, and developmentally disabled)
  • rural communities
  • populations with lower socioeconomic status (focusing on poverty, working class issues, labor issues)
  • elderly populations

Using these socio-political identity groups as a working concept of diversity for this project, we’ve approached the task of assessing their representation from a few different angles thus far.


"Harvey Gantt being interviewed," Clemson University Libraries, South Carolina Digital Library.


We began by looking at metadata. At this stage, DPLA contains more than 5.5 million records, and that volume, combined with our small staff, requires us to rely exclusively on the descriptive information we get from our partners. This metadata, and the application of subject terms in particular, reflects a huge range of standards, interpretations, and levels of quality, which makes searches for topics related to our diversity project quite challenging. Put another way, individual contributing institutions don’t always use the same subject terms for similar items, or they don’t use identical subject terms for the same purpose, or they may leave a number of relevant subject terms out of a record. Because the language used to describe these groups has changed over time, the subject terms reflect that range, and relying on them sometimes pulls in material not related to a group.

For example, we can roughly guess that there are 40,000 subject terms relating to African Americans within the current data set. However, multiple terms may have been assigned to a single record, so that figure counts terms rather than records. And because of the sheer volume of records, it would be impossible to identify all records that contain specific terms within the broad category of “African Americans,” such as names (Rosa Parks, Booker T. Washington), locations (Tuskegee), or events (1811 Slave Revolt led by Charles Deslondes). In addition, not all terms we might associate with a particular population refer only to that population (Indian, Black, Blind), which may also skew outcomes. Based on this, we can only provide the broadest estimates about the topics being described in the data. In order to get useful information about content via subject terms, we would need to do a tremendous amount of subject term clean-up and standardization, which is not within the current scope of our work or staffing.
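To make the counting problem concrete, here is a minimal sketch of how keyword matching on subject terms can mislead. The records, field names, and keywords are invented for illustration and do not reflect DPLA’s actual data or tooling. It shows three effects described above: term hits outnumber matching records, an ambiguous word produces a false match, and a relevant record with a more specific term is missed.

```python
# Hypothetical sample records; ids, field names, and subjects are illustrative only.
records = [
    {"id": "r1", "subjects": ["African Americans", "African Americans--Education"]},
    {"id": "r2", "subjects": ["Tuskegee Institute"]},        # relevant, but no keyword match
    {"id": "r3", "subjects": ["Black bears"]},               # matches "black", not relevant
    {"id": "r4", "subjects": ["African American churches"]},
]

# Assumed search keywords, matched case-insensitively as substrings.
KEYWORDS = ("african american", "black")


def matches(term):
    """Return True if any keyword appears in the subject term."""
    lowered = term.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def summarize(records):
    """Count matching subject terms and collect the ids of records that match."""
    term_hits = 0
    matched_ids = set()
    for record in records:
        hits = [term for term in record["subjects"] if matches(term)]
        term_hits += len(hits)
        if hits:
            matched_ids.add(record["id"])
    return term_hits, matched_ids


term_hits, matched_ids = summarize(records)
print(term_hits, sorted(matched_ids))
```

In this toy data the term count is higher than the number of matching records, one of the matches is about bears, and the Tuskegee record is not found at all, which is why a raw count of subject terms supports only broad estimates.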

Contributing Institutions

In addition to thinking about diversity that appears in our content, we also wanted to think about diversity as it is represented in our partners, and in particular in institutions created to collect materials about the histories of underrepresented groups. For those unfamiliar with our model, the DPLA works directly with Content Hubs (groups with large collections) and Service Hubs (groups who aggregate state or regional collections and send them to us in a single feed). Through our 21 Hubs we work with 1,100 smaller contributing institutions and their content. We examined these 1,100 institutions and found 50 such institutions, which work with 10 different Content and Service Hubs and collectively contribute more than 11,000 records to DPLA. These institutions included historical societies, museums, academic libraries and centers, special libraries, and archives.

"On a rural road," Environmental Protection Agency, National Archives and Records Administration.


Some interesting trends emerge in this search. One is the presence of Historically Black Colleges and Universities. They account for 12 of the 26 African American institutional partnerships found in our research and come from the Kentucky Digital Library (1), North Carolina Digital Heritage Center (9), and the Portal to Texas History (2). Another is a group of 4 institutions representing disabled populations from the Minnesota Digital Library (3) and the North Carolina Digital Heritage Center (1). Still other partners represent intersectional identities like Asian American Women or the rural poor.

Although this group is relatively small in comparison to the number of our overall partnerships, these are important institutions to highlight in our collections. They matter as an aspect of diversity because they speak not just to who is represented but to who has the power to select, curate, and preserve at the institutional level. These partnerships are a starting point upon which we hope to expand through our work with our existing and future Hubs.

Hubs Feedback

Leslie Jones, "Japanese women," Boston Public Library, Digital Commonwealth.


Because of issues with content analysis at the DPLA level and the incompleteness of an institution-only view, we need to look to our Hubs to provide us with good information about how diversity manifests itself in their aggregated collections. We asked for the Hubs’ feedback about the presence of underrepresented groups in their collections and partnerships, as well as their near- and far-term plans for broadening this diversity. The Hubs responded in several ways: first by affirming universally that this kind of diversity is a present and future priority, and second by referring us to numerous and important collections that are already part of DPLA.

A few examples of the hundreds of collections identified in the Hubs survey give some sense of the breadth of the work being done at the Hub level. The Digital Library of Georgia, for instance, supports the Civil Rights Digital Library, a group of materials about the Civil Rights Movement of the 1950s and 60s. This library contains approximately 9,000 items and 200 collections that appear in DPLA. The Mountain West Digital Library contributes collections including Arizona Latina Trailblazers and the Desert Jade Woman’s Club, two important collections related to the achievements of women of color. Through the South Carolina Digital Library and its partner the College of Charleston, the DPLA has records for William A. Rosenthall’s collection of Judaica postcards, which includes 500 postcards of Jewish synagogues, cemeteries, neighborhoods, and other sites of interest collected in the U.S. and Europe before and after World War II. From the University of Southern California Libraries, we have records for the Korean American Digital Archive, which includes oral histories, photographs, and private records about Korean American life in the 20th century.

We hope that this ongoing diversity project with the Hubs will give us the data we need to think about DPLA diversity and build our collections and partnerships proactively along these lines. Indeed, we hope, given more time with the current collections identified by the Hubs, to produce a useful overview of what we currently have. We also anticipate that this mode of reporting will help keep these priorities on our collective radar.

Future Directions

The DPLA Content team is already in discussion with several Service and Content Hubs that will greatly contribute to diversity as measured in this project. We have also discussed the possibility of forming new kinds of hubs organized around topics or institution types instead of geography and are eager to hear about possibilities. Using the Hubs survey as a starting point, we also hope to create better avenues for identifying content related to underrepresented groups for interested users, and we look to build resources that will allow us to achieve that goal.

It is also important to remember that current DPLA content is from only a small number of institutions in the U.S. compared to the data available. If we continue down the path that we have been on since our inception, we anticipate that diversity will increase as partners increase, particularly partners for whom this kind of diversity is a priority and who are aggregating from other institutions. We will target particular partnerships and highlight diversity in our outreach efforts, but time and the addition of new hubs will likely be the most significant factors that increase our breadth and depth of diversity.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.