Searching for Black women in the archives: Part 3
This is the third in a series of posts from DPLA’s Audrey Altman about the curatorial and technological challenges involved in the development of the Black Women’s Suffrage Digital Collection. As a data engineer, Audrey worked alongside the curators Shaneé Yvette Murrain and Kathleen Williams to address underlying biases in the collection and surface representative stories about Black women’s contributions to voting rights movements.
The words we choose
In the first and second parts of this series, I discussed how Black women have been historically under-represented and misrepresented in archival collections. The Black Women’s Suffrage Digital Collection works to redress these inequities by bringing together materials by and about Black women who contributed to voting rights movements throughout United States history. Identifying materials for the collection was a challenge, due to the diverse and evolving ways in which librarians and archivists describe, organize, and label artifacts about Black women and their communities.
In order to find materials for the Black Women’s Suffrage collection, the curation team explored the Digital Public Library of America. DPLA provides a single search index that allows users to explore over 40 million cultural heritage materials from libraries, archives, and museums across the United States. More precisely, what DPLA aggregates is metadata: over 40 million records. Each metadata record contains a brief description of some cultural heritage item: a photograph, manuscript, letter, book, audio recording, etc. It includes information such as who created an item, when it was created, and some subject terms that describe what the item is about. If the item has textual content, like a book, pamphlet, or newspaper, the metadata record will not include the full text; it will, however, provide a link to a full-text version of the item. Since human curators cannot individually examine each of the 40+ million items in DPLA’s aggregation, we developed an algorithm that can. The algorithm analyzes the entire DPLA corpus and selects materials for inclusion in the Black Women’s Suffrage collection based on which keywords are present in their metadata records.
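As a rough illustration of what keyword-based selection over metadata records can look like, here is a minimal Python sketch. It is not DPLA's actual algorithm: the record structure, the field names (`id`, `title`, `subjects`), the sample records, and the keyword list are all assumptions made up for this example.

```python
# Hypothetical metadata records. The field names are illustrative only
# and do not reflect DPLA's real metadata schema.
records = [
    {"id": "a1", "title": "Portrait of a suffragist",
     "subjects": ["African Americans", "Women suffragists"]},
    {"id": "b2", "title": "Map of Boston harbor",
     "subjects": ["Harbors", "Maps"]},
    {"id": "c3", "title": "NAACP meeting minutes",
     "subjects": ["Civil rights"]},
]

# Hypothetical keyword list, lowercased for case-insensitive matching.
KEYWORDS = {"suffragists", "naacp"}


def matches(record, keywords):
    """Return True if any keyword appears in the record's title or subjects."""
    text = " ".join([record["title"], *record["subjects"]]).lower()
    return any(keyword in text for keyword in keywords)


# Keep the ids of the records that contain at least one keyword.
selected_ids = [r["id"] for r in records if matches(r, KEYWORDS)]
print(selected_ids)
```

Plain substring matching like this is deliberately naive. It treats every keyword as equally meaningful and cannot tell different senses of a word apart.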
Subject keywords are particularly helpful to our curators and our algorithm, and provide one of the most reliable means of selecting materials for the collection. When assigning subjects to materials, librarians and archivists often use controlled vocabularies, authoritative sets of terms that enable consistent description across different collections and institutions. Among the most common controlled vocabularies used by DPLA’s contributors are Library of Congress Subject Headings (LCSH) and Library of Congress Classification (LCC) — but other vocabularies are also used. In theory, controlled vocabularies ensure that all materials with the same subject are assigned precisely the same subject term, thereby making it possible for users to systematically find all materials relating to any topic. The reality is much messier. A controlled vocabulary may be applied with any number of variations, reflecting the localized practices of thousands of individual institutions and multiple generations of librarians and archivists. In large aggregations such as DPLA, there are so many standards and variations in play that finding a comprehensive set of materials on any given topic is quite challenging.
One strategy to surface materials about Black people is to search with keywords such as “African Americans,” or other racial signifiers. For example, one might search for “African American suffragists” or “African American political activists.” However, this is not as straightforward as one might hope. Just as words pass in and out of common and socially acceptable use, terminologies within controlled vocabularies change over time. The LCSH term for Black people has changed several times. Writing in 2006, librarian Jeffrey Beall explained, “Throughout most of the 20th century, the heading was Negroes. This changed in the late 1970s when the form was changed to Blacks. Later, in the 1990s, the heading was changed to Afro-Americans. A few years ago, the heading was changed again, so that the current heading is African Americans.” The financial cost of remediation (labor, time) makes it impractical for most libraries to keep all their records up to date with the most current terminologies, so records tend to contain whatever terms were recommended at the time the record was created. Indeed, one can find records with all four of the terms Beall mentioned throughout the DPLA aggregation. Subject terms may also include gender signifiers, such as “Women suffragists.” Since there are so many possible combinations of subject terms with racial and/or gender signifiers, it can be challenging for a person looking for materials about Black women to use these vocabularies effectively.
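One way to cope with headings that changed over time is to match all of the historical forms at once. The sketch below assumes subject terms are available as plain strings and uses only the four headings Beall lists. The function name `has_racial_signifier` is hypothetical, and this is an illustration, not the team's implementation.

```python
import re

# The four LCSH headings Beall describes, oldest to newest.
HEADING_VARIANTS = ["Negroes", "Blacks", "Afro-Americans", "African Americans"]

# Word boundaries keep "Blacks" from matching inside "Blacksmiths".
VARIANT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in HEADING_VARIANTS) + r")\b",
    re.IGNORECASE,
)


def has_racial_signifier(subject):
    """Return True if the subject string contains any of the four headings."""
    return VARIANT_PATTERN.search(subject) is not None


for subject in ["Afro-Americans--Suffrage", "Blacksmiths", "African Americans"]:
    print(subject, has_racial_signifier(subject))
```

A list like this only covers the noun forms. Adjectival forms such as “African American women” would need their own entries, which is part of why the combinations multiply so quickly.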
Of course, not all materials about Black people will be explicitly labeled with racial signifiers. Librarian Doris Hargrett Clack observed that many of the LCSH terms describing the African American experience are imprecise or ambiguous; for example, the subject terms “Civil rights,” “Abolitionists,” and “Minorities” could relate to African Americans, or to people of a different race or nationality. Disambiguation of such terms was a recurring challenge for the Black Women’s Suffrage team. The term “segregation,” for example, recalls many relevant documents about the history and context of Black voting rights movements — but also recalls materials about Japanese internment during WWII, South African apartheid, and metallurgy. It required several rounds of analysis and revision to filter out non-pertinent materials related to “segregation,” and other terms with broad or multiple meanings. Records may also be labeled with terms a student of Black history would recognize as implicitly relevant, such as “Harriet Tubman,” “NAACP,” or “Fifteenth Amendment,” but not with racially explicit terms such as “African Americans” or “African American History.” Our team had to develop enough subject expertise to search for specific people, events, publications, and organizations — and then had to learn how each of these concepts was expressed using different vocabularies and standards.
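To make the “segregation” problem concrete, here is a minimal sketch of one possible filtering approach: keep a record that mentions the ambiguous term unless it also carries a term pointing to an unrelated sense. The exclusion list is an assumption based only on the examples named above, and the function name is hypothetical. The filters the team actually arrived at are not described here.

```python
# Hypothetical exclusion terms, taken from the unrelated senses named in
# the text: Japanese internment, South African apartheid, and metallurgy.
EXCLUDE_WITH_SEGREGATION = ("japanese", "south africa", "apartheid", "metallurgy")


def keep_segregation_record(subjects):
    """Keep records that mention segregation without an excluded sense."""
    text = " ".join(subjects).lower()
    if "segregation" not in text:
        return False
    return not any(term in text for term in EXCLUDE_WITH_SEGREGATION)


print(keep_segregation_record(["Segregation", "African Americans"]))
print(keep_segregation_record(["Segregation (Metallurgy)"]))
```

An exclusion list of this kind has to be revised as new false matches turn up, which is consistent with the several rounds of analysis and revision described above.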
There is a surprising amount of nuance and diversity in the ways identical concepts can appear in different metadata records, either because different standards are used, or because the standards allow a certain degree of flexibility. There are structural differences, e.g. “Black women” vs. “Women, Black”; abbreviations, e.g. “National Association for the Advancement of Colored People” vs. “NAACP”; and alternative spellings, e.g. “Nineteenth Amendment” vs. “19th Amendment.” It is particularly difficult to find all possible variants of people’s names. For example, in 41 total records about abolitionist Sarah Parker Remond, her name was expressed in 5 different ways:
- “Sarah Parker Remond” (4 records)
- “Sarah P. Remond” (1 record)
- “Remond, Sarah P” (2 records)
- “Remond, Sarah Parker 1826-1894” (22 records)
- “Remond, Sarah Parker, 1826-1877?” (12 records)
All of these variations make the process of finding a comprehensive set of materials about a topic very challenging and unintuitive, even when the person doing the searching has deep subject expertise.
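The five name forms above can be reduced to a single comparison key with some light normalization. The following Python sketch is an assumption-laden illustration, not the method used for the collection: the function `name_key` is hypothetical, and it handles only the patterns seen in this example (inverted “Surname, Given” order, initials, and trailing life dates).

```python
import re


def name_key(raw):
    """Reduce a name string to a (surname, first given name) key."""
    # Drop life dates such as "1826-1894" or "1826-1877?".
    name = re.sub(r"\d{4}\s*-\s*(?:\d{4})?\??", "", raw)
    name = name.strip(" ,.")
    if "," in name:
        # Inverted form: "Surname, Given names"
        surname, given = (part.strip(" ,.") for part in name.split(",", 1))
    else:
        # Direct form: "Given names Surname"
        given, _, surname = name.rpartition(" ")
    given_parts = given.split()
    first = given_parts[0].lower() if given_parts else ""
    return (surname.lower(), first)


variants = [
    "Sarah Parker Remond",
    "Sarah P. Remond",
    "Remond, Sarah P",
    "Remond, Sarah Parker 1826-1894",
    "Remond, Sarah Parker, 1826-1877?",
]

# All five variants collapse to one key.
keys = {name_key(v) for v in variants}
print(keys)
```

A key this coarse would also merge two different people who share a surname and first name, so in practice the results would still need review by someone with subject expertise.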
The Black Women’s Suffrage selection algorithm needs to identify materials about White people as well as Black people. In the first part of this series, “Missing from History,” I discussed the intrinsic difficulty in building a collection of predominantly Black voices when the majority of DPLA’s materials are by and about White people. There is a significant risk that the volume of materials about White people could overwhelm those about Black people, simply because more materials about White people exist in DPLA’s corpus. To keep the collection weighted toward Black voices, our team needed to identify and limit materials featuring solely White subjects.
With a comprehensive list of term variations, we could reliably find materials about Black people; however, filtering out materials about White people proved much more difficult. As librarians Sara A. Howard and Steven A. Knowlton found, LCSH and LCC assume whiteness by default. In other words, materials about Black people are much more likely to be explicitly labeled with a racial signifier than those about White people. It is therefore difficult to filter out materials that are solely about White people, or to easily gauge what proportion of the materials in the Black Women’s Suffrage collection are about Black vs. White subjects.
In addition to leveraging the many nuances of controlled vocabularies, the Black Women’s Suffrage selection algorithm also needs to handle uncontrolled language. When librarians and archivists encounter problematic standardized terms in their collections, they may decide to change them to more fitting, nonstandard terms. Throughout United States history, the controlled vocabularies adopted in professional “best practices” have been largely created by White people. Therefore, Black people have had little power within libraries and archives to choose which words best describe their experiences, the artifacts they produce, and themselves. While digitizing materials for Umbra Search African American History, archivist Dorothy Berry considered using culturally respectful terms, such as “African American interactions with criminal justice” instead of some problematic LCSH standard terms, such as “African American juvenile delinquents.” Ultimately, Berry decided to use standard terms to support cross-system consistency, but in another context, using nonstandard terms would have been appropriate.
A related and emerging practice among archivists encourages active participation from under-represented communities to develop new standards to describe and organize their materials. Scholars Katie Shilton and Ramesh Srinivasan observed, “Using archival arrangement and resulting descriptive practices to preserve contextual value as the community understands it allows historically marginalized communities to speak, not be spoken for.” These collaborative practices harness community expertise and empower people to represent themselves accurately and fairly.
DPLA does not change descriptive terms in our metadata records, respecting the authority of our partner institutions to choose whatever standards and practices they deem appropriate. Yet, the Black Women’s Suffrage team needs to be attentive and responsive as librarians, archivists, and historically marginalized communities work together to improve descriptive practices. We will need to learn and use new terminologies and ways of searching alongside more established standards and practices, and incorporate these changes into our selection algorithm.
In the next post, I will discuss how DPLA’s technology team strove to create a selection algorithm that accounts for the nuances and inherent biases in the millions of metadata records it evaluates, and supports the anti-racist objectives of the curatorial team.
Audrey Altman and DPLA Director of Community Engagement Shaneé Yvette Murrain will be talking more about the creation of the Black Women’s Suffrage Digital Collection at a DPLA Member Brown Bag on February 25th at 1 pm ET. If your institution is a DPLA member, you can register for that conversation here.
More from the Searching for Black women in the archives series
- Part 1: “Missing from History”
- Part 2: “Who gets to tell their story?”
- Part 4: “An intentional algorithm”
- Jeffrey Beall, “Ethnic Groups and Library of Congress Subject Headings,” Colorado Libraries 32, no. 4 (2006): 37-44.
- Doris Hargrett Clack, “Subject Access to African American Studies Resources in Online Catalogs: Issues and Answers,” Cataloging & Classification Quarterly 19, no. 2 (1994): 49-66.
- Sara A. Howard and Steven A. Knowlton, “Browsing through Bias: The Library of Congress Classification and Subject Headings for African American Studies and LGBTQIA Studies,” Library Trends 67, no. 1 (2018): 74-88.
- Dorothy Berry, “Digitization and Enhancing Description Across Collections to Make African American Materials More Discoverable on Umbra Search African American History,” The Design for Diversity Toolkit, August 2, 2018.
- Katie Shilton and Ramesh Srinivasan, “Participatory Appraisal and Arrangement for Multicultural Archival Collections,” Archivaria 63 (Spring 2007): 87-101.