DPLA expands Wikimedia work

By Dominic Byrd-McDevitt, September 28, 2022.

Over the summer, we gave a wide-ranging update on the accomplishments of our Wikimedia work to date and promised news on what comes next. In the last few months, we have launched several new projects with renewed funding, and we are also working to increase collaboration and idea-sharing through the new Wikimedia Working Group.

DPLA’s Wikimedia Working Group

We are pleased to welcome 10 members from across our member network and the library profession to the new Wikimedia Working Group (WWG). The WWG is DPLA’s largest working group, representing a wide array of interests and experience. The inaugural membership includes:

  • Dominic Byrd-McDevitt (Chair), Data Fellow, DPLA
  • Meredith Doviak, Community Manager, National Archives Catalog, National Archives and Records Administration
  • Eben English, Digital Repository Services Manager, Boston Public Library & Technical Lead, Digital Commonwealth
  • Christine Fernsebner Eslao, Metadata Technologies Program Manager, Harvard Library
  • Jamie Flood, Senior Wikipedian and Outreach Specialist, National Agricultural Library
  • Giovanna Fontenelle, Program Officer, Culture & Heritage, Wikimedia Foundation
  • Rachel Meibos Helps, Wikipedian-in-Residence, Harold B. Lee Library, Brigham Young University (Mountain West Digital Library)
  • Evan Robb, Digital Repository Librarian, Washington State Library (Northwest Digital Heritage)
  • Angela Stanley, Assistant State Librarian for Innovation & Collaboration, Georgia Public Library Service (Digital Library of Georgia)
  • Greta Suiter, Manuscripts Archivist, Ohio University Libraries (Ohio Digital Network)

The WWG seeks to support and further the work started by DPLA’s grant-funded Wikimedia Project, improving the capacity and sustainability of the project through such initiatives as improving documentation, supporting project participants, and driving new cross-network collaborations. We hope that the formation of this working group serves as an opportunity to shape and help further the collaborative work of the DPLA Member Network, to share expertise with and learn from hub and DPLA colleagues, and to help address common needs and challenges. DPLA is grateful to all of these leaders in the field for volunteering to be part of this effort. If you would like to reach out to the working group on any matter, you can contact them here.

Discoverability initiatives for Wikimedia uploads

Additionally, DPLA’s Tech Team is continuing work with new funding generously provided by the Wikimedia Foundation’s Structured Data Across Wikimedia project. To date, DPLA has uploaded about 3 million digital media files from about 250 contributing institutions to Wikimedia Commons. (Learn more about that in our previous post.) With such a large collection, our main challenge is making it possible for users to locate what they need in order to put these uploads to use. The additional funding will enable the development of new technology, which is already underway, to enhance metadata and improve the discoverability of our collections that have been uploaded to Wikimedia Commons. There are several opportunities for our community members to get involved in these projects. Here are three different projects we are working on, all of which involve the addition of linked data entities to our data in order to improve our descriptions and searchability.

Now accepting URIs for subject terms

DPLA will soon begin accepting entity URIs for subjects and locations, as well as catalog links for collections. DPLA’s aggregation contains millions of subject terms that are currently ingested as simple strings, even though many of our source institutions may already use a structured vocabulary from authority files and thesauri, such as the Library of Congress Subject Headings. As part of a new effort to make our data easier to reconcile with Wikidata, so that it can be added to our Wikimedia uploads as Structured Data on Commons, we are already beginning to ingest the URIs that are available to DPLA. Right now, the main use of this work will be to support Wikimedia uploads, but adding this data to our aggregation opens up the possibility of using it for other purposes within the DPLA portal (such as improvements to faceting).

If you are interested in participating, we are asking all hubs to add URIs, at least for subjects and locations, to the data that DPLA harvests. Once you have done so, or if you are already providing these types of URIs in the data that DPLA harvests from your hub, please let us know through your regular technical contact so we can update our mappings. You can reach out here for assistance if you would like to talk through any of this before making changes to your data.
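As a rough illustration of the difference between a string-only subject and one that carries a URI, the sketch below separates the two cases in a list of subject entries. The `label` and `uri` field names and the sample identifier are assumptions made for this example; they are not DPLA's actual data model or harvest format.

```python
from urllib.parse import urlparse


def is_uri(value):
    """Return True if value looks like an absolute http(s) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_subjects(subjects):
    """Separate subjects that carry a usable URI from plain-string subjects."""
    with_uri, strings_only = [], []
    for subject in subjects:
        uri = subject.get("uri", "")
        if uri and is_uri(uri):
            with_uri.append(subject)
        else:
            strings_only.append(subject)
    return with_uri, strings_only


# Hypothetical subject entries; the identifier below is a placeholder,
# not a real Library of Congress Subject Headings record.
record_subjects = [
    {"label": "Lighthouses",
     "uri": "http://id.loc.gov/authorities/subjects/sh00000000"},
    {"label": "Boston Harbor (Mass.)"},
]

with_uri, strings_only = split_subjects(record_subjects)
print("With URI:", [s["label"] for s in with_uri])
print("String only:", [s["label"] for s in strings_only])
```

A hub could run a check along these lines over an export of its own records to estimate how many subjects already have URIs before contacting DPLA.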

Reconciling our terms with Wikidata items

As part of the effort to add subjects and location data to our uploads, DPLA is also working to reconcile the thousands of unique subject strings in our aggregation with Wikidata items. Just within the small subset of DPLA items that have been uploaded to Wikimedia Commons (which is less than 2% of our total aggregation), there are over 37,000 unique subject strings, some of which are used hundreds of thousands of times. For any subject that does not already have a provided URI, we will seek to reconcile it to Wikidata, since this is what is required to add it to Wikimedia Commons as structured data. If you are experienced in reconciliation projects, especially with tools such as OpenRefine, DPLA could use your help in this effort! Please contact Dominic Byrd-McDevitt to learn more if you would like to work on this project.
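Because some subject strings are used far more often than others, one plausible first step in a reconciliation project is to group spelling variants and rank the strings by frequency, so that the most heavily used terms are matched first. The sketch below shows that idea with a tiny hand-made lookup table standing in for a real reconciliation service such as the one OpenRefine would query. The sample strings, the normalization rules, and the lookup table are all assumptions for illustration and do not describe DPLA's actual process.

```python
from collections import Counter


def normalize(subject):
    """Collapse whitespace, drop a trailing period, and lowercase the string."""
    return " ".join(subject.split()).rstrip(".").lower()


def rank_subjects(subjects):
    """Return (normalized string, count) pairs, most frequent first."""
    return Counter(normalize(s) for s in subjects).most_common()


def reconcile(ranked, known):
    """Split ranked subjects into matched (string -> QID) and unmatched."""
    matched, unmatched = {}, []
    for text, _count in ranked:
        if text in known:
            matched[text] = known[text]
        else:
            unmatched.append(text)
    return matched, unmatched


# Hypothetical subject strings as they might appear across several records.
subjects = ["Dogs", "dogs.", "Cats", "Dogs ", "Street scenes"]

# Illustrative lookup table; a real project would query a reconciliation
# service and have a person review the candidate matches.
known = {"dogs": "Q144", "cats": "Q146"}

ranked = rank_subjects(subjects)
matched, unmatched = reconcile(ranked, known)
print("Ranked:", ranked)
print("Matched:", matched)
print("Still to reconcile:", unmatched)
```

Exact-match lookup is the simplest possible strategy; anything left in the unmatched list would still need fuzzy matching or manual review.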

New tool to suggest “depicts” values

Finally, we will soon be launching a new interactive metadata enhancement tool called DepictAssist. One of DPLA’s most important goals is to add linked data to our Wikimedia uploads in the form of “depicts” statements. “Depicts” (Wikidata property P180) is a way of designating the entities that are actually visible in the image. Because Wikimedia Commons search is used exclusively for images and other media, it weights this field most highly in its results. DPLA does not have an equivalent element in our data model, and not all of an item’s subjects are depicted in each image (or are even tangible), so DepictAssist will use other data present in the item to suggest possible entities with which to tag the image.

We are aiming for the tool’s interface to be an engaging way for an institution to easily make their images more discoverable for Wikimedia Commons users. If this sounds like something you would be interested in testing with us, we are inviting volunteers to try out the upcoming beta version. Please get in touch with Dominic Byrd-McDevitt to learn more.
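For readers unfamiliar with how a “depicts” statement is represented, the sketch below builds one in the general JSON shape that Wikibase uses for statements, with P180 as the property and a Wikidata item as the value. It only constructs the data structure and makes no API call. The function name is made up for this example, and nothing here describes how DepictAssist itself is implemented.

```python
import json
import re


def depicts_claim(qid):
    """Build a Wikibase-style statement saying a file depicts the item qid."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",  # "depicts"
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),
                    "id": qid,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


# Q144 is the Wikidata item for "dog"; used here only as sample input.
claim = depicts_claim("Q144")
print(json.dumps(claim, indent=2))
```

Validating the item ID before building the statement keeps plain subject strings, which have not yet been reconciled, from being passed through by mistake.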

If you are interested in learning more about DPLA’s Wikimedia work, and the ways in which you and your institution or hub can participate, please read more here. We hold monthly office hours on the first Wednesday of the month; the next will take place on October 5, 2022, from 2 to 3 pm EDT.

DPLA’s Wikimedia work is supported by the Alfred P. Sloan Foundation and the Wikimedia Foundation.