Tracking DPLA’s growth in 2014

By DPLA, January 14, 2015.

Last week, Dan Cohen shared the DPLA’s 2015-2017 strategic plan including DPLA’s goals for community-driven growth and collaborative expansion. Now seems like a good time, too, to look back at the work that our partners and we have done over the past year to increase the numbers and diversity of formats, topics, institutions, and collections that are offered through DPLA.

Institution type

We ended 2013 with just over 1,200 contributors (institutions, organizational units, and even some private collections) from all over the United States, represented by 20 Hubs. A year later, our 23 Hubs represent nearly 1,400 contributors. That’s a 17% increase in contributors in the last year alone! And 2015 portends even greater growth.

We have nearly as many public library partners as those in universities, with good representation from archives and museums, as well. The graph below provides a picture of the variety of types of institutions that contribute to DPLA. Hidden within this data are the eight presidential libraries that help to make up the “federal libraries” category, as well as the local scouting archives, theological societies, church archives, historical commissions, national parks, and other institutions that make up DPLA’s rich contributor base.
Contributors by institution type
To see how much these contributors’ collections have grown over the past year, check out the following graph. All sectors have seen growth, but we’ve seen the numbers of community colleges (43), K-12 schools (20), and publishers (13) more than double since 2013.

Contributor type growth

Geographic distribution

In addition to the types of institutions represented in DPLA, we also keep an eye on their geographic distribution. Keep in mind that we DO have records about every state in the U.S., and beyond. This map doesn’t show you that, though. Instead, it shows where our current Hubs are located.

Geographic distribution


In January 2014, we shared progress on a research project about diversity. Through that project, we tried to look at both content and partnerships to discover how often DPLA was representing some of America’s underrepresented groups. Our working definition identifies these groups as:

"Former slave of Eng Bunker," ca. 1880-1890. University of North Carolina at Chapel Hill Library's Southern Historical Collection via North Carolina Digital Heritage Center.

  • historically non-white racial and ethnic groups
  • cultural/religious minority groups (Jews, Muslims, Hindus, etc.)
  • women
  • LGBTQ communities
  • disabled communities (including the physically, sensorily, and developmentally disabled)
  • rural communities
  • populations with lower socioeconomic status (focusing on poverty, working class issues, labor issues)
  • elderly populations

In the 2013 research, we highlighted 50 of our contributing institutions as diverse according to their institutional mission using this definition. By the end of 2014, that number had increased to 86, a 72% jump. The number of Historically Black Colleges and Universities has increased by 50%, and the number of Hubs with “diverse” partners has increased from 10 to 14. That means three-quarters of our Hubs that have contributing partners have at least one partner that falls into the “diverse” category. We are eager to track and support partner diversity to see who is doing the collecting and representing (in addition to who is being represented).
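For readers who want to check the arithmetic behind these percentages, here is a minimal sketch. The helper name `percent_change` is our own for illustration, and the contributor counts are the rounded figures quoted in this post ("just over 1,200" and "nearly 1,400"), so the results are approximate.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100

# Diverse contributing institutions, 2013 vs. 2014 (figures from this post)
print(f"Diverse contributors: {percent_change(50, 86):.0f}% increase")

# Overall contributors, using the rounded counts quoted earlier (approximate)
print(f"All contributors: about {percent_change(1200, 1400):.0f}% increase")
```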

For 2013, we had some challenges measuring diversity in DPLA items. We initially attempted to look at metadata using subject terms but found that these results were not useful. At the end of 2014, we were able to explore most of DPLA’s 8.4 million records at the collection level using collection descriptions from the Hubs and supplemental research. Using the diversity definition above, we looked through 3,937 collections and identified 435 as organized around the history or culture of one of our diversity groups, or 11% of all collections.

Collections are a way that institutions organize content topically and they can vary quite widely in size from a single item to thousands of items. For a more meaningful way to represent the impact of diverse content, we looked at the number of items in these diverse collections by group (or diversity category). It is important to note that these numbers of items are collection-based and do not account for items that aren’t affiliated with collections.

Graph illustrating number of items per diversity category

While we are pleased to see that some of these groups are relatively well represented by collections, there are a number of groups that need better representation in DPLA. For example, Asian American and Latino collections in particular should have stronger numbers and more variety in national origin. LGBT and disabled communities, as well as Arab Americans and Muslims, need more collections and content. Other categories, like Women, likely have a much broader representation in DPLA but are not as often the topic of collections.

Recruiting diverse content is a priority for most of the Hubs in our network and their work has a major impact on the diversity of DPLA’s collections. Through digitization funding and other means, DPLA has supported efforts to reformat diverse collections at the local level. As seen in the graph below, many of the Hubs make strong contributions to the diverse content accessible through DPLA.

Graph illustrating hubs with largest numbers of diverse collections

Growth by records

While we believe that our growth should be measured by the number and variety of our partners and the diversity of their collections, record growth inevitably draws attention. Since 2013, our collections have grown by nearly three million records. That’s 50% growth in a single year!

Each of our Hubs has a different collection policy, partner base, and growth plan. So their growth patterns tend to be slightly different. Still, it is interesting to see how the individual Hubs have grown, and how that compares to the others since it is their combined efforts that help DPLA grow as a whole.

Let’s first consider this by looking at straight record numbers. The clear winner here is The New York Public Library (NYPL), which went from a collection of 14,000 records at the start of 2014 to nearly 1.2 million by year’s end. But the growth of the other nine Hubs in the top ten can’t be overlooked. Some of the smaller collections (typically our Service Hubs, which aggregate metadata from many smaller institutions with fewer records) saw major growth, as well. These include The Portal to Texas History, North Carolina Digital Heritage Center, Mountain West Digital Library, Digital Library of Georgia, and Digital Commonwealth. Considering that these Hubs are sometimes working with partners who count their collections in the hundreds of records, growth of 78,000 records (North Carolina’s numbers this year) can be a massive undertaking.

Top ten Hubs by overall record growth

1. The New York Public Library 1,154,051
2. HathiTrust 504,816
3. Smithsonian Institution 184,702
4. The Portal to Texas History 163,810
5. Internet Archive 119,582
6. North Carolina Digital Heritage Center 77,635
7. Mountain West Digital Library 66,699
8. Digital Library of Georgia 65,074
9. Digital Commonwealth 47,843
10. ARTstor 46,158


It’s also helpful to look at growth a second way, by percentage increase, to see where some of our smaller Hubs have had significant growth in comparison to their size. While NYPL still tops the second list, small (but mighty!) South Carolina joins this top ten list with a 33% increase in records since 2013. And Digital Commonwealth moves up five places in this list with a 62% growth rate to beat out all other Service Hubs.

Top ten Hubs by percentage growth

1. The New York Public Library 8051%
2. ARTstor 453%
3. Internet Archive 99%
4. Digital Commonwealth 62%
5. The Portal to Texas History 47%
6. North Carolina Digital Heritage Center 42%
7. Digital Library of Georgia 33%
8. South Carolina Digital Library 33%
9. HathiTrust 29%
10. Smithsonian Institution 26%
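The two lists can be combined to estimate where each Hub started the year. The sketch below is our own back-of-the-envelope calculation, not an official DPLA figure: it divides the records added by the percentage growth for the Hubs that appear on both lists. Because the published percentages are rounded, the starting counts are only approximate. The function name `estimate_start` is hypothetical.

```python
# (records added in 2014, percentage growth) for Hubs on both top-ten lists
growth = {
    "The New York Public Library": (1_154_051, 8051),
    "ARTstor": (46_158, 453),
    "Internet Archive": (119_582, 99),
    "Digital Commonwealth": (47_843, 62),
    "The Portal to Texas History": (163_810, 47),
    "North Carolina Digital Heritage Center": (77_635, 42),
    "Digital Library of Georgia": (65_074, 33),
    "HathiTrust": (504_816, 29),
    "Smithsonian Institution": (184_702, 26),
}

def estimate_start(added, pct):
    """Estimate the starting record count from records added and percent growth."""
    return round(added / (pct / 100))

for hub, (added, pct) in growth.items():
    start = estimate_start(added, pct)
    # Starting count plus records added gives the estimated year-end total
    print(f"{hub}: ~{start:,} -> ~{start + added:,}")
```

For NYPL this gives a starting point of roughly 14,000 records and a year-end total just under 1.2 million, which lines up with the figures quoted above.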


Item Formats

So, what types of records are we getting in our collections from these nearly 1,400 partners and 23 Hubs, and how has that changed over the past year?

First, let’s look at the formats of the digital objects described by the metadata records in DPLA. This chart provides a comparison of the percentage in 2013 and in 2014.

Media type       % of collection 2013    % of collection 2014
Text             67.64%                  51%
Images           32.08%                  48%
Moving images    0.12%                   0.27%
Sound            0.11%                   0.11%
3D               0.03%                   0.13%


You’ll see right away that this year there are nearly as many Images as there are Texts, narrowing the gap between them from about 36 percentage points to 3 (Texts still lead, 51% to 48%). While not surprising, the difference in size between Texts and Images compared to the rest of the collection is significant. The reality is that it is far more expensive and time consuming to digitize and create metadata for moving images, sound, and 3D objects, and this is especially a challenge for our small and mid-sized (and often under-resourced) partners. Still, it’s an area that is poised for more growth in 2015. We’re on our way already with 3D and Moving image collections, though, which both have more than doubled in size since 2013.
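Because the chart reports shares rather than counts, a format can grow in absolute terms even when its share stays flat. As a rough sketch, assuming the overall collection grew by about 50% (the figure given in the "Growth by records" section), each format's growth in items can be estimated from the ratio of its shares. The name `item_growth_factor` and the constant `TOTAL_GROWTH` are ours, and the shares are rounded, so treat the output as ballpark numbers only.

```python
# Media type: (% of collection in 2013, % of collection in 2014), from the chart
shares = {
    "Text": (67.64, 51.0),
    "Images": (32.08, 48.0),
    "Moving images": (0.12, 0.27),
    "Sound": (0.11, 0.11),
    "3D": (0.03, 0.13),
}

TOTAL_GROWTH = 1.5  # assumption: the whole collection is about 1.5x its 2013 size

def item_growth_factor(share_2013, share_2014, total_growth=TOTAL_GROWTH):
    """Estimate how many times larger a format is in 2014 than in 2013."""
    return share_2014 / share_2013 * total_growth

factors = {name: item_growth_factor(s13, s14) for name, (s13, s14) in shares.items()}

for name, factor in factors.items():
    print(f"{name}: roughly {factor:.1f}x the 2013 item count")
```

Under that assumption, Moving images and 3D both come out at well over twice their 2013 size, and Sound grows in step with the collection as a whole even though its share is unchanged.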

Item Languages

Finally, a comparison shows the growth in the diversity of languages represented in DPLA has also been substantial. In 2013, we reported that there were about 400 languages represented in the Texts in DPLA. Today that number is nearly 500. As one might expect, the majority of what you’ll read in DPLA is in English, but there’s also a fair number of German, French, Spanish, and Latin texts. Here’s a list of the 25 most prevalent languages found in DPLA:

Top 25 in 2013


1. English 72.68%
2. German 8.39%
3. French 6.94%
4. Spanish 2.86%
5. Latin 2.44%
6. Italian 1.65%
7. Russian 0.76%
8. Dutch 0.50%
9. Chinese 0.37%
10. Portuguese 0.32%
11. Arabic 0.28%
12. Swedish 0.28%
13. Danish 0.28%
14. Ancient Greek (to 1453) 0.21%
15. Japanese 0.21%
16. Hebrew 0.18%
17. Polish 0.16%
18. Modern Greek (1453-) 0.13%
19. Norwegian 0.13%
20. Czech 0.12%
21. Hungarian 0.12%
22. Ottoman Turkish (1500-1928) 0.12%
23. Persian 0.08%
24. Armenian 0.07%
25. Croatian 0.04%


Top 25 in 2014


1. English 74%
2. German 7.78%
3. French 6.76%
4. Spanish 2.51%
5. Latin 2.09%
6. Italian 1.88%
7. Japanese 0.91%
8. Russian 0.85%
9. Dutch 0.52%
10. Chinese 0.40%
11. Portuguese 0.35%
12. Swedish 0.27%
13. Danish 0.26%
14. Hebrew 0.22%
15. Arabic 0.21%
16. Ancient Greek (to 1453) 0.18%
17. Czech 0.17%
18. Polish 0.16%
19. Hungarian 0.15%
20. Norwegian 0.12%
21. Modern Greek (1453-) 0.12%
22. Ottoman Turkish (1500-1928) 0.10%
23. Persian 0.06%
24. Armenian 0.06%
25. Icelandic 0.05%


Note the significant jump in Japanese texts over the past year (number 15 in 2013, number 7 in 2014).

It’s great to look back on a year that’s seen so much success in our attempts (and our partners’!) to grow our collections in ways that better represent the variety of stories there are to tell in American society. But we also know how important it is to continue the momentum to complete the partner map so that institutions across the United States and the stories they tell have a place in DPLA. Let’s all meet up here again in 2016 and see how we’ve done.

Upwards and onwards!


Featured image credit: Detail from “Ojibwe women holding sticks to play Double Ball, Grand Portage, Minnesota,” ca. 1885. From University of Minnesota Duluth, Kathryn A. Martin Library, Northeast Minnesota Historical Center Collections via Minnesota Digital Library.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.