
State of the DPLA’s Collections: Tracking our Growth

Posted by Amy Rudersdorf on December 2, 2013 in Blog.
A post from Amy Rudersdorf, DPLA Assistant Director for Content

The Digital Public Library of America launched a little over six months ago, on April 18, 2013, with nearly 2.5 million records from six Service Hubs and ten Content Hubs. This included records from over 400 public libraries, archives, museums, and historical societies across the nation. Today, the collection has surged to just over 5.3 million records from 21 Hubs representing 1,100 institutions. For a full list of our Hubs and a bit more about them, check out our Hubs page at http://dp.la/info/about/hubs/.

We thought the DPLA community might be interested in a few stats we’ve gathered about our partners and their collections at the six-month mark. We’re still building, growing, and testing, but the numbers give some insight into where we’re content-rich and where we might consider focusing our development efforts.

First off, here’s a map of where our Hubs are headquartered. Since April, we’ve been able to color in four new states: Michigan (HathiTrust), New York (Empire State Digital Network), North Carolina (North Carolina Digital Heritage Center), and Texas (The Portal to Texas History). These are not the only states represented in the DPLA, however. In fact, there are multiple records about every state in the US and all US insular areas and freely associated states. (Well, that’s not entirely true. Kingman Reef is only mentioned in one record, but we’re working on finding more.)  To see how your state or county is represented, check out this cool app developed by Chad Nelson: http://dp.la/apps/14.

Clearly, there are too many blue states on this map, but have no fear! We’re working hard to bring on new states and some larger organizations, and we hope to announce new partnerships (and color in a couple more states) very soon.

With over 1.7 million records, HathiTrust is easily our largest data provider. But, the second (Mountain West Digital Library) and third (Smithsonian Institution) aren’t far behind, and they continue to grow on a near-monthly basis. In addition, four of our Hubs–ARTstor, Digital Commonwealth, Digital Library of Georgia, and the University of Virginia–have all more than doubled their collections since April!

We call our Service Hubs “aggregators” because they represent their community (a state or region) as the single contact point for DPLA. Many of them, like Minnesota Digital Library and the Digital Commonwealth, have been around for a decade or more, and do much more than provide DPLA an entry point into their community’s data.  And, some of our Content Hubs aggregate data, too. Two great examples are ARTstor and HathiTrust.

This leads to an obvious question: what do our Hubs’ partners look like? Well, we’ve created a nice pie chart to answer that question. While the largest slice comes from academic libraries, we’re pretty excited to see that public libraries, museums, historical societies, archives, and government agencies are well represented. And while some of the details get lost in the statistics, like the Girl Scouts and church archives in Minnesota and the Jewish Women’s Archive in Massachusetts, the chart provides a good overall understanding of the type of participation we’re seeing.

We’ll continue work in this area to give each of these organizational types more equal slices of the pie.

So, let’s wrap up with a few other interesting facts that we can discern from the data at this moment in time.

Media Types

Of the more than five million records in the DPLA, 87% have a media type designation. A media type defines the general format of the content a record describes, and if a record has one, it appears in the “format facet” at the top left of the DPLA search results page. Here’s the breakdown of those 4,350,000 records with a media type value:

Media type        % of collection
Text              67.64%
Images            32.08%
Moving images     0.12%
Sound             0.11%
3D                0.03%
Datasets          0.01%
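
For readers who prefer raw counts to percentages, here is a minimal Python sketch that converts the shares above into approximate record counts. It assumes the 4,350,000 figure quoted above as the base and uses the rounded percentages from the table, so the results are estimates rather than official DPLA numbers. The names `TYPED_RECORDS`, `MEDIA_TYPE_SHARE`, and `approximate_counts` are made up for this illustration.

```python
# Base figure quoted in the post: records that carry a media type value.
TYPED_RECORDS = 4_350_000

# Percentages copied from the media type table (rounded to two decimals).
MEDIA_TYPE_SHARE = {
    "Text": 67.64,
    "Images": 32.08,
    "Moving images": 0.12,
    "Sound": 0.11,
    "3D": 0.03,
    "Datasets": 0.01,
}


def approximate_counts(total, shares):
    """Turn percentage shares into approximate whole-record counts."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = approximate_counts(TYPED_RECORDS, MEDIA_TYPE_SHARE)

for name, count in counts.items():
    print(f"{name:<14} {count:>10,}")
```

Because the published percentages are rounded, they add up to slightly less than 100%, and the estimated counts will not sum exactly to the base figure.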

Languages

The text records within the DPLA represent content written in approximately 400 languages from across the globe. This includes 48 Indigenous languages of the Americas. (We’re planning a future post about what we’ve gathered about the diversity within the DPLA, so stay tuned for that.) The top 25 languages, ranked by their share of the text collection, are listed below.

Language                       % of text collection
English                        72.68%
German                         8.39%
French                         6.94%
Spanish                        2.86%
Latin                          2.44%
Italian                        1.65%
Russian                        0.76%
Dutch                          0.50%
Chinese                        0.37%
Portuguese                     0.32%
Danish                         0.28%
Swedish                        0.28%
Arabic                         0.28%
Japanese                       0.21%
Ancient Greek (to 1453)        0.21%
Hebrew                         0.18%
Polish                         0.16%
Norwegian                      0.13%
Modern Greek (1453-)           0.13%
Ottoman Turkish (1500-1928)    0.12%
Hungarian                      0.12%
Czech                          0.12%
Persian                        0.08%
Armenian                       0.07%
Croatian                       0.04%
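
To get a feel for how concentrated the text collection is, the short Python sketch below adds up the 25 shares listed above and reports what is left over for all remaining languages. It relies only on the rounded percentages in the table, so treat the result as a rough estimate. The names `TOP_25_SHARES`, `top_25_total`, and `remaining_share` are illustrative.

```python
# Shares of the text collection for the 25 listed languages, in table order
# (English first, Croatian last). Values are the rounded percentages above.
TOP_25_SHARES = [
    72.68, 8.39, 6.94, 2.86, 2.44,
    1.65, 0.76, 0.50, 0.37, 0.32,
    0.28, 0.28, 0.28, 0.21, 0.21,
    0.18, 0.16, 0.13, 0.13, 0.12,
    0.12, 0.12, 0.08, 0.07, 0.04,
]

# Combined share of the 25 listed languages.
top_25_total = sum(TOP_25_SHARES)

# Whatever remains is spread across all other languages in the collection.
remaining_share = 100 - top_25_total

print(f"Top 25 languages: {top_25_total:.2f}% of text records")
print(f"All other languages: {remaining_share:.2f}% of text records")
```

By this estimate the top 25 languages cover roughly 99.3% of text records, which leaves well under 1% for the several hundred other languages in the collection.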

We hope you can see that there’s a boatload of great content represented in the DPLA from an amazing group of Hubs. But, we know that as we grow we must keep in mind the formats, languages, and organizational types that need better representation in the collections. So, let’s get to work on that.