API software upgrade improves security and performance

By Audrey Altman, October 19, 2022.

Digital Public Library of America recently launched a new application to run a core part of its infrastructure, the DPLA API. This software upgrade improves security, performance, reliability, and privacy for our users. It also makes it easier for our developers to adapt to new services and maintain the code over time. The DPLA tech team transitioned to the new application without change or interruption in service to our users.

The DPLA API powers search functionality on the DPLA website, the Black Women’s Suffrage Digital Collection, and some of our partner libraries’ websites. It also provides free, public access to DPLA’s data for researchers and application developers. The old API application, which had been in service since 2018, was becoming increasingly difficult to secure and maintain.

One of the main goals of the software upgrade is better security. Our old API depended on outdated libraries that were increasingly susceptible to security risks. Rewriting the API on a robust, well-maintained platform gives us much stronger security. The new application also has stricter HTTP security policies, and we have deprecated JSON-P, a legacy feature that created potential security vulnerabilities for developers using our API in their applications.

The new API software also performs better than the old API. The application is built with Akka, a toolkit that supports strong concurrency. Concurrent operations (computations that can happen simultaneously) execute much faster than sequential operations. In our tests, the new API performs ten times faster than the old API. This means that we can handle more requests per second, resulting in better service for users. It also means that the overall load is reduced, which conserves resources.
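To make the difference between sequential and concurrent execution concrete, here is a minimal sketch. It is not DPLA's code and does not use Akka or Scala; it uses Python's standard thread pool as a stand-in, and the function names, the 50 ms delay, and the request count are all assumptions chosen for illustration. Each simulated request spends its time waiting on a slow backend, which is the kind of work where overlapping operations tends to pay off.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one API request that waits on a slow backend."""
    time.sleep(delay)  # simulated wait on a search index or database
    return f"response-{request_id}"


def run_sequentially(request_ids):
    """Handle requests one at a time; the waits add up."""
    return [handle_request(i) for i in request_ids]


def run_concurrently(request_ids, workers: int = 8):
    """Handle requests on a thread pool so their waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(handle_request, request_ids))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(8))
    _, seq_time = timed(run_sequentially, ids)
    _, con_time = timed(run_concurrently, ids)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same responses in the same order; only the elapsed time differs. The speedup in this toy comes entirely from overlapping simulated waits, so it says nothing about the actual performance figures reported above.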

Using the Akka toolkit allowed us to write the new API in the Scala programming language, a change from the old API, which was written in Python. We already use Scala for most of our backend applications because its style and functionality lend themselves to the kinds of data processing that we do on a daily basis. Writing the new API code base in Scala will make it easier for our team to maintain.

The new application has a modular design, in which different logical components are encapsulated and configurable. Modular design makes it simple to adapt the application for multiple uses, which will be useful for future projects. It also helps developers isolate problems, reason about the system, and respond to changes in external services such as databases and search indices.
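One way to picture this kind of modular design is sketched below. Everything in it is hypothetical: the class names, the response shape, and the choice of Python (rather than the Scala used for the new API) are assumptions made for illustration, not a description of the actual application. The request-handling component depends only on a small search interface, so the component that talks to an external service can be swapped or can fail without changes to the rest of the code.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: the only thing the service knows about search."""

    def search(self, query: str) -> list[str]: ...


class InMemoryBackend:
    """Toy backend that matches titles by case-insensitive substring."""

    def __init__(self, titles: list[str]) -> None:
        self._titles = titles

    def search(self, query: str) -> list[str]:
        needle = query.lower()
        return [t for t in self._titles if needle in t.lower()]


class UnavailableBackend:
    """Toy backend that simulates an unreachable search index."""

    def search(self, query: str) -> list[str]:
        raise ConnectionError("search index unreachable")


class ApiService:
    """Request handler configured with whichever backend it is given."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def handle(self, query: str) -> dict:
        try:
            docs = self._backend.search(query)
        except ConnectionError as err:
            # The failure is isolated to the backend component
            return {"status": 503, "error": str(err)}
        return {"status": 200, "count": len(docs), "docs": docs}
```

In this sketch, `ApiService(InMemoryBackend([...]))` and `ApiService(UnavailableBackend())` use the same handler code. A problem in the search component surfaces as a 503 response from one well-defined place, which is the sort of isolation the paragraph above describes.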

During the process of deploying the new API, we realized that one of the tools we had been using to test the API had undergone some significant changes and no longer fit our needs. Migrating our suite of integration tests to a new, easy-to-use framework makes the system more reliable. Now, we can check that the API is working in harmony with other critical applications more regularly, and get quick insight into failures. 

Finally, new approaches to logging and analytics tracking give our users more privacy. While DPLA has never exposed personally identifiable information about users, these new methods make it extremely unlikely that individual use histories could be reconstructed, even by a sophisticated third party. This is in keeping with the American Library Association’s position on user privacy.

The new API application allows us to continue providing fast, secure, reliable access to DPLA’s data, for our websites and for researchers and application developers. It also prepares us for improvements and adaptations as we look to the future.