DPLA launches new API, allowing greater efficiency and new opportunities for development

By Mark Breedlove, October 3, 2018.

This post is the second in our fall technology spotlight series. In each post in this series, we’ll offer an inside look at the development of our recent technology initiatives, including the tools we used, the obstacles we encountered, and our thinking behind how and why we built each new product.

We are pleased to announce the re-launch of DPLA’s API (Application Programming Interface). This marks the first time since 2013 that our API has had a significant software upgrade. We upgraded our search engine, Elasticsearch, to a current version back in July and refactored the API software accordingly, but the revision we deployed this month is a complete rewrite of the code. The old “platform” app has been retired and the “dplaapi” app has taken its place.

We should see a number of benefits from this long-awaited update, but I’d like to highlight a few that stand out.

Building on a strong(er) foundation

The first is a more maintainable, secure foundation for future work. We will be working with a modern language version that is still receiving security patches, and an application that has far fewer dependencies and is easier to deploy. We were previously running a version of Ruby (1.9) that stopped receiving updates in February of 2015 and was already about six years old when the first version of our API came out. There were many, many dependencies (Ruby gems), a number of which had also stopped being patched or updated. Because of that excess of dependencies, and because it was a Rails app laden with features we did not need, the old application was cumbersome and time-consuming to deploy and often difficult to troubleshoot and develop against.

The application runs inside of an operating environment, which also needs attention and upkeep, and a lot has changed in this realm in the past five years. In 2013, when the DPLA API came out, the Docker project was just releasing its initial version. It was still a new, bleeding-edge product that we weren’t ready to take on, but the relaunch affords us the opportunity to use what has become a mature and well-documented resource. With Docker containers, we are able to ensure consistent application behavior between development and production with greater confidence and ease than we could have back when we began. For examples of how this works, see our main project README and our development README.

New features coming soon

The combination of Elasticsearch 6 and a code refresh sets us up to add features we’d like to introduce a bit farther down the line, more easily than we could have with the old code. Elasticsearch 6 has features that will let us add a “more like this” query that shows documents similar to a given one, and search suggestions that offer terms the user may have meant instead of the one that was typed. These features simply did not exist in Elasticsearch 0.90. The new application is also structured to allow the easy addition of new API protocol versions (for example, a “/v3” endpoint alongside the existing “/v2”).
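To give a rough idea of what a “more like this” query could look like, here is a minimal sketch issued from Python with the official Elasticsearch client. The cluster address, index name, type, document ID, and fields are placeholders for illustration, not our production values.

```python
# Sketch of an Elasticsearch 6 "more like this" query from Python.
# The host, index, type, document ID, and fields are illustrative only.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

query = {
    "query": {
        "more_like_this": {
            # Compare documents on a couple of text fields.
            "fields": ["sourceResource.title", "sourceResource.description"],
            # "like" can point at an existing document to find similar ones.
            "like": [{"_index": "dpla_items", "_type": "item",
                      "_id": "some-item-id"}],
            "min_term_freq": 1,
            "max_query_terms": 25,
        }
    }
}

similar = es.search(index="dpla_items", body=query)
```

Search suggestions would work along similar lines, using Elasticsearch’s suggesters rather than a query clause.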

New Code, New Keys

We are also replacing our API key database with PostgreSQL, finally retiring BigCouch, which has been kept on only because we couldn’t allocate time to remove it until we could commit to this full rewrite. BigCouch was probably used for the API key database because it was already employed as the backend of our legacy ingestion software. It is a fork of CouchDB that reached end-of-life around the time the API launched. It will be a relief to shut down BigCouch and run a modern version of PostgreSQL for the API key database instead. We’ll also save money by not running more servers than necessary, since we already have a PostgreSQL service that’s adequate for the small demand the API key table poses.
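The API key store really is a small, simple piece. As a rough sketch (the table and column names below are hypothetical, not the actual dplaapi schema), a key check against PostgreSQL can come down to a single lookup:

```python
# Hypothetical API key lookup against PostgreSQL; the connection string,
# table, and column names are illustrative, not the dplaapi schema.
import psycopg2

conn = psycopg2.connect("dbname=dplaapi user=dplaapi")

def key_is_valid(api_key: str) -> bool:
    """Return True if the key exists and has not been disabled."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT enabled FROM api_keys WHERE key = %s",
            (api_key,),
        )
        row = cur.fetchone()
    return bool(row and row[0])
```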

Saving Time and Money

If you look at our dplaapi app, you will notice that we are using a suite of Python packages for building asynchronous web services with the ASGI protocol. We’re currently using API Star version 0.5 and Uvicorn, which use Python 3’s ‘async’ functionality for non-blocking I/O. (In the near future, we’ll probably use Starlette for the parts that API Star provides, but we’re not tied by our software’s design to one particular library or framework.) The new architecture is much faster than Ruby on Rails and should be able to handle significantly more concurrent requests than the old application, with less memory and CPU.
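For readers unfamiliar with ASGI, here is a minimal, framework-free ASGI application, shown only to illustrate the asynchronous request/response protocol that API Star and Uvicorn build on; it is not taken from dplaapi.

```python
# A bare-bones ASGI application: an async callable that receives a
# connection scope and sends response messages without blocking.
async def app(scope, receive, send):
    assert scope["type"] == "http"

    # Send the response headers, then the body, as separate ASGI messages.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({
        "type": "http.response.body",
        "body": b'{"hello": "world"}',
    })

# Run with, for example: uvicorn example:app
```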

We take our commitment to responsible stewardship of resources seriously and, as such, we like to take any opportunity we can to run smaller or fewer servers to get the same results and save some money. For many tasks there’s simply a right tool for the job, and the software we’ve chosen performs nothing more than its required web API role, without throwing extra work in our way or consuming more resources than it needs. It has its rough edges, because it’s relatively new, but we’ve tested its behavior carefully and we’re happy with what we see so far. Because of API Star’s minimalism, and because of the application’s structure, refactoring our application to use another framework will not be difficult if we need to switch to something else, especially compared to migrating something out of Ruby on Rails!

Testing 1, 2, 3

Speaking of testing, it was important to be sure we could make this big leap without upsetting our API’s consumers. Before changing anything at all, including the migration to Elasticsearch 6 in July, we wrote a set of integration and benchmarking tests to assert the API’s established behavior, and ran them against our production services to record how things were supposed to work. We used Postman to create test suites of API requests. These suites can be checked into version control and run from a Jenkins server against production, staging, and test environments. We also used JMeter to benchmark the latency and concurrency of our Elasticsearch servers and, later, our API app servers, to validate whether our expectations of better performance were correct. We’ve retained these tests for ongoing use, and run them periodically against production to check that everything’s still working as expected.
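Our suites are Postman collections, but the kind of assertion they make is easy to show in Python form. The sketch below is only illustrative: the request parameters come from the public API documentation, the API key is a placeholder, and the asserted fields are the ones the API has always returned.

```python
# Illustrative integration test (pytest/requests style), not one of the
# actual Postman suites; it shows the sort of established behavior the
# tests assert. The API key is a placeholder.
import requests

BASE = "https://api.dp.la/v2"
API_KEY = "YOUR_API_KEY"

def test_item_search_keeps_its_established_shape():
    resp = requests.get(
        f"{BASE}/items",
        params={"q": "kittens", "page_size": 1, "api_key": API_KEY},
    )
    assert resp.status_code == 200
    data = resp.json()
    # The response has always included a total count and a docs array.
    assert "count" in data
    assert isinstance(data["docs"], list)
```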

We went paragraph by paragraph through our API Codex and created a test for each promise it made about the API’s parameters and responses, as well as tests based on what we observed in the response data. It paid off: the test suite caught a lot of issues early in development that would otherwise have been time-consuming to deal with during staging and after launch. It’s hard to imagine starting out on a rewrite like this without pre-existing integration tests. After the period spent evaluating API frameworks and writing integration tests, the new application was written in one month: the first commit was made on August 10th and a release candidate was issued on September 11th.

What’s Next

We hope that this relaunch has actually not been very exciting! The API should behave mostly as it did before, and no changes should have been required of users. In the future we’ll have more to tell about new features, but for now we hope you keep on enjoying the service the API provides, knowing that it’s now running at lower expense and that we’re free to start implementing features we’ve been wanting to add for some time.

If you have any questions, please let us know at tech@dp.la.