August 16, 2017
The DPLA is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses a Spark distributed processing engine to speed up and scale up the harvesting operation, and to perform complex analysis of the harvested data. It is helping us improve our internal workflows and provide better service to our hubs. The Spark OAI Harvester is freely available and we hope that others working with interoperable cultural heritage or science data will find uses for it in their own projects.