This workstream will explore the desired architecture for the DPLA and will make recommendations regarding the technology to be used for its development and for building, or facilitating the building of, the discovery environment. This track will also examine the state of the art for digitizing books, video, audio, and other types of documents, and determine how much it might cost to greatly increase the scope of current digitization efforts.
- Meetings and notes
- May 2011: Global Interoperability and Linked Data Workshop and Notes (as PDF)
- June 2011 Technical Workshop
- October 20, 2011
- December 9, 2011 Workshop
Big issues: The project may be helpful at the level of standards, interfaces, APIs, and so forth, for content that is already digitized or yet to be digitized for public use. It may also be helpful insofar as we can nudge current digital means of accessing books, including eBooks, toward better access.
Beta Sprints provide a sense of how people might use or interact with the DPLA.
Chris Freeland, Center for Biodiversity Informatics/Missouri Botanical Garden; Biodiversity Heritage Library
Martin Kalfatovic, Smithsonian Institution Libraries
Workstream conveners are responsible for the recruitment of a broad range of workstream members and the identification of expert participants for DPLA activities over the next two years.
John Blyberg, Darien Library
Aaron Chaletzky, Library of Congress
Tim Dilauro, Johns Hopkins University
Lee Dirks, Microsoft Research Connections
Michael Edmonds, Wisconsin Historical Society
Emily Gore, Florida State University
Jorge Martinez, Knight Foundation
Robert McDonald, Indiana University
Brad McLean, DuraSpace
Carole Palmer, Center for Informatics Research in Science & Scholarship; University of Illinois at Urbana-Champaign
Robert Stein, Indianapolis Museum of Art
David Weinberger, Harvard Library Innovation Lab
Kristina Woolsey, Exploratorium
Pam Wright, National Archives and Records Administration
Questions for Discussion
Special thanks to Carl Malamud for developing an initial list of discussion questions.
- Who is digitizing at scale and what technologies are they using? Is the technology base similar for books and other media or are there digitization silos?
- Are there technologies in research & development or even pure research that hold promise to bring costs down, volume up, quality up, or longevity up?
- Do we have any idea of the magnitude of information to be digitized? For example, are there hard numbers on books, their location, and the overlap among institutions?
- Are we focusing only on digitization technologies, or also on the requirements for long-term preservation of digital assets (e.g., migration, emulation, and associated costs)? Some national libraries addressed both in their initial planning and cost projections.
- Assuming DPLA will be a distributed system, what is the optimal information architecture to ensure interoperability, in the US and globally, with other national libraries?
- What quality standards should we hold mass-digitization efforts to? When is "good enough" good enough?
- What does digitization mean for printed text material? Pictures of pages, sufficient transcription for search, accurate transcription, or transcribed text with semantic annotations? How do we distinguish between them?
- Is the DPLA effort about digitization only, or is it also feasible to discuss the collection, preservation, and dissemination of born-digital content?
- What standards will be necessary for efficient creation of, sharing of, interoperability of, and preservation of the content within DPLA?
- What elements might be missing in the standards landscape that would facilitate the DPLA?
- Is there a need for a work identifier that could limit the duplication that might occur in a distributed digitization effort, and what should its scope be? Is the International Standard Text Code (ISTC) appropriate for this?
- How will the integrity of the digitized documents be maintained? (via Duane Dunston)
- What mechanisms will be in place for researchers and the public to verify that books and scholarly papers match the original documents (i.e., that the original document has not been altered)? (via Duane Dunston)
- What metadata standards will the Library adopt?
- What provisions will there be for interoperability with other collections of digital material?
- How important is the World Wide Web and its stack of technologies to the Library?
- Will the DPLA be a host of content or will it be a federation of other content repositories?
- Will the DPLA contact and work with the Internet Society (ISOC), and how will its experience, structures, and methods be leveraged, adopted, or adapted? For example, the Internet Research Task Force (IRTF) and the Internet Engineering Task Force (IETF) iterate on Requests for Comments (RFCs), leading from a first idea to draft and final specifications (and/or standards), with working prototypes along the way. Have experts at ISOC already solved technical problems like those the DPLA is likely to encounter? Consider, for example, the evolution of the Simple Mail Transfer Protocol (SMTP) and the Hypertext Transfer Protocol (HTTP).
- Will the DPLA be inspired by the Open Cloud Initiative (OCI), Open Source Initiative (OSI) and Open Source Definition (OSD)?
- How much attention should the DPLA pay to end-user issues, such as the possible creation of firmware and apps for public library users, and alliances and educational officers in the last-mile area? Does access, in the technical sense, count?
- Should the DPLA establish a close relationship with the International Digital Publishing Forum, the leading organization in the area of e-book standards?
- Might the DPLA actually do well to spin off a separate technical organization serving the needs of both academic and private digital library systems, which could be separate?
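Two of the questions above ask how the integrity of digitized documents will be maintained and checked. One common approach in digital preservation is fixity checking: record a cryptographic digest for each file at ingest, then recompute it later to detect changes. The sketch below only illustrates that idea and is not a DPLA decision. The function names (`sha256_of_file`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of_file(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built at ingest could be stored separately from the content. Running `verify_manifest` later returns the relative paths of files that are missing or no longer match. This sketch does not report files added after the manifest was built.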
- Europeana Technical Documents: EDM Primer
- Peter Suber, Open access for digitization projects, July 2, 2009.
- The estimated cost of digitising the total collections of Europe’s museums, archives and libraries is approximately €100 billion, see "The Cost of Digitising Europe’s Cultural Heritage, new report by the UK Collections Trust" (Nov. 2010) http://ec.europa.eu/information_society/activities/digital_libraries/doc/refgroup/annexes/digiti_report.pdf
- Resources on the Economics of Digitization
- Federal Agencies Digitization Guidelines Initiative
- National Archives, NARA Guidelines for Digitizing Archival Materials for Electronic Access
- Framework for Building Good Digital Collections -- An IMLS-initiated and funded NISO publication. The Framework provides an overview of some of the major components and activities involved in the creation of good digital collections and provides a framework for identifying, organizing, and applying existing knowledge and resources to support the development of sound local practices for creating and managing good digital collections.
- Re: Question 3: Lavoie, B. and Dempsey, L. Beyond 1923: Characteristics of Potentially In-copyright Print Books in Library Collections, D-Lib Magazine, Nov/Dec 2009 http://www.dlib.org/dlib/november09/lavoie/11lavoie.html
- Aaron Swartz, Outline of a Digital Preservation System
- Portico: A Digital Preservation and Electronic Archiving Service
- National Digital Newspaper Program Guidelines
- HathiTrust (technology and policies)
- The European Library: Handbook (Technical and Metadata Requirements, Infrastructure)
All workstream members should join the DPLA Technical Aspects Workstream listserv at https://cyber.law.harvard.edu/lists/subscribe/dpla-tech.
Please also add your name to the list below. If you would like to edit this wiki, please create an account.
- [Your Name], [Affiliation], [Your Email]
- David Rothman, LibraryCity.org, firstname.lastname@example.org
- Todd Carpenter, Executive Director, NISO -- email@example.com
- Tito Sierra, MIT Libraries -- firstname.lastname@example.org
- John Weise, University of Michigan Library -- email@example.com
- Wilhelmina Randtke, Florida State University -- randtke (at) gmail.com
- Stephen Chapman, Harvard Law School Library -- firstname.lastname@example.org
- Lori Jahnke, Medical Heritage Library (College of Physicians of Philadelphia) -- email@example.com
- Zachary Townsend, openNYC -- firstname.lastname@example.org
- Jason Ronallo, NCSU Libraries -- email@example.com
- John Mignault, Mertz Library, The New York Botanical Garden -- firstname.lastname@example.org
- Leah Prescott, Washington Research Library Consortium -- prescott [at] wrlc.org