The content on this wiki is being preserved for historical purposes, but is not being maintained and is probably no longer accurate.
For current information about DPLA, see the DPLA main site.
Scenarios
Scenarios List
The DPLA prototype platform is designed primarily to enable external developers to create applications using the rich metadata it hopes to gather. Because the aim is to maximize opportunities for innovation, scenario planning is necessarily sketchier than it would be for a single software application. These scenarios therefore aim to address a wide range of possible uses others may make of the platform. They are not applications the DPLA platform team itself plans on developing. Nevertheless, many of the scenarios are quite likely uses of the metadata the prototype platform is gathering. The platform needs to support those likely uses, as well as the unlikely ones we cannot yet imagine.
The “phase” refers to when work on these might plausibly begin. Phase 1: April 27, 2012; Phase 2: September 2012; Phase 3: April 2013; Phase n: beyond mortal ken.
| Scenario | Phase | Description |
|---|---|---|
| 1 | 3 | External developer builds an online library browser for children |
| 2 | 3 | External developer creates a commercial library analytics package |
| 3 | n | Local historical society mounts a small online exhibition |
| 4 | n | Museum enhances the experience of real-world visitors by displaying information relevant to items on display |
| 5 | 2 | Wikipedia creates a page for each item cited, using some metadata from the DPLA |
| 6 | 3 | Public library wants to share data with the DPLA but lacks technical resources |
| 7 | 2 | A major collection of historic material wants its content to be found when users of the DPLA search for related items |
| 8 | n | External developer builds an OPAC that incorporates social network elements |
| 9 | 3 | University enhances its library’s efficiency by analyzing data |
| 10 | 3 | University creates a tool to ease the creation of complex online collections |
| 11 | 2 | University enhances its user-generated content with UGC from other libraries |
| 12 | 2 | Independent researcher performs computationally complex analysis using DPLA data |
Scenario 1: External developer: Browser for kids
A developer wants to create a web experience aimed at children K-6. For this she needs a way of selecting items from the DPLA meta-catalog suitable for children, and an application that lets children browse among the items and read/view the items they select.
Concerns and issues
- The DPLA meta-catalog does not routinely contain age-suitability data. When individual collections within that catalog do contain such information, it is in unpredictable formats and has not been verified by anyone except the contributing organization. It is thus difficult to identify suitable content with confidence. Going wrong because of errors in the data -- or, worse, by being hacked -- would be a very bad thing.
- The browser needs to be a fun experience for small children who may be left to use it unaided, especially in the upper age ranges.
- Parents need their own interface for browsing.
- Once an item is found, clicking on it should “play” it appropriately, depending on its media type.
- The content of the DPLA may shift as contributors add, modify, or withdraw metadata.
- The developer would like to contribute data back to the DPLA, including notices that some works are inappropriate, as well as user-generated reviews and ratings (from children and adults).
Solution
The developer begins by selecting reliable, safe collections within the DPLA meta-catalog that have been filtered for child-safe content, possibly using values from ONIX. These may include entire collections from organizations devoted to children’s materials, and collections from public libraries that have marked items as age suitable. Or the application may look up DPLA items in a specialized authority to identify child-safe content. In later versions, the end-user application may enable adult users to suggest works, or use a crowd-sourcing mechanism.
Having identified a sub-collection of child-safe items, the developer writes a web app with two user interfaces. Adults can browse in an informationally dense environment that gives them lots of information about works, including user reviews gathered from services around the Web. Children are given a much more graphically lively environment that allows them to browse, and then click and read/view the content. (The content is delivered via the services of organizations that have contributed their metadata to the DPLA.)
When clicked, an item launches an appropriate open source player (book, image, audio, video) and connects to the content at its original source.
The web site allows adults to register themselves and their children. With explicit permission, ratings and reviews are saved and shared. They are also shared with the DPLA. In addition, children are encouraged to create drawings and videos that express their views about the items. Pointers to these expressions are recorded (with the parent’s permission) in the DPLA platform, associated with the items about which they are expressions.
Issues for the DPLA platform
- Many different apps may generate inconsistent data about age appropriateness. Should the platform record that data? If so, how? Are there liabilities?
- Should the Platform be storing, or pointing to, content created by end users about works in its meta-catalog? If so, since some of this data will be created by minors, are there particular liabilities?
Scenario 2: External Developer: Library analytics
A developer wants to create a commercial library analytics package for sale to libraries to help them make better decisions about acquisitions, and to do long-term planning about space allocation, among other uses. The platform’s initial library analytics package is not complete or detailed enough.
Concerns and issues
- The developer will mash up DPLA metadata with data from other sources, some open and some commercial.
- Some crucial data elements need to be fresh.
Solution
The developer downloads the entire DPLA meta-catalog, including event data, and performs some computationally complex work on it. She writes software that interacts with the platform’s API to get recent data, and uses the platform’s federated search capabilities to get real-time data about which items have been checked out that day. The developer combines that data with information from other sites and services to feed a graphics engine that displays information and trends in attractive, comprehensible forms.
Issues for the DPLA platform
- Some usage information generated at the DPLA platform site could be very helpful for certain types of analysis. For example, query logs could be revelatory. What are the DPLA’s privacy policies?
- The information becomes more useful to the extent the items in the meta-catalog have been de-duped and semantically associated, which are technically difficult tasks.
- If this were a non-commercial application, the developer might be willing to feed information back into the DPLA platform.
Scenario 3: Local historical society: Online collection
A local historical society has been collecting photographs, postcards, maps, journals, and historic artifacts for many decades. It has a small Web site that displays digital views of some items of interest, with a page for each of the past ten decades. It would like to add that collection to the DPLA in the hope that its items will show up in searches, drawing new visitors to the site. It would also like its items to show up in other historical collections. In addition, the Society would like to be able to create and maintain new online collections much more easily. It does not have any particular Web or technical experience or skills, however. The Society lives on donations and the part-time efforts of several devoted volunteers, so it cannot afford to hire a Web consultant to do the work.
Concerns and issues
- The current web-based collection consists of HTML pages with no further structure. The images are stored in a directory on their server.
- Although the Society’s images are free of copyright concerns, the Society would be distressed if those images were used in ways it considers inappropriate.
- The Society would like an easy-to-use tool for creating new online collections by browsing its collection of digitized images. The tool should output HTML pages with the appropriate links, and with captions the Society enters.
Solution
The Society fills out a form on the DPLA site that gives the DPLA permission to import its existing Web site metadata. The form requires the Society to register some information about itself, as well as fill in some metadata about the collection. The Society then presses a button to launch the DPLA’s site scanner that parses the online historical collection. The site scanner uses the structure of the web site as a guide to the structure of the collection, extracts the image links and tries to associate the text on the page with particular images. The Society is presented with a page that shows the induced structure, and the associated images and text which it can then approve or correct. When the information is satisfactory to the Society, they approve it and it is entered into the DPLA platform. (Optionally, the Society’s information might be put in a queue so that it can be reviewed before acceptance.)
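The association step of such a site scanner can be sketched as follows. This is a minimal illustration, not the DPLA's scanner: the class name, the function name, and the caption heuristic (use the `alt` text if present, otherwise the first text that follows the image) are all assumptions made for the example. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class ImageCaptionScanner(HTMLParser):
    """Collect image links and a guessed caption for each one.

    Hypothetical heuristic: prefer the alt text; otherwise take the
    first non-empty text that appears after the image.
    """

    def __init__(self):
        super().__init__()
        self.items = []        # list of {"src": ..., "caption": ...}
        self._pending = None   # image still waiting for a caption

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        item = {"src": attr.get("src") or "", "caption": attr.get("alt") or ""}
        self.items.append(item)
        # Only wait for following text when no alt text was found.
        self._pending = item if not item["caption"] else None

    def handle_data(self, data):
        text = data.strip()
        if text and self._pending is not None:
            self._pending["caption"] = text
            self._pending = None


def scan_page(html):
    """Return the image/caption pairs guessed for one HTML page."""
    scanner = ImageCaptionScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.items
```

The guessed pairs are exactly what the Society would be asked to approve or correct, so a heuristic that is sometimes wrong is acceptable here.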
Once its collection metadata is in the DPLA, the Society can use the DPLA’s collection building tool, which creates a collection from any list of items in the meta-catalog. By confining its searches to items within its own collection, the Society can build a collection only with its own items, or it can choose to include other publicly available items. Likewise, other curators can include items from the Society in their collections. The collection building tool generates an editable structure into which the Society can enter visible (headings, captions, etc.) and invisible metadata. Upon completion, the tool builds a basic set of HTML pages, and (if desired) uploads the collection metadata into the DPLA. This tool is based upon open source HTML page construction software.
Alternatively, the Society could choose to download the DPLA platform software and use it locally to manage its digital collection metadata. Doing so would automatically keep the DPLA platform updated as the Society curated its collection. This platform software would enable them to build future online collections, choosing from their own items, and from other items in the DPLA main collection.
Scenario 4: Museum: Enhanced exhibit info
A museum of the history of design wants to enhance visitors’ experience by enabling them to browse and explore information about items on display using their mobile phones. NFC technology sends an item identifier to the phone (or the user types in a number or snaps a photo of a QR code), resulting in a query to the DPLA, which returns information about the item. That information can be explored on the mobile device or bookmarked for later. For example, an Eames chair might return a list of books, videos, and articles about the Eameses, as well as a list of other Eames furniture in other collections.
Concerns and issues
- The information returned needs to be useful and well organized. Preparatory work would therefore have to be done, assembling collections of items in the DPLA meta-catalog.
- The information needs to be usable on a handheld device, or it should be saveable for later perusal on a more suitable device.
- This would require creating a special app for mobile environments -- iOS, Android, Windows 8, etc.
Solution
The museum hires a developer to create a set of mobile apps, starting with the iPhone and Android, that are activated either by NFC or by users typing in a number keyed to the particular item on display. The app then looks up a list of appropriate and related items in QRpedia and in the DPLA meta-catalog; the DPLA lists have been curated by the museum staff. The user can click to explore any particular item, or can save bookmarks to explore later.
The app integrates with social media so that the user is prompted to enter their location, thereby “advertising” the museum. This location prompt also serves as usage metadata about the collection and the item, which (if the user opts in) can be added to the DPLA data.
Scenario 5: Wikipedia: Citation pages
Wikipedia decides to adopt a new strategy (suggested by SJ Klein) for dealing with citations: each cited item will get its own page on which there will be information about the work, all subject to the usual community curation. Wikipedia would like to use metadata about these sources drawn from the DPLA, including bibliographic data and usage (“event”) data. In turn, the DPLA platform could benefit from any corrections to the metadata made by the Wikipedia community, as well as from usage data and information about how the cited works are associated with other works and with the subject categories of pages on which they are referenced.
Concerns and issues
- Wikipedia’s current citation system requires users to re-enter data already entered into previous citations of that work. Entering citations into Wikipedia pages is non-trivial and error-prone.
- Wikipedia is losing the benefit of noting the semantic relationships among pages that cite the same works.
- The citations and their relation to one another, to Wikipedia topics, and to the categories under which those topics are listed is valuable semantic information that could aid those using the DPLA platform to explore the relationships among works.
- Many of the citations will point to items not in the DPLA’s meta-catalog.
Solution
A small group of developers who are part of the Wikipedia community creates a bookmarklet for use on Wikipedia pages that autocompletes citations based on previous citations. The citation generated by this bookmarklet on the Wikipedia page takes the user to a dynamically generated page dedicated to that citation, or to a disambiguation page.
The page generation process uses the DPLA’s API to locate that item. If the item is in the DPLA’s meta-collection, a link to that item and metadata about it are drawn into the citation page. Additional information is collected from Citeseer and Google Books. The citation page is itself editable, using the same basic software as Wikipedia. Users are able to correct errant metadata, and to engage in social evaluation of the cited work. The system contributes corrected and additional metadata back to the DPLA via the API, where it is automatically put into a review queue so that it can be manually inspected to make sure it is not malicious. After a few months of trial, Wikipedia decides to build this functionality into Wikipedia itself, using the developers’ open source code.
Scenario 6: Public Library: Two-way sharing with DPLA
A public library wants to participate in the DPLA, but has little technical know-how in house, and has a limited budget. It would like to display some of the data in the DPLA platform in order to guide its users to works, and if it were easy enough, it would be happy to upload its own catalog metadata and some usage data as well. It is using a well-known commercial ILS.
Concerns and issues
- The library would like to display usage data for items within its OPAC.
- The library would like to display user-generated content (reviews, ratings, etc.) for items in the DPLA’s meta-catalog, and enable locally produced reviews and ratings to be contributed back to the DPLA.
- It would like to enable its users to browse the DPLA collection for items that the library does not have.
- The library would like to be able to generate a “community relevance” score that guides its users to items of interest. It would also like to be able to show community relevance scores based on a wider set of libraries, e.g., all the participating libraries in the state.
- If it were very easy, the library would be happy to have its collection information (items and events) contributed to the DPLA, and have it be updated regularly.
- The library is using a commercial ILS, and has no plans or budget to move from it for the foreseeable future.
Solution
The library downloads a jQuery plugin from the DPLA “app store.” Following instructions written for someone with a low level of HTML skill, the local person responsible for the library’s Web site inserts a small bit of code into the OPAC HTML, and configures two parameters (identifier type and the id of the div that has a book identifier). If the local person does not feel comfortable, a community of developers is able to walk her through the process via IRC chat. Once installed, the code automatically uses the identifier (e.g., an ISBN) to fetch configurable usage data from the DPLA repository, displaying it in a small, configurable image on the library’s OPAC page. The library takes a similar approach to displaying community relevance based on a set of libraries it has chosen. (The library’s own event data is not available at this point.)
Requiring only a slightly more advanced level of skill, the local library can install a visual browser (Stack View) from the DPLA that shows the currently displayed item on a shelf with other items in the DPLA collection in the same subject categories. To enable users to comment on and rate items, the library installs another DPLA widget that lets users enter the data, and then stores it on the DPLA server, either anonymously or using a secure user-chosen identity.
The DPLA tech team works with the commercial ILS vendor to develop a plugin for their software that integrates with the DPLA API, and allows for seamless publication of item, collection and event data. This integration is a competitive advantage for the vendor.
Scenario 7: Major collection: Integrated search results
A national-scale institution has over the decades compiled a major collection of historic materials and wants its content to show up when people use applications that query the DPLA meta-catalog. For example, if a user searches for “Civil War,” this collection would like its rich set of photos from the Civil War to be listed in the returns. It would like this to happen both for its collections and for individual items in the collection.
Concerns and issues
- The collection actually consists of hundreds of sub-collections.
- The collection metadata varies from fairly rich to meager. The metadata attached to the individual items ranges from minimal to none.
- The collections contain items in multiple media, including images, videos, audio, and maps.
- The collections are fairly static, but new collections are regularly added.
- The collections and items not only should show up when relevant to queries, but their relevancy should be weighted appropriately.
Solution
The institution fills in basic information about itself in order to register with the DPLA. It then goes to a web page where it fills in metadata about the collection. This page has instructions on how to create a basic mapping file that maps the institution’s categories to the simple DPLA schema. It also lets the institution fill in the URL where the data can be downloaded, and set an update interval. There is also a space for entering the URL where the particular schema is posted, if it is not one of the native ones the platform supports. The institution may also use the metadata mapping service created by MINT, one of the Beta Sprinters, that produces data in DPLA-native forms. This gives the DPLA platform enough information to harvest the collection metadata.
The mapped metadata is added to the meta-catalog for access through direct API calls, while the unmapped data is maintained for requests for the values of specified keys. The metadata is run through algorithms developed by DPLA partners and on the open Web, as well as via research sponsored by the DPLA, that enhance the value of the metadata by performing some basic, best-effort FRBRization (using OCLC’s facility), uniform title clustering, reciprocal enhancing of collection and item level metadata using UIUC techniques, and semantic clustering perhaps by performing analyses of linked data text clouds.
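The split between mapped and unmapped metadata can be sketched like this. It is an illustration under stated assumptions: the function name, the dictionary form of the mapping file, and the field names (`title`, `creator`) are hypothetical, since the text does not specify the DPLA schema or the mapping file format.

```python
def apply_mapping(record, mapping):
    """Split one source record into mapped and unmapped parts.

    record:  dict of the institution's own field names to values.
    mapping: dict from institution field names to schema field names
             (a stand-in for the mapping file; the format is assumed).
    """
    mapped, unmapped = {}, {}
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            # No mapping: keep the original key so it can still be
            # requested by name later.
            unmapped[key] = value
        else:
            mapped[target] = value
    return mapped, unmapped
```

For example, with the mapping `{"Title": "title", "Photographer": "creator"}`, a record that also carries a local `BoxNo` field would have `BoxNo` preserved in the unmapped part rather than discarded.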
Using algorithms developed by UIUC, the relevancy weightings are adjusted so that items in small collections are not outweighed by items in larger collections.
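The UIUC algorithms are not described here, so the following is only a simple stand-in that illustrates the goal: each hit is rescaled by the best score within its own collection, so the strongest item of a small collection ranks alongside the strongest item of a large one. The function name and the tuple layout are assumptions for the example.

```python
from collections import defaultdict


def rerank_by_collection(hits):
    """Rescale each hit's score by the best score in its own collection.

    hits: list of (item_id, collection_id, raw_score) tuples.
    Returns the hits sorted by adjusted score, highest first.
    """
    best = defaultdict(float)
    for _, collection, score in hits:
        best[collection] = max(best[collection], score)

    adjusted = []
    for item, collection, score in hits:
        top = best[collection]
        adjusted.append((item, collection, score / top if top > 0 else 0.0))

    # sorted() is stable, so ties keep their original order.
    return sorted(adjusted, key=lambda hit: hit[2], reverse=True)
```

A real weighting would likely also account for collection size and metadata richness. This sketch shows only the normalization idea.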
Scenario 8: External developer: Social OPAC
A university’s library innovation group wants to create an online browser that enables its community to navigate its own library system’s collection, providing social cues to which works are most valuable to scholars, students, and researchers. The items will be shown visually in connection with other works along multiple vectors of connection: works classified together, works in the same or related subject areas, works cited together, works recently checked out, works consulted by people looking at some other work, works popular within particular social groups. The developers want to assess “community relevance” by algorithmically assessing item events such as check outs, extensions, requests for returns, being placed on reserve, being required by a course, etc. They would in fact like to use such information from other libraries to supplement their own information, both to provide greater precision of results, and to be able to bring to attention items used in other institutions that their home institution has not paid much attention to.
Concerns and issues
- The university is willing to share event data so long as it is sufficiently anonymized.
- The computation of “community relevance” can be intense.
- The social information stored by the university’s social OPAC needs to be kept private, and shared only with explicit opt-in by well-informed users.
- The social OPAC benefits from being able to cluster items in interesting ways. New ways of clustering may emerge, which should also be accommodated.
- The university is happy to share its community relevance ranks with the rest of the DPLA.
Solution
The university installs a local version of the DPLA platform and imports its library’s catalog via MARC21. It does some custom integration with its ILS, which it shares as open source. It uses the federated search capability provided by the DPLA local platform to query all the libraries in its region for real-time information about the current availability of works. It uses the event schema provided by the DPLA, and extends it to include works ordered by the college book store; they publicly share this extended schema. The university devises several algorithms for computing community relevance, designed for different user types (students, faculty, etc.), and gives each item in its collection a score. In order to factor in event data from other libraries, it does a bulk download of the relevant DPLA data and does the calculation. It shares these scores back to the DPLA via the DPLA platform API so that they are available to other users of the platform; the university updates these scores every three months, using a scheduler provided as part of the local DPLA platform software. The school institutes a privacy policy for the gathering, retention, and use of personal information.
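One plausible shape for a community relevance computation is a weighted sum over item events, with a different weight table per user type. The sketch below assumes this form. The event names and weights are invented for illustration and are not taken from the DPLA event schema.

```python
# Hypothetical weights: how strongly each event type signals relevance.
STUDENT_WEIGHTS = {
    "checkout": 1.0,
    "renewal": 0.5,
    "recall": 2.0,
    "reserve": 3.0,
    "course_required": 4.0,
}


def community_relevance(event_counts, weights=STUDENT_WEIGHTS):
    """Score one item from its event counts.

    event_counts: dict of event type to number of occurrences.
    Event types without a weight contribute nothing to the score.
    """
    return sum(weights.get(event, 0.0) * count
               for event, count in event_counts.items())
```

A faculty-oriented score could reuse the same function with a different weight table, and event data from other libraries could be merged into `event_counts` before scoring.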
Over time, the semantic clustering of items in the DPLA meta-catalog is enriched by new algorithms, by relationships revealed via linked open data, and by systematic work done by contributors to the DPLA. This information about associations is used to enrich the ways in which the university can cluster items.
Scenario 9: University: Library analytics
A University library would like to find efficiencies in its acquisitions and in the layout of physical books based on patterns of usage. It manages its backend data with one of the well-known commercial library management systems. It does not feel comfortable sharing that data with the DPLA, primarily because of security concerns.
Concerns and issues
- The library worries about the security of any data not hosted locally.
- The library’s current backend provides reports but is difficult to use by all those who are involved in making decisions about collections and item layout, and is not configured to report on all the usage (or “event”) data available to the library.
Solution
The University downloads the DPLA platform and installs it locally. They import into it their complete item data in MARC21 format. Using simple scripts, they import usage data, including circulation, items put on reserve, and items called back early from loans. They use a visualization package developed by another university under an open source license that the DPLA platform has listed on its wiki; the package includes installation and configuration instructions that enable the University to customize the visual display of relevant information. Help is provided by the online community.
Benefits:
- The University is able to visualize data to more effectively notice patterns that numeric readouts can obscure.
- The configurations created by the University can be made public for the benefit of other libraries.
- The DPLA gets valuable feedback to improve the analytics package.
Scenario 10: University: Collection creation
A University with existing digitized collections wants to share them with the world. They have created these collections manually, and they are quite sophisticated in their presentation and navigation tools. But because they are difficult to create, the university only invests in creating high-value, elaborate collections. They would like to be able to create collections far more easily, even if it means the presentation quality of them won’t be as high as their premier collections. They’d also like the items in these online collections to interoperate better with collections elsewhere on the Net, so that their online content yields more value.
In this case, they want to quickly pull together an online collection of documents, photographs, and maps to honor a Senator who has just announced his retirement; he has been a special friend to the University. For this, they also want to include some digitized early family photographs in the collection of a neighboring university.
Concerns and issues
- The library does not have developers on staff or the expertise to manage external developers.
- The University is unsure what the interface should be and is concerned about building a custom tool that it has to maintain.
- While the University would like to enable other libraries to include some of its content in their collections, it is concerned about losing control of the metadata and about attribution being stripped from the items.
- The library is interested in participating in the DPLA, but is unsure of what the relationship should be.
Solution
The University is at this point uncomfortable with sharing their data with the DPLA central repository, so it downloads the DPLA local platform. The University is able to import its item and collection data in MARC21 format into its local instance of the DPLA platform.
The University applies to the DPLA for a grant to build a tool for browsing its collection and selecting items for inclusion in online special collections. It is given the grant and collaboratively builds an open source UI, based on the Hathi Trust’s open source tool, that interacts with the collection metadata via the local DPLA platform’s API. This tool becomes part of the DPLA code base.
The University is able to locate within the DPLA’s meta-catalog digitized family photos of the Senator and even an old audio file of the Senator giving a school address as a child. The DPLA’s meta-catalog includes the Web addresses of these items, which are located in collections in several local libraries. Their metadata indicates that they are openly licensed, so the University includes them in its special collection, providing attribution and displaying the relevant metadata deposited with the DPLA.
Special collections built with this tool automatically expose the included items in standard ways, so that this content now can be used far more widely.
Encouraged by this success, the University uses Zeega to enable all members of its community to create high-production-value multimedia collections using items within the University’s collection and openly licensed materials found in DPLA’s meta-catalog.
The benefits to the University are:
- They do not have to build or maintain a tool
- They lower the cost of creating special collections and putting them online.
- Their content becomes more widely used.
Issues:
- There is no failsafe way to guarantee that the right metadata and attribution remain attached to the content they have put online in interoperable ways.
Scenario 11: University: Enhance its user-generated content
The University has only limited user-generated content -- e.g. reviews, comments, ratings -- for its collections, in part because it is a small university, and in part because some of its items are esoteric. It would like to use information developed at and recorded at other universities to provide its users with guidance for a higher percentage of its works, and guidance that comes from a larger sampling of users.
Concerns and issues
- The University needs to be able to identify the libraries it wants user-generated content from. It can do this by issuing a broad specification (“all universities”, “all 2- and 4-year colleges”) or a list of specific libraries, via the DPLA platform API.
- The University would like to be able to display reviews and ratings, and allow its own users to contribute their own. Reviews and ratings from its own users need to be shown independently from those from users of other libraries.
Solution
The University uses a DPLA online form to specify the libraries from which it wishes to draw user reviews and ratings.
The University installs a widget from the DPLA that looks up the current item in the DPLA meta-catalog, retrieving and formatting the user reviews and ratings. The user can create her own review or rating, which is uploaded into the DPLA either anonymously or under the user’s chosen user name.
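The requirement that local reviews be shown separately from those of other libraries amounts to partitioning the retrieved reviews by their source. A minimal sketch, assuming each review is a dictionary with hypothetical `library` and `rating` keys (the actual review format is not specified):

```python
def split_reviews(reviews, home_library):
    """Separate reviews by the University's own users from all others."""
    local = [r for r in reviews if r["library"] == home_library]
    remote = [r for r in reviews if r["library"] != home_library]
    return local, remote


def average_rating(reviews):
    """Mean of the ratings present, or None when there are none."""
    ratings = [r["rating"] for r in reviews if r.get("rating") is not None]
    return sum(ratings) / len(ratings) if ratings else None
```

The widget could then display two averages side by side, one for the home library and one for the wider set of libraries chosen on the form.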
Scenario 12: Researcher: Computationally intense analysis
A researcher wants bulk data to calculate the rate of publication by women by race, subject and geographic area over the course of history. The DPLA has a superset of data that makes it an interesting target.
Concerns and issues
- The data needs to be downloaded locally because the computational needs are high, and to prevent malefactors from taking advantage of being able to run programs on the DPLA’s servers.
- It is up to the researcher whether the results of the analysis will be offered openly or commercially.
Solution
The researcher downloads the data she needs and performs the analysis. The researcher makes the code openly available, and lists it on the DPLA platform’s wiki.
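The core of such an analysis is a grouped rate: for each combination of grouping fields, the share of records that satisfy some condition. The sketch below shows that pattern on downloaded records. The field names (`decade`, `subject`, `author_gender`) are hypothetical, and whether the meta-catalog actually carries such fields is an open question the researcher would have to settle.

```python
from collections import Counter


def share_by_group(records, group_keys, matches):
    """For each group, compute the fraction of records where matches(rec) is true.

    records:    iterable of dicts, one per publication.
    group_keys: field names that define a group, e.g. ("decade", "subject").
    matches:    function taking a record and returning True or False.
    """
    totals, hits = Counter(), Counter()
    for rec in records:
        group = tuple(rec.get(key) for key in group_keys)
        totals[group] += 1
        if matches(rec):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}
```

Because the function streams over the records once, it can run over a bulk download far larger than memory if the records are read lazily from disk.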