The content on this wiki is being preserved for historical purposes, but is not being maintained and is probably no longer accurate.
For current information about DPLA Development, see the Development Portal.
For the latest API documentation, see the API documentation.
Metadata upload
Content
We're interested in a wide variety of metadata about items found in libraries, museums, archives, and online collections, as well as information about the community use and evaluation of these objects. We're also interested in metadata about the collections themselves, as well as about the items within those collections.
Open metadata policy
The DPLA has an open metadata policy: It will not place access restrictions on the metadata contributed to it. (There may be some reasonable exceptions for metadata designed purely for internal use.) Therefore, please do not contribute metadata that has license or copyright restrictions on it. When in doubt, drop us a line (dev@dp.la); we'd be very happy to talk with you about this.
A note on this early stage of development
The platform will provide an API by which contributors can upload their data, as well as the basic mapping tools so that their data will turn up in generalized searches of the platform's meta-catalog. At this very early stage, however, we are manually uploading and ingesting metadata. This does not require much additional effort by contributors, but we appreciate your patience.
Item and collection data
To make items searchable, it helps a great deal to map them to the simple core item schema the platform is proposing to use.
If you are uploading raw MARC21 records, that will happen automatically. If we have questions, we will contact you.
If you are uploading item records in a different format, perhaps using your own schema, we will need to know how your schema maps to ours. The mapping is often obvious, but we may still need your help understanding, for example, the column headers in your CSV file. If your CSV file has no column labels, we will definitely need a list of what each column stands for. (Later we will provide mapping tools.)
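For a CSV file without column labels, one way to supply the list of what each column stands for is to prepend a header row before uploading. The following is a minimal sketch only: the labels in COLUMN_LABELS and the sample row are made up for illustration and are not taken from the platform's core item schema.

```python
import csv
import io

# Hypothetical column labels for a headerless export; replace with your own.
COLUMN_LABELS = ["local_id", "title", "author", "publication_year"]


def add_header(headerless_csv: str, labels: list[str]) -> str:
    """Return the CSV text with a header row of column labels prepended."""
    rows = list(csv.reader(io.StringIO(headerless_csv)))
    for row in rows:
        # Refuse to label rows whose width does not match the label list.
        if len(row) != len(labels):
            raise ValueError(f"expected {len(labels)} columns, got {len(row)}: {row}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(labels)
    writer.writerows(rows)
    return out.getvalue()


# Illustrative, made-up record.
sample = "b1001,Moby-Dick,Herman Melville,1851\n"
labelled = add_header(sample, COLUMN_LABELS)
```

A short plain-text note listing the same labels with a sentence on each would serve equally well.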
Event (usage) data
Event data that expresses an action involving an item — it was checked out, it was put on reserve, etc. — is potentially very useful. There are far fewer standards here, however. So:
- If the event data uses any codes that are not self-explanatory (for example, for patron type), it would be very helpful to receive the look-up tables explaining them.
- In general, if the dataset's field names are not self-evident, a short accompanying note on what the various fields indicate would be much appreciated.
- It is very important that the event data reference the items to which it relates. The reference can be a URI, a standard ID such as an ISBN or OCLC number, or a local ID for an item included in an accompanying item dataset.
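When events point to local IDs in an accompanying item dataset, a quick pre-upload check can confirm that every event actually references a known item. This is a rough sketch under assumptions: the field name item_id and the sample records are hypothetical, and the check does not apply to events that reference items by URI, ISBN, or OCLC number.

```python
def find_unlinked_events(events, item_ids, id_field="item_id"):
    """Return the events whose item reference is missing or not in item_ids."""
    known = set(item_ids)
    return [e for e in events if not e.get(id_field) or e[id_field] not in known]


# Illustrative, made-up data: local IDs from the item dataset and three events.
items = ["b1001", "b1002"]
events = [
    {"item_id": "b1001", "action": "checkout", "date": "2011-05-25"},
    {"item_id": "b9999", "action": "reserve", "date": "2011-05-25"},  # unknown item
    {"action": "checkout", "date": "2011-05-26"},  # no item reference at all
]
unlinked = find_unlinked_events(events, items)
```

Here unlinked holds the second and third events, which would need fixing before upload.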
Thanks!
Anonymizing event data
Note that event data must be thoroughly anonymized before we receive it. That means:
- There must be no user IDs of any sort, even meaningless hashes
- Timestamps must be no finer than the date. Please omit hours, minutes, and seconds
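The two rules above can be sketched as a small clean-up step run before upload. This is an illustration only: the field names in USER_FIELDS and the timestamp field are assumptions about a contributor's data, and the timestamp is assumed to be in ISO 8601 form.

```python
from datetime import datetime

# Hypothetical names of fields that identify a user; adjust to your dataset.
USER_FIELDS = {"user_id", "patron_id", "patron_hash"}


def anonymize_event(event: dict, timestamp_field: str = "timestamp") -> dict:
    """Drop user identifiers and reduce the timestamp to a date."""
    # Remove every user-identifying field, including hashed IDs.
    clean = {k: v for k, v in event.items() if k not in USER_FIELDS}
    if timestamp_field in clean:
        # Keep only the date part (yyyy-mm-dd) of an ISO 8601 timestamp.
        parsed = datetime.fromisoformat(clean[timestamp_field])
        clean[timestamp_field] = parsed.date().isoformat()
    return clean


# Illustrative, made-up event record.
raw = {
    "item_id": "b1001",
    "patron_hash": "9f2c",
    "patron_type": "UG",
    "timestamp": "2011-05-25T14:03:27",
}
clean = anonymize_event(raw)
```

Non-identifying codes such as patron type are kept, since they remain useful alongside their look-up tables.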
How To
FTP Account to Use
- Address: TK
- Username: lil_dropbox
- Password: dropbox
After connecting, change to the "incoming" directory and deposit your files there; it is the only location where the "lil_dropbox" account has write permission. Note also that the account is for uploading only: it does not allow deletions, overwrites, or modifications once files have been uploaded.
After you upload your files, we will transfer them to our data server and then delete them from the FTP dropbox.
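The upload steps can be scripted with Python's standard ftplib module. The sketch below is one possible approach, not an official client: the server address is not given on this page, so main takes it as a host argument that you must supply, and the function names are our own.

```python
import os
from ftplib import FTP


def upload_to_incoming(ftp, local_path: str) -> str:
    """Upload one file into the 'incoming' directory; return the remote name."""
    remote_name = os.path.basename(local_path)
    # 'incoming' is the only directory the account can write to.
    ftp.cwd("incoming")
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)
    return remote_name


def main(host: str, local_path: str) -> None:
    # host is an assumption supplied by the caller; the address is not listed here.
    with FTP(host) as ftp:
        ftp.login("lil_dropbox", "dropbox")
        upload_to_incoming(ftp, local_path)
```

Because the account cannot overwrite or delete files, it is worth double-checking the file name before calling main; a mistaken upload cannot be corrected from the contributor's side.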
File-naming Guidelines
Because everyone shares a single account and all uploaded files land in the same directory, please help keep things organized by naming files with the components below, in sequence:
- Institution name or other data-depositor identifier
- Type of data (can be whatever short descriptor you find apt)
- Date, in the format: yyyymmdd
- If multiple files: part number
- File extension (corresponding to file format): e.g., csv, json, xml, mrc, sql, etc.
- Separate the components with underscores, and put a period before the file extension.
Examples:
- Data upload consists of 2 files: harvard_item_data_20110525_1.csv and harvard_item_data_20110525_2.csv
- Data upload consists of 1 file: northeastern_event_data_20110525.json or boston_public_library_circulation_data_20110525.xml
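The naming format can also be generated programmatically. The helper below is a minimal sketch; build_filename is a hypothetical name of our own, and lowercasing the components is an assumption based on the examples above rather than a stated requirement.

```python
import re
from datetime import date


def build_filename(depositor, data_type, extension, when, part=None):
    """Join the naming components with underscores and append the extension."""

    def slug(text):
        # Lowercase and collapse anything that is not a letter or digit to "_".
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    components = [slug(depositor), slug(data_type), when.strftime("%Y%m%d")]
    if part is not None:
        # The part number is only present when an upload has multiple files.
        components.append(str(part))
    return "_".join(components) + "." + extension.lstrip(".").lower()


first_part = build_filename("Harvard", "item data", "csv", date(2011, 5, 25), part=1)
single = build_filename(
    "Boston Public Library", "circulation data", "xml", date(2011, 5, 25)
)
```

With these inputs, first_part is harvard_item_data_20110525_1.csv and single is boston_public_library_circulation_data_20110525.xml, matching the examples.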
Formats
Feel free to send along your data in any format you find convenient, including:
- MARC (feel free to send us your raw MARC for item data)
- CSV or tab-delimited text
- JSON
- XML
- SQL
- Any other clearly structured format
We're always happy to talk with you about format, content, and anything else related to transferring your data to us, so don't hesitate to get in touch at dev@dp.la.
Thank you for bearing with us as we build the API, ingestion, and maintenance engines. (And if you want to help with that, please let us know!)