Wednesday, June 29, 2011

A DSpace batch load for one item

The Curation System for DSpace (as of release 1.7) includes a Bitstream Format Profiler curation task by default. The task can be performed on any DSpace object (item, collection, community). Running the profiler on an item examines all the bitstreams in that item and produces a table (profile) that is configured to display in the Admin UI. Each row of the result gives the count of bitstreams of the named format in the left column and, in parentheses, a letter abbreviating the repository-assigned support level for that format (U-Unsupported, K-Known, S-Supported).
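To make the shape of a profile concrete, here is a minimal Python sketch of the tallying idea. It is not the profiler's actual code: the `profile` function, the sample data, the support levels assigned to each format, and the exact row layout are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (format name, support level) pairs, one per bitstream.
# The levels shown here are made up; a real repository assigns its own.
# U = Unsupported, K = Known, S = Supported.
bitstreams = [
    ("Adobe PDF", "K"),
    ("Adobe PDF", "K"),
    ("HTML", "S"),
    ("Plain Text", "K"),
]

def profile(bitstreams):
    """Return one row per format: count, support letter, format name."""
    counts = Counter(bitstreams)
    rows = []
    for (fmt, level), n in sorted(counts.items()):
        # Assumed row layout: count first, then the level in parentheses.
        rows.append(f"{n} ({level}) {fmt}")
    return rows

for row in profile(bitstreams):
    print(row)
```

The count comes first in each row, matching the description of the left column above; everything else about the layout is a guess.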

I thought this task would be fun to run on an item that we had to batch load into the Knowledge Bank given the number of files:

The item profiled above, the Índice crítico del teatro uruguayo (1808-1980) [Critical Index of Uruguayan Theater (1808-1980)], contains 2,895 bitstreams. However, because the item is archived as a Web site, only one bitstream (the index.html file) is displayed as a file in the public UI.

The Critical Index of Uruguayan Theater collects the archive produced between 1976 and 1980 by Graciela Míguez (1949-2000) and Abril Trigo. It consists of three interconnected parts: an inventory of authors and playwrights, an index of the theatrical plays attributed to them, and a set of critical-analytical reviews of an extensive and representative selection of plays. For nearly 30 years, Abril Trigo preserved the archive of typewritten records containing this unique cultural resource.

In 2008, The Ohio State University Libraries digitized and indexed the records for presentation on the Web. The Critical Index of Uruguayan Theater was archived in the Knowledge Bank in 2009. Due to the number of files, we batch loaded the item. The archive directory for the batch load contained just one item directory, which held the dublin_core.xml metadata file, the contents file listing the files to be added as bitstreams to the item, and the 1,448 content files (PDF, HTML, PNG, JPG, and CSS). The total count of bitstreams profiled above also includes 1,443 extracted text files (the 1,443 Plain Text) and 4 thumbnails (4 of the 7 JPEG files), all generated post-load by the media filters: 1,448 + 1,443 + 4 = 2,895.
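With this many files, the contents file is not something to type by hand. The Python sketch below shows one way such a file could be generated, and checks the bitstream arithmetic. It is not the script we used: the helper names `write_contents` and `expected_bitstreams` are hypothetical, and the sketch assumes a flat item directory with one filename per line in the contents file.

```python
from pathlib import Path

# Files in the item directory that describe the item rather than
# being loaded as bitstreams themselves.
METADATA_FILES = {"dublin_core.xml", "contents"}

def write_contents(item_dir):
    """Write a 'contents' file listing every content file in item_dir.

    Assumes a flat directory (no subfolders) and one filename per line.
    Returns the list of filenames written.
    """
    item_dir = Path(item_dir)
    names = sorted(
        p.name
        for p in item_dir.iterdir()
        if p.is_file() and p.name not in METADATA_FILES
    )
    text = "".join(name + "\n" for name in names)
    (item_dir / "contents").write_text(text, encoding="utf-8")
    return names

def expected_bitstreams(content_files, extracted_text, thumbnails):
    """Total bitstreams after the media filters have run."""
    return content_files + extracted_text + thumbnails

# Figures from this post: 1,448 loaded files plus the derivatives
# generated by the media filters.
print(expected_bitstreams(1448, 1443, 4))
```

Comparing the length of the list returned by `write_contents` with the number of files you meant to load is a cheap sanity check before starting the import.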
