Wednesday, April 17, 2013

"Grand Unification Theory" may be a touch grandiose, but the underlying libraries used in the Homer Multitext project now generate RDF statements that fully express all three types of CITE-architecture information:  textual archives, archives of data collections, and indices relating citable objects to other citable objects or to raw data.  There will be lots of interesting connections to explore in the resulting unified graph of scholarly material.

In parallel with this, I've now implemented the CTS protocol, the CITE Collections Service protocol, and its extension with the CHS Image protocol in servlets drawing on a SPARQL endpoint, so creating a complete CITE environment can be reduced to:

- build all RDF (automatically), and import into a triple store
- drop the three servlets for CITE services into a servlet container
- install the iipsrv fastcgi for working with binary image data.  This is the most troublesome step on many platforms, but happily iipsrv is now available as a package under Debian.
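
To make "servlets drawing on a SPARQL endpoint" a little more concrete, here is a minimal Python sketch of the kind of lookup such a service might perform: turning a URN into a SPARQL query, and then into a GET request URL following the standard SPARQL protocol's `query` parameter. The endpoint address and the predicate name are placeholders I made up for illustration; they are not the vocabulary or configuration used by the CITE servlets.

```python
from urllib.parse import urlencode

# Both values below are hypothetical placeholders, not part of the CITE libraries.
ENDPOINT = "http://localhost:3030/ds/query"
HAS_TEXT = "http://example.org/hypothetical#hasTextContent"


def build_query(urn: str) -> str:
    """Return a SPARQL SELECT asking for HAS_TEXT values of the given URN."""
    # Refuse characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in urn for ch in '<>"{}|^`\\ '):
        raise ValueError(f"unsafe character in URN: {urn!r}")
    return f"SELECT ?text WHERE {{ <{urn}> <{HAS_TEXT}> ?text }}"


def request_url(urn: str, endpoint: str = ENDPOINT) -> str:
    """Return the GET URL that submits the query to a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": build_query(urn)})


if __name__ == "__main__":
    print(request_url("urn:cts:greekLit:tlg0012.tlg001"))
```

The sketch only builds the request; sending it and reading the result set is left to whatever HTTP client and triple store you happen to use.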

Not bad.  Chris Blackwell is preparing an image for the < $50 Raspberry Pi with these requirements preinstalled:  a complete CITE Box roughly the size of an Altoids container.

As we review the schemas used in the services this month, we'll begin looking at defining a more permanent RDF vocabulary.  I'm not sure at this point if we need to break out a generic CITE vocabulary distinct from a specific HMT vocabulary, or whether one ontology will suffice.  We'll be looking at other projects' work:  thanks to Joel Kalvesmaki for pointing to the useful list here.

Sunday, April 14, 2013

CITE Collection Inventory

In parallel with Friday's update to the schema for CTS text inventories, CITE Collection inventories now include an optional urn attribute on the schema for Collections.  Bump your build system's dependency for the cite library up to 0.12.2 to include this change.

As with the CTS TextInventory, we plan to make the Collection inventory's urn attribute mandatory in 0.13, and will drop the parallel name attribute in 0.14.

Friday, April 12, 2013

Updating the CTS TextInventory schema

Scott Mcphee points out the absurdity of a Canonical Text Service (CTS) definition that uses CTS URNs for all retrieval requests, but doesn't include CTS URNs in the service's TextInventory.  The historical explanation for the inconsistency is embarrassingly simple:  the TextInventory schema predates the invention of CTS URNs, and has not been revisited since!  That oversight is rectified with today's release of version 0.12.1 of the CITE schemas package.
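
For readers who have not looked closely at a CTS URN, here is a minimal Python sketch of how one breaks down: colon-delimited fields for the `urn` and `cts` prefixes, a namespace, a dot-separated work hierarchy, and an optional passage reference. The class and function names are my own for illustration and are not part of the cite library; the sketch reads only the first three levels of the work hierarchy (text group, work, version).

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(value: str) -> CtsUrn:
    """Split a CTS URN into its components, raising ValueError if malformed."""
    parts = value.split(":")
    # Expect urn:cts:NAMESPACE:HIERARCHY with an optional :PASSAGE field.
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {value!r}")
    namespace, hierarchy = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    levels = hierarchy.split(".")
    if not namespace or not levels[0]:
        raise ValueError(f"missing namespace or text group: {value!r}")
    # Pad so that absent work or version levels come back as None.
    padded = (levels + [None, None])[:3]
    return CtsUrn(namespace, padded[0], padded[1], padded[2], passage)


if __name__ == "__main__":
    print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001"))
```

A catalog that records the same URNs a client uses in its requests can be matched against those requests with nothing more than this kind of string handling.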

Ultimately, we want to arrive at catalog entries with urn attributes that look like this:

<textgroup urn="urn:cts:greekLit:tlg0012">
 <groupname xml:lang="eng">Homeric poetry</groupname>
 <work urn="urn:cts:greekLit:tlg0012.tlg001" xml:lang="grc">
  <title xml:lang="eng">Iliad</title>
   <edition urn="urn:cts:greekLit:tlg0012.tlg001">
   <label xml:lang="eng">Allen (OCT 1931)</label>

With release 0.12.1, the urn attribute is now optional but strongly recommended, alongside the previous projid attribute.  With release 0.13.0, the urn attribute will be required, and the projid attribute deprecated.  With release 0.14.0, the projid attribute will be dropped.
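
During the transition, code that reads an inventory has to cope with entries that carry a urn, a projid, or both. Here is a minimal Python sketch of one way to handle that: prefer the urn attribute and fall back to projid. The sample document is a simplified stand-in I wrote for illustration: it omits the schema's XML namespace, and the projid values shown are assumed, not copied from a real inventory.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical inventory fragment: no namespace, assumed projid values.
SAMPLE = """
<textgroup urn="urn:cts:greekLit:tlg0012" projid="greekLit:tlg0012">
  <groupname xml:lang="eng">Homeric poetry</groupname>
  <work urn="urn:cts:greekLit:tlg0012.tlg001" xml:lang="grc">
    <title xml:lang="eng">Iliad</title>
  </work>
  <work projid="greekLit:tlg002" xml:lang="grc">
    <title xml:lang="eng">Odyssey</title>
  </work>
</textgroup>
"""


def identifiers(root):
    """Yield (tag, identifier, source attribute) for each identified element."""
    for el in root.iter():
        urn = el.get("urn")
        projid = el.get("projid")
        if urn is not None:
            # Prefer the URN whenever it is present.
            yield el.tag, urn, "urn"
        elif projid is not None:
            # Legacy entry that has not been updated yet.
            yield el.tag, projid, "projid"


ROOT = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    for tag, ident, source in identifiers(ROOT):
        print(f"{tag}: {ident} (from {source})")
```

Reporting which attribute supplied each identifier makes it easy to list the entries that still need a urn before the attribute becomes required.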

So grab the package from our Nexus repository to get started with a modern TextInventory identifying texts by URN.  You can manually download a zip bundle from the repository, or update your Maven coordinates with groupId "edu.harvard.chs", artifactId "cite" and version "0.12.1".

[Updated:  bumped version from 0.12.0 to 0.12.1 after adding trailing slash to dc namespace as requested by Bridget Almas]

Thursday, April 11, 2013

How hard is it to imagine "popular scholarship"?

I heard an interesting talk yesterday at Clark University by Robert Anderson, former director of the British Museum, on "The British Museum and Library at the New Millennium":  wonderful anecdotes from the early history of the museum, and a compelling argument for the essential intellectual unity of what museums and libraries do.

The British Museum Great Court.
Photograph by Eric Pouhier,
licensed under cc-by-sa license.

Two details troubled me.  First, while the rare book library at Clark was filled, I saw only one student, and I probably fell well below the median age of the audience.  The talk was sponsored by the "Friends of the Goddard Library," but if this audience was representative, the library won't have too many friends in a few more years.

Second, both Anderson's talk and some of the discussion afterward made some curious assumptions about scholarship.  As the director at the time of the separation of the British Library from the Museum, and the opening of the fabulous facility at the new Euston Road location, Anderson offered insightful comments on the tensions of an institution committed both to free public access and to serving the needs of specialist scholars.  He brought up a problem familiar to anyone who has worked at the BL recently:  it's such a popular place that all the desks fill up early in the morning with students looking for a comfortable place to work (with free wifi and good coffee!), but who aren't necessarily taking advantage of any of the unique offerings of the British Library.  This can impose a real hardship on people working on projects that depend on BL material.  Two assumptions emerged in the discussion that struck me as odd:  that the results of scholarly research would only be of interest to a small circle of specialists; and that digital material should be openly viewable, but that scholarly research was being well served by a policy that allows free reuse of scholarly material only in print publications with a very limited print run.

Interior of the British Library.
Photograph by Maria Giulia Tolotti
licensed under cc-by-sa license.

Let's parse that logic a little more closely:  scholarly reuse of BL material is OK as long as not too many people care to read it;  and that's fine, because scholars' research is only of interest to a handful of other specialists, and expensive print media are an adequate way to meet this need.  (The host's introduction of Anderson referred light-heartedly, in what was evidently intended to be humor, to the fact that his most recent multi-volume publication costs hundreds of dollars.)

If we think the goal of scholarly research is to produce high-priced monographs of interest only to other specialists, is it really a surprise that the general reading public sees in the British Library a wonderful café?  If we think of "digital access" as a way of entertaining or at best informing a wide public, without inviting scholars to build upon the digital foundations of the BL's collections, is it any wonder that visitors to the BL are not drawn to the library's unique resources, but instead spend their time with the amazing hodgepodge of entertainment and information that populates the internet?

(Footnote:  I was able to include the photographs by Eric Pouhier and Maria Giulia Tolotti, without regard for how many people might view them, because both are available from Wikimedia Commons under the terms of a cc-by-sa license.)