In digital scholarship, citation is the most fundamental form of ontology. Identifying an entity in a recognized form of citation allows scholars to agree that the object exists, while leaving room to disagree over how to represent it. Once we agree that there is an object I call "the Parthenon in Athens," and can unambiguously cite that object, we allow software to recognize that the same object might be represented in a GIS with spatial data, in a photo gallery with a collection of photographs, or in an architectural database with structured fields of textual data.
For digital scholars, it's hard to talk or even think about scholarly reference systems without getting bogged down in technology-specific tar pits. Take the W3C's Resource Description Framework as an example. Its triple model is brilliantly simple and powerful: whether you think of it as "subject-verb-object" or "object-directed link-object", it is general, extensible, and lends itself readily to both abstract graph models and real machine implementations. Its syntax is expressed in terms of URIs, which could equally well be URLs (real addresses in the "http" scheme), or URNs (abstract names in the "urn" scheme). (See the W3C's document "URI Clarification" or this discussion "Untangle URIs, URLs, and URNs" to sort out the acronyms.) But when was the last time you saw a scholarly project using RDF to describe relations among objects identified by abstract name, rather than by address? In an RDF graph, "http" is a good URI scheme for an application retrieving material on the internet, but a poor choice for describing relations among persistent and immutable objects, where the "urn" scheme would be more appropriate.
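To make the name-versus-address distinction concrete, here is a minimal sketch in Python. Every identifier in it is hypothetical: the "urn:example" names, the photos.example.org address, and the "depicts" relation are invented for illustration and belong to no real vocabulary. A triple is modeled as a plain tuple rather than with an RDF library, and the only check is which URI scheme each term uses.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers, invented for this illustration only.
PARTHENON = "urn:example:monument:parthenon"           # abstract name
PHOTO_SET = "http://photos.example.org/parthenon"      # retrievable address
DEPICTS = "urn:example:relation:depicts"               # abstract name

# One RDF-style statement as a (subject, predicate, object) tuple.
triples = [(PHOTO_SET, DEPICTS, PARTHENON)]


def is_name(uri: str) -> bool:
    """Return True if the URI uses the "urn" scheme (a name, not an address)."""
    return urlsplit(uri).scheme.lower() == "urn"


def is_address(uri: str) -> bool:
    """Return True if the URI uses the "http" or "https" scheme."""
    return urlsplit(uri).scheme.lower() in ("http", "https")


def terms_by_address(statements):
    """Collect every term in the statements that is identified by address."""
    return sorted({term for triple in statements for term in triple
                   if is_address(term)})


if __name__ == "__main__":
    for subject, predicate, obj in triples:
        print(subject, predicate, obj)
    print("identified by address:", terms_by_address(triples))
```

In this toy graph only the photo collection is identified by address. The monument and the relation are identified by name, so the statement about them does not depend on where any particular representation happens to live.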
For canonically citable texts, work at the Center for Hellenic Studies has led to:
- identifying abstract properties of canonical texts
- developing a human readable and machine actionable notation for citation that expresses these properties (the CTS URN notation)
- defining a service for identifying and retrieving texts identified by this notation (the Canonical Text Services protocol)
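As a rough illustration of what "human readable and machine actionable" can mean, the sketch below splits a CTS URN into its colon-delimited parts. It assumes the commonly documented layout, "urn:cts:" followed by a namespace, a dot-separated work hierarchy, and an optional passage reference. That layout is not spelled out in this post, so treat it as an assumption. The names CtsUrn and parse_cts_urn are hypothetical and are not part of any CTS library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CtsUrn:
    """Parts of a CTS URN, under the layout assumed in the lead-in."""
    namespace: str
    work: Tuple[str, ...]      # e.g. ("tlg0012", "tlg001")
    passage: Optional[str]     # e.g. "1.1", or None if absent


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN on colons and check the fixed "urn:cts" prefix."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], parts[3]
    if not namespace or not work:
        raise ValueError(f"missing namespace or work component: {urn!r}")
    # A trailing colon with nothing after it means "no passage".
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, tuple(work.split(".")), passage)


if __name__ == "__main__":
    print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

The parser never touches the network. The notation can be validated and taken apart on its own, and retrieval is left to a separate service.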
I think this graded distinction of abstract property, reference notation, and application could be equally valuable in citing other kinds of material. In a following series of posts, I'll look successively at each level to analyze how we might cite uniquely identified objects.