Friday, April 11, 2008

Separation of concerns applies to document content, too

Twenty years ago, before the internet was open to the public, the print publishing industry was a leader in SGML document markup, and scholarly markup projects tended to think of "documents" as the content bound between a pair of covers. This heritage is clearly reflected in the TEI Guidelines' thorough inventory of elements to identify "front" and "back" material of documents, or a variety of groupings or collections of texts.

The major syntactic differences between XML and SGML (insistence on a single hierarchy of elements, each with an explicitly marked end) were introduced in part to adapt markup to the needs of a very different environment: a network of computers exchanging information dynamically. The already well-understood distinction between semantic markup and presentational markup certainly contributed to the articulation of "separation of concerns" in the design of network applications. Individuals with different skills could apply appropriate technologies to the different parts of a network application, so in creating an application to run in a web browser, programmers might write the controlling code in JavaScript, and design specialists define its appearance with CSS. In a network of semantically structured content, XML plays the vital roles of defining the data structure (explicitly via a schema or DTD, or implicitly in the case of well-formed XML), and of providing a format for data exchange. The question of what this XML should look like, the kind of question the TEI has considered since the 1980s, had to be rethought. Humanists might rephrase Sun Microsystems' famous slogan, "The network is the computer," as "The network is the library."

When applications can exchange structured content, it is straightforward to create compound documents. The reverse is harder: to disaggregate a complex document into its component parts, an application needs more detailed knowledge of the internal structure of a necessarily more complex document. An application could easily juxtapose a document in its original language with a document in translation, or weave together a commentary with a text associated through a common citation system, for example, but disentangling interleaved translation or commentary from a complex document is more problematic.
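
To make the "weaving" case concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the two tiny XML documents, the use of an "n" attribute on lines and a "target" attribute on notes to carry the shared citation, and the function name weave. Real TEI files would also need namespace handling, which is left out here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified documents: a text and a separate commentary
# that share nothing except a common citation scheme ("book.line").
TEXT_XML = """<text>
  <l n="1.1">first line of the text</l>
  <l n="1.2">second line of the text</l>
</text>"""

COMMENTARY_XML = """<commentary>
  <note target="1.1">a comment on the first line</note>
</commentary>"""


def weave(text_xml, commentary_xml):
    """Pair each cited line with any notes that cite the same reference."""
    notes = {}
    for note in ET.fromstring(commentary_xml).iter("note"):
        notes.setdefault(note.get("target"), []).append(note.text)
    return [
        (line.get("n"), line.text, notes.get(line.get("n"), []))
        for line in ET.fromstring(text_xml).iter("l")
    ]


woven = weave(TEXT_XML, COMMENTARY_XML)
for ref, passage, comments in woven:
    print(ref, passage, comments)
```

Neither document needs to know anything about the other's internal structure; the citation value is the only point of contact, which is why combining them takes so little code.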

I've been thinking about this in designing a set of TEI documents to represent the multiple texts of the famous Venetus A manuscript of the Iliad. There are four distinct sets of scholia, in addition to the manuscript's text of the Iliad. I chose to treat each set as an independent document, and as I am now reaching the stage of putting together applications drawing on those documents, I am glad that I did: cleanly separated, discrete documents are making that job much easier than it otherwise would be.

I expect that I will never use the elaborate TEI mechanisms to document the relation of a transcribed document to graphic images. In keeping with the guiding principle of separate, discrete documents, I'm associating images of the manuscript with ranges of text through external indices: here, too, the standoff markup of a separate, simple (non-TEI) document is easy to marshal together with the TEI document of the transcribed text.
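
An external index of this kind can be very small. The sketch below, in Python, assumes a made-up index format: each entry names an image and an inclusive range of "book.line" citations. The file names, the ranges, and the helper names parse_ref and images_for are all hypothetical and do not reflect the actual foliation of the manuscript or any existing index.

```python
# Hypothetical standoff index: (image file, first citation, last citation).
# Citations are (book, line) pairs; ranges are inclusive at both ends.
IMAGE_INDEX = [
    ("folio-a.jpg", (1, 1), (1, 25)),
    ("folio-b.jpg", (1, 26), (1, 50)),
]


def parse_ref(ref):
    """Turn a citation string such as '1.26' into a comparable (book, line) pair."""
    book, line = ref.split(".")
    return (int(book), int(line))


def images_for(ref, index=IMAGE_INDEX):
    """Return every image whose indexed range of text includes the citation."""
    target = parse_ref(ref)
    return [image for image, start, end in index if start <= target <= end]


print(images_for("1.26"))
```

Because the index lives outside the transcription, it can be corrected or replaced without touching the TEI document at all.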

In many ways, TEI P5, with its support for XML namespaces, is nudging scholars towards this kind of document organization. But we need to push harder: it's time to move away from monolithic TEI replicas of print or even manuscript sources. In editing scholarly texts for use on the internet, let each logical component stand alone.

Coordinating separate documents in a networked library requires a common understanding of how to cite them. I'll follow up with a note on how editors of TEI texts should think about that part of their markup.
