Thursday, July 10, 2008

Half empty or half full?

I frequently assert that classicists, along with biblical scholars, share the distinction of using logical citation schemes to refer to the works they study. This practice is important, since it means that references can apply to any version of a work, in print or digital form. (I touched on this briefly in an earlier post.)

I have made this claim so often that I decided it would be a good idea to find out whether it is true.

The TLG offers the largest corpus of ancient Greek, so one way to evaluate how classicists cite their works would be simply to count and summarize the citation schemes used in the TLG. Sadly, although this would have been possible until 2000, when the TLG still distributed data to its licensees, there is in 2008 no way around the preconceived query interface of the TLG web site. (The fact that such a simple question as "what citation schemes are used?" is now out of reach illustrates the catastrophic consequences for classical studies of the TLG's decision to reverse its decades-old policy of distributing data, in favor of selling access to predetermined user interfaces.)

As in an earlier post estimating the size of the surviving Greek corpus by period, however, we can still use the 2000 version of the TLG Canon distributed on the TLG E disk to get an impression of classicists' citation practice.

As in that post, we'll want to limit ourselves to works transmitted by manuscript copying. I'll take the simplest approach possible: count the number of "works" that use each citation scheme. I won't attempt to normalize in any way the definition of a work: the five-line Homeric Hymn to the Dioscuri is one work, as is the entire Iliad. With that caveat in mind, let's look at the results.

The TLG E canon includes 3810 works transmitted by manuscript and having defined citation schemes. (Note that the Canon includes works not in the E disk; 584 of these works did not yet have a defined citation scheme at the time of the E disk's publication, so I exclude them from our results.) These 3810 works are represented by an astonishing 194 distinct citation schemes!
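The tally itself is simple enough to sketch in a few lines of Python. This is only an illustration of the counting approach, not the script used for the figures in this post: the record layout, the work identifiers, and the function names `tally_schemes` and `summarize` are all hypothetical, and the sample data is invented.

```python
from collections import Counter

# Hypothetical records: (work identifier, citation scheme).
# Identifiers and pairings are invented for illustration only.
works = [
    ("work-001", "book/line"),
    ("work-002", "book/line"),
    ("work-003", "stephanus page/section/line"),
    ("work-004", "bekker page/line"),
    ("work-005", None),  # no citation scheme defined yet, so excluded
    ("work-006", "idyll/line"),
]


def tally_schemes(records):
    """Count works per citation scheme, skipping works with no scheme."""
    return Counter(scheme for _, scheme in records if scheme is not None)


def summarize(counts, threshold):
    """Return the number of schemes used for exactly one work, and the
    list of schemes used for more than `threshold` works."""
    singletons = sum(1 for n in counts.values() if n == 1)
    frequent = [scheme for scheme, n in counts.items() if n > threshold]
    return singletons, frequent


counts = tally_schemes(works)
singletons, frequent = summarize(counts, threshold=1)

# Print schemes from most to least used, one per line.
for scheme, n in counts.most_common():
    print(f"{scheme}\t{n}")
```

Each work counts once regardless of its length, which matches the deliberately unnormalized definition of a "work" adopted above.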

As we might expect, however, the distribution of these schemes is very uneven: 104 citation schemes are used for a single work; only 16 citation schemes are used for more than 13 works. Let's look more closely at these top 16 citation schemes, which cover 3426 (90%) of the works surveyed.

Citation scheme                  Number
stephanus page/section/line      114
jebb page/line                   54
bekker page/line                 44
kuehn volume/page/line           39
harduin page/section/line        32
Total physical schemes           1814 (53%)
Total logical schemes            1612 (47%)
Grand total                      3426

The overall results are not encouraging. The entries in black are logical schemes: they total only 47% of the 3426 works. The entries in red refer instead to physical artifacts like book pages, 53% of the group. It's small consolation that the numbers are a worst-case scenario: some works may be cited by both logical and physical reference; where the TLG uses a logical reference, we can be sure that a logical scheme exists, but where the TLG uses a physical reference system, we can't always exclude the possibility that an alternative logical scheme is available. For example, the 44 works cited by Bekker page are, of course, the Aristotelian corpus: many of these have alternative citation schemes by chapter or section.

If we break the numbers down further by the chronological period of the original text, however, the picture changes. With the notable exception of Plato, where Stephanus' great edition became the standard for citation, citation by logical scheme is much more prevalent in works of the classical period. The following table breaks out from the previous listing works dating before about 300 BC.

Citation schemes in works of classical date
section/line 229
line 98
bekker page/line 43
stephanus page/section/line 38
chapter/section/line 20
volume/page/line 18
page/line 16
book/chapter/section/line 11
fable/line 9
book/line 5
ode/line 4
book/section/line 4
tetralogy/section/line 3
demonstratio/line 3
epistle/section/line 3
book/demonstratio/line 2
thevenot page/line 2
epistle/line 2
idyll/line 1
page+column/line 1
sententia/line 1
lexical entry/line 1
proverb/line 1
folio/line 1
fable/version/line 1
exordium/section/line 1
usener page/line 1
Total physical schemes 120 (23%)
Total logical schemes 399 (77%)
Grand total 519

The 519 works are cited in 27 different citation schemes. We could think of that as an "average density" of about 19-20 works per citation scheme, essentially the same as for the overall corpus (194 schemes for 3810 works is also a density of about 19-20 works per citation scheme). But in this listing, only 23% (120) of the classical works use physical reference systems. The corpora of Plato and Aristotle constitute the bulk of this material (81 works); apart from the two great philosophical corpora, only 39 works of the classical period are cited in the TLG by physical reference system – about 8%.
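The arithmetic in this paragraph can be checked directly from the table of classical-period works. In the sketch below the counts are copied from that table; the set of schemes treated as physical is an assumption of the sketch (schemes that cite by page, volume, folio, or column, including those named for an edition), although it does reproduce the total of 120 given above. The variable names are hypothetical.

```python
# Counts copied from the table of classical-period works.
classical = {
    "section/line": 229, "line": 98, "bekker page/line": 43,
    "stephanus page/section/line": 38, "chapter/section/line": 20,
    "volume/page/line": 18, "page/line": 16,
    "book/chapter/section/line": 11, "fable/line": 9, "book/line": 5,
    "ode/line": 4, "book/section/line": 4, "tetralogy/section/line": 3,
    "demonstratio/line": 3, "epistle/section/line": 3,
    "book/demonstratio/line": 2, "thevenot page/line": 2,
    "epistle/line": 2, "idyll/line": 1, "page+column/line": 1,
    "sententia/line": 1, "lexical entry/line": 1, "proverb/line": 1,
    "folio/line": 1, "fable/version/line": 1,
    "exordium/section/line": 1, "usener page/line": 1,
}

# Assumption: these are the schemes that refer to physical artifacts.
physical = {
    "bekker page/line", "stephanus page/section/line", "volume/page/line",
    "page/line", "thevenot page/line", "page+column/line", "folio/line",
    "usener page/line",
}

total = sum(classical.values())
physical_total = sum(n for s, n in classical.items() if s in physical)
logical_total = total - physical_total

# Plato (Stephanus pages) and Aristotle (Bekker pages).
plato_aristotle = (classical["stephanus page/section/line"]
                   + classical["bekker page/line"])
other_physical = physical_total - plato_aristotle

# Works per citation scheme, for the classical subset and the whole corpus.
density_classical = total / len(classical)
density_overall = 3810 / 194

print(f"physical: {physical_total} of {total} "
      f"({100 * physical_total / total:.0f}%)")
print(f"physical outside Plato and Aristotle: {other_physical} "
      f"({100 * other_physical / total:.0f}%)")
print(f"works per scheme: {density_classical:.1f} classical, "
      f"{density_overall:.1f} overall")
```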

It's probably the height of political incorrectness to suggest that the most traditional canon of works has been the object of better quality scholarly study (although it's plausible enough that more scholarship should produce better results), but by the single, one-dimensional yardstick of how a work is cited, editors of classical texts have done a far better job of capturing the logical structure of their texts than have editors of ancient Greek overall.

So for classicists interested in creating a digital corpus of Greek, the "news" is mixed. Roughly half the works in the TLG E Canon already depend on logical reference systems, so we already have a good standard in place for many of our texts. The classical period is in markedly better shape.