Thursday, December 27, 2007

Open access to federally funded research

Linked from Slashdot today: tucked into the appropriations act just signed by President Bush, a requirement that the NIH must provide online access to research it has funded. This is a tremendous precedent, the first time that the US federal government has made open online access a condition of receiving federal funding for research.

The NIH is the focus, not the NEH, in part because people understand that medical research matters (as the respective budgets of the NIH and NEH also show). But the NIH was also in the Congressional spotlight because of the sustained advocacy of leading scientists, such as the open letter to Congress signed by 25 Nobel laureates in 2004 and by 26 Nobel winners in 2007.

Meanwhile, the American Philological Association, the professional organization that purports to represent classical studies, has inaugurated a multimillion-dollar fundraising campaign to establish a "Digital Portal" centered on subscription-based access to a bibliography of print publications.

fuimus Troes, fuit Ilium


In the Ur-web of the early 1990s, images came in fixed sizes. You might get a thumbnail-sized image, a smaller version or a larger version, but generally what appeared in your browser was a full, one-to-one view of a distinct image as it was delivered to you from a Web server.

Today, it's increasingly common for server- and client-side applications to manipulate what is, at least notionally, a single image that a user can navigate through. Google defined the current state of the art in browser-based image navigation when it introduced Google Maps in 2005. Its clever use of AJAX to load adjacent tiles at appropriate scales creates the illusion of continuous navigation of the whole earth.

The same technology can be applied to any image. At University College, London, the Centre for Advanced Spatial Analysis has developed "The Google Maps Image Cutter," an application that generates from any digital image the image tiles required by a Google Maps-style web application.
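To make the idea of cutting an image into tiles concrete, here is a minimal sketch of the arithmetic involved. It is not a description of how the Image Cutter itself works: the function names are my own, and it assumes 256-pixel square tiles with each zoom level halving the scale of the one above it.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels


def max_zoom(width, height, tile_size=TILE_SIZE):
    """Lowest zoom level whose tile grid can hold the image at full resolution."""
    longest = max(width, height)
    return max(0, math.ceil(math.log2(longest / tile_size)))


def tile_grid(width, height, zoom, top_zoom, tile_size=TILE_SIZE):
    """Columns and rows of tiles that contain image pixels at `zoom`."""
    scale = 2 ** (top_zoom - zoom)  # how far the image is shrunk at this level
    scaled_w = math.ceil(width / scale)
    scaled_h = math.ceil(height / scale)
    return math.ceil(scaled_w / tile_size), math.ceil(scaled_h / tile_size)


def pyramid(width, height, tile_size=TILE_SIZE):
    """Map each zoom level to its (columns, rows) of tiles."""
    top = max_zoom(width, height, tile_size)
    return {z: tile_grid(width, height, z, top, tile_size) for z in range(top + 1)}


if __name__ == "__main__":
    # A hypothetical 4000 x 3000 pixel photograph.
    for zoom, (cols, rows) in pyramid(4000, 3000).items():
        print(zoom, cols, rows)
```

At the lowest zoom level the whole image fits in a single tile; each level above it roughly quadruples the number of tiles until the image is shown at its full resolution.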

A couple of projects I'm working on apply this technique to browse images that cannot be displayed in full detail in a single view because of their high resolution or awkward shape. The Center for Hellenic Studies' Homer Multitext Project has Google-mapped high-resolution photographs of Iliadic manuscripts. I've recently Google-mapped drawings and photographs of several dozen inscriptions in the Lycian language.

This is an easily implemented and effective way to let users explore an image. It comes at the cost of one tiny little white lie: we have to pretend to Google that the coordinate space of our rectangular image works like a Mercator projection of a spheroid (the earth).
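The white lie can be spelled out in a few lines. The sketch below uses the standard spherical Mercator formulas to convert between pixel positions on a square "world" and the latitude and longitude that a mapping interface expects. The function names and the `world_size` parameter are my own illustrative choices, not part of any Google API.

```python
import math


def pixel_to_latlng(x, y, world_size):
    """Treat pixel (x, y) on a square world_size x world_size image as a map point."""
    lng = x / world_size * 360.0 - 180.0
    # Inverse Mercator: y = 0 is the top edge, y = world_size the bottom edge.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / world_size))))
    return lat, lng


def latlng_to_pixel(lat, lng, world_size):
    """Inverse of pixel_to_latlng: recover the pixel position from lat/lng."""
    x = (lng + 180.0) / 360.0 * world_size
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * world_size
    return x, y


if __name__ == "__main__":
    size = 4096  # hypothetical padded image size at full zoom
    print(pixel_to_latlng(size / 2, size / 2, size))  # centre of the image
    print(pixel_to_latlng(0, 0, size))                # top-left corner
```

The centre of the image lands at latitude 0, longitude 0, and the top edge at roughly 85 degrees north, the usual cut-off of a square Mercator map. Equal steps in pixels are therefore not equal steps in latitude, which is one reason these coordinates make a poor way of citing parts of an image.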

This is innocent enough, if we recognize what we're doing, but it should provoke more serious reflection about how we use images and cite them in scholarly work. We need to define recognizable ways of referring to parts of an image independently of the state of a user's panning and zooming. I'll post more on that topic before long. For now, enjoy the pictures.
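As one illustration of what a view-independent reference could look like (an assumption of mine for the sake of example, not a worked-out proposal), a region can be recorded as fractions of the image's width and height. The `Region` class and its method names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle expressed as fractions (0.0 to 1.0) of image width and height."""
    left: float
    top: float
    width: float
    height: float

    @classmethod
    def from_pixels(cls, x, y, w, h, image_w, image_h):
        """Build a region from pixel measurements on one rendering of the image."""
        return cls(x / image_w, y / image_h, w / image_w, h / image_h)

    def to_pixels(self, image_w, image_h):
        """Resolve the region against a rendering of any size."""
        return (round(self.left * image_w), round(self.top * image_h),
                round(self.width * image_w), round(self.height * image_h))


if __name__ == "__main__":
    # A detail measured on a hypothetical 4000 x 2000 pixel master image.
    detail = Region.from_pixels(1000, 500, 2000, 250, 4000, 2000)
    print(detail)
    print(detail.to_pixels(400, 200))  # the same detail on a small derivative
```

Because the reference is a set of ratios, it identifies the same part of the image whatever zoom level or derivative size a reader happens to be looking at.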

Tuesday, December 11, 2007

Vingt ans après

Tonight, several of the Perseus project's original musketeers are gathering to observe the twentieth anniversary of the grant proposal to the Annenberg Foundation that jump-started the project. I'm sure that gray hair, sagging waistlines and altered career paths will prompt private reflections, but here's the fact that grabs me now: the Perseus project is older than three quarters of the undergraduates I teach.

My current students were still toddlers when the first public version of Perseus was released on CD. I doubt any of them have heard of, much less remember, Apple's HyperCard; it will be hard for them to imagine how exciting it was when a hypertext system first became available on personal computers.

They were learning to read or just beginning elementary school when Perseus made its astonishingly rapid transition to a Web delivery system. They probably are unaware that the internet was not always open to commercial use, and have little experience that would help them appreciate the importance of design decisions early in the history of Perseus. Can they grasp how the choice of SGML for markup of texts made it possible to generate both HyperCard stacks and Web pages from a single source?

Now they are in college, and the Perseus project has open-sourced both its code and key data including all its ancient texts (as I observed on Thanksgiving). Will they understand how this opens up to them unprecedented opportunities to build on the work of their predecessors, or have we conditioned them to see themselves only as passive consumers?

Are we raising up a new generation to join in the hard work ahead of us? All for one, and one for all!

Saturday, November 24, 2007


If you study ancient Greek, you can be thankful in 2007. This fall, two of our discipline's most important scholarly instruments have gone through extraordinary metamorphoses. First, Peter Heslin released version 3 of Diogenes; then this month, the Perseus project announced that source code and text data are being made available under open licenses.

Diogenes now directly integrates automated morphological analyses of ancient Greek from the Perseus project's morphological parser. The Perseus project's new open licenses guarantee that Peter Heslin will not be the last scholar to draw on the rich resources created at Perseus over the past two decades.

Perhaps these developments would be unremarkable in disciplines where contributions through collaborative work and critical assessment of evidence are valued more highly than career advancement. In the humanities, they stand out against a bleak landscape of subscription services and other forms of restrictions on access to scholarly work.

Taken together, Diogenes and Perseus illustrate the kind of cross-pollination that is possible when reuse of digital scholarly works is not outlawed. If enough classicists notice, we may have more good Thanksgivings ahead of us.

Thursday, November 1, 2007

Remembering Ted Brunner

This summer I read the Washington Post's lengthy obituary of Ted Brunner. Few classical scholars are made the subject of so many column inches in a national paper, so I was surprised this fall to discover that none of my Classics students knew who Ted Brunner was. The same quite serious majors who recognized the authors of eminently forgettable footnotes on Greek or Latin texts had apparently never heard of the director of one of the later twentieth century's most influential digital projects in the humanities. We classicists really have a lot of teaching to undo.
I leave it to others who knew Ted better than I to eulogize or analyze him. I offer only two observations from first-hand experience.
First, he always remained relentlessly focused on data. The TLG was not about producing software: if you wanted software, Ted's attitude was that you should write your own. (Dinosaurs like myself will recall how far he could take this position. In the early years of the TLG, the project's at best arcane, in many ways bizarre data formats were almost aggressively undocumented: you got a nine-track tape, and if you wanted to understand the data, you were welcome to reverse-engineer the format as best you could.) In his own way, Ted Brunner was an early advocate of separation of concerns, and his view has been validated by the range of software developed over the past two decades for using the TLG's data. Most recently, Peter Heslin has released version 3.x of Diogenes, a stunning piece of work that deserves far more recognition than it has received. It integrates the TLG data with output from the Perseus project's morphological parser, a piece of software that in turn would probably never have been developed if the TLG had not existed. What a pity that since Ted's retirement the TLG has turned its back on this principle, and permits access to material digitized since 2000 only through its own, one-size-fits-all web interface.
Second, however sharply he could react to people he saw as threatening the TLG's work, he was extremely generous with his time to anyone interested in the TLG, no matter how unimportant. When I was a very lowly graduate student at Berkeley, I had a chance to visit the TLG project at Irvine, and Ted set aside an entire morning to give me a personal tour and answer my questions. (I am sure that I am not the only visitor to the TLG to come away with a vivid memory of Ted starting the standard pre-recorded TLG slide show and proudly pointing out that the narrator's incredible bass voice was none other than the voice of Tony the Tiger.)
So two small points — he focused on his data, and was generous to people who could not obviously or immediately help him.
I hope someone can remember as much about me after reading my obituary.
The Feast of All Saints, 2007.

Wednesday, October 31, 2007

Identifying discrete objects using the dnid scheme

I've been working with Sebastian Heath at the American Numismatic Society on a scheme for referring to discrete objects with unique identifiers.
The fundamental idea is very simple. Since internet domain names already provide a means of uniquely identifying a namespace (think of XML namespaces), we can apply domain names as qualifiers to ensure the uniqueness of existing, stable IDs for many kinds of materials that humanists cite.
I'll have more to post on this topic in the future, but for now, you can see where we're headed at
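The fundamental idea of qualifying an existing ID with a domain name can be sketched as a simple pair of values. This is only an illustration of the principle: the `QualifiedId` class, its field names, and the `.example` domains are all hypothetical, and the sketch deliberately says nothing about how the dnid scheme itself writes an identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedId:
    """An existing local ID made globally unique by a domain-name qualifier."""
    domain: str    # domain name acting as the namespace
    local_id: str  # the existing, stable ID within that namespace

    def __post_init__(self):
        # Domain names are case-insensitive, so normalise before comparing.
        object.__setattr__(self, "domain", self.domain.strip().lower())
        if not self.domain or not self.local_id:
            raise ValueError("both a domain and a local ID are required")


if __name__ == "__main__":
    # Two hypothetical collections that both number an object "1001".
    a = QualifiedId("museum-a.example", "1001")
    b = QualifiedId("museum-b.example", "1001")
    print(a == b)                                          # same local ID, different objects
    print(a == QualifiedId("Museum-A.example", "1001"))    # same object, different spelling of the domain
```

The local IDs stay exactly as their owners assigned them; only the qualifier is added, so collisions between collections disappear without anyone having to renumber anything.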

Vitruvian design

All people, not only architects, are able to appreciate what is good work. The difference between architects and uneducated people is that the uneducated person cannot understand what the work will be unless he has seen it completed; whereas the architect, as soon as he has built it in his mind, but before he has actually begun, has a complete vision of what kind of work it will be in respect to the elegance, the efficiency, and the correctness of its design.

That is my rendering of Vitruvius 6.8.10:
namque omnes homines, non solum architecti, quod est bonum, possunt probare, sed inter idiotas et eos hoc est discrimen, quod idiota, nisi factum viderit, non potest scire, quid sit futurum, architectus autem, simul animo constituerit, antequam inceperit, et venustate et usu et decore quale sit futurum, habet definitum.

Today we are desperately short of architects for scholarship in the era of the internet. In this blog I will occasionally comment on some of my own work, and on other digital scholarship in the humanities that, in my view, is contributing to the construction of a better edifice.