Vitruvian design for scholarship in the humanities: 2012

Tuesday, December 11, 2012

idiôtês

In classical Greek, an idiôtês (ἰδιώτης) is a private individual, as opposed to someone acting in an official capacity as a member of a community. From the unskilled or amateur actions of an ἰδιώτης comes the later sense we use in English.

Everything I post on this blog I do as an idiôtês; sometimes, too, as an idiot. When I vented my frustration with the way PhD programs are failing our graduate students, I made the mistake of trying to present my critique as satire. I should have realized that people currently enduring the horrible stress of the academic job market might misread this as criticism of the job candidates. No one should fault new PhDs for the impossible situation they find themselves in: a better writer than I should still highlight how PhD programs in the humanities are failing their graduate students.

To anyone who found my prior post in poor taste or offensive, I apologize. Since comments can seem insignificant when buried beneath a post, I wanted to elevate this note to a post of its own.

Wednesday, December 5, 2012

Advice to new PhDs: how to avoid those unwanted interviews

We all know that a PhD from a highly ranked program guarantees that the world will beat a path to your door for the mere opportunity to speak with you. How can you persuade departments that are hiring not to interview you?

Because my department is currently conducting a search, I have recently surveyed essentially the entire pool of job applicants in Classics, and have a good sense for how the best trained candidates manage to avoid getting interviews. For those PhDs who have not yet mastered such a basic skill, I am summarizing here the strategies I have observed.

Let me stress that these comments are aimed only at a very small fraction of candidates. When I tried to compile a short list of roughly 5% of our applicants for interviews, I was unable to do so. Rephrasing that in positive terms, more than 95% of the candidates successfully avoided an interview based solely on my first fairly cursory reading of their dossier. When you consider the kind of dedicated and talented students who go on to graduate study in Classics, that figure is a remarkable testimony to the powerful effects of graduate school.

First let me suggest three fairly general guidelines:

Do not read the job description.
Find out nothing about your potential home institution and colleagues.
Focus relentlessly on your personal career advancement, to the exclusion of any suggestion that your professional work might affect another human being positively.

These may seem obvious, but after reading many applications, I have a better appreciation for how some candidates apply them most effectively. Remember, even in a field like Classics, you cannot count on letters of recommendation to disparage you adequately: you need to use the parts of your dossier that you can directly control — your CV, any specific essays or statements that an advertisement requires, and, especially, your cover letter — to ensure that you do not get an interview.

Almost all candidates will start with the easiest tactic: send in the same generic cover letter that you use in response to completely different job advertisements. Many candidates will let a bland, off-topic letter speak for itself, but one rhetorical refinement I came to appreciate is the addition of a single sentence or two mentioning the hiring institution, but clearly appended to an otherwise unmodified cover letter. If the appended sentence can raise some subject that is central to the job description, but otherwise unmentioned in the cover letter, it will be especially clear that this is an afterthought, and that you have no real interest in the subject. Even better is the appended sentence that implicitly contradicts the emphasis of the rest of the cover letter. Of course, if you are not confident that your reviewers will appreciate this subtlety, you can always resort to brute force: leave the name of a different institution in your appended sentence. Although I saw this only rarely, it demonstrates to even the most insensitive reader that the cover letter is completely impersonal.

What should you do if your generic cover letter actually responds to some part of the job advertisement? Unlikely as that may seem, it can happen, and candidates will then have to take extra precautions to stay off the interviewers' short list. Use your CV and additional statements to obfuscate or directly contradict any apparently relevant sections of your cover letter. If you allude to a potentially interesting digital project in your cover letter, do not include it on your CV, or else present it on your CV as trivial (e.g., list it under some category like "Other service", beneath a more highly valued contribution such as "ordered pizza for grad student lecture series"). If your cover letter could be misread as referring to collaborative research among students and faculty, expand on that in a separate statement about your research that never mentions students. As I saw repeatedly this fall, it can be especially effective to dwell at length on your contract for a forthcoming book if you emphasize that it will be published by a press charging more per monograph than your library ever pays, and if your topic leaves potential colleagues paralyzed at the prospect of having to read your book when you come up for review.

While many applicants use the protracted and obsessive "forthcoming book" discussion to put off potential interviewers, the possibilities it offers for avoiding an interview are almost limitless. If done properly, you can demonstrate with it that years of advanced study have taught you only a narrow range of technical skills without fostering any kind of development as a thoughtful member of an academic community. Prose style is highly individual, but you can heighten the effectiveness of this trope if you strive for a tone of entitlement. Make it clear not only that the proper role for students and colleagues is to advance your career, but that they should be grateful for the chance.

Perhaps the handful of applicants to whom I am offering these suggestions cannot benefit at this point in their careers: if you have not completely internalized these fundamental habits of thought by the time you receive a PhD, it is highly unlikely that you will ever pick them up, and we should recognize that some people may simply be incapable of learning such ideas. But given the nature of the job market in academia today, I feel ethically compelled to share these suggestions. If even one applicant thinks differently about applying for a job because of this post, that will be more than enough of a reward for my efforts.

Monday, December 3, 2012

Does it always sound better in French?

In January, I posted a rant about my dislike of the marginalizing term "digital humanities." The post has recently been included in a collection of French essays on … you guessed it, "humanités numériques" (digital humanities). The French translation by Marion Lamé and Pierre Mounier (available here) is more than accurate: its polished style seems to me an eloquent proof that thinking about technology can embody the best qualities of a traditional humanistic education. I think I prefer it to the English original.

Maybe this kind of subject always sounds better in French.

Saturday, October 27, 2012

Wait, wait, don't tell me

Two colleagues recently forwarded me a pair of links they thought I would find interesting. One was an MLA job listing for an assistant professor "in American or British literature, 16th-20th century, with interest in the problematic of digital humanities". It included the specification, "Some familiarity with MSWord expected." The other was for a new journal called Digital Philology. The author's guidelines include the invitation, "Digital Philology is welcoming submissions for its 2013 open issue. Inquiries and submissions (as a Word document attachment) should be sent to" ...

I actually had to read each of these twice to realize that one was intended as a parody, and the other is apparently intended seriously. In the spirit of the news quiz on NPR's "Wait, wait, don't tell me," you decide: which one is the parody?

- a job for assistant professor asking for familiarity with MS Word
- a new journal Digital Philology soliciting submissions as emails with attached Word document

If you are unable to tell, the links lead to the full listings on the original web pages, where you'll find further clues.

Welcome to the world of digital humanities and digital philology in 2012.

Monday, October 8, 2012

wikicite?

I recently heard a radio interview with Timothy Messer-Kruse, describing his experience editing the wikipedia article on the Haymarket trial. (He had earlier described the same experience in the Chronicle of Higher Education, online here.)

The striking point is that his edits on wikipedia were repeatedly reverted because they were based on and supported by primary evidence. Wikipedia is, by design, intended to reflect consensus opinions as reflected in secondary publications. This is a horrible inversion of the way history should be studied and presented — one that wikipedia shares with encyclopedias in general, including distinguished specialized encyclopedias like the Oxford Classical Dictionary.

While I don't like encyclopedias, I love the crowd-sourced part of wikipedia. So what if we created a "wikicite" for classical studies? Imagine a wiki where only primary sources were allowed: no reference to any kind of secondary publications permitted. You are of course welcome to read them on your own time, and maybe even learn something from them, but to post to wikicite, you would actually have to work back to the primary sources, and confront evidence you could cite.

That would be revolutionary.

Sunday, August 26, 2012

Mark[up|down]

People sure hate the pointy brackets. I've been writing markup since version 2 of the Text Encoding Initiative Guidelines in SGML in the 1980s, and as easy as modern tools like oXygen make it today, even I'm not crazy about it. Over the past year or so, I've looked at every "markdown" alternative to markup that I could find: to name a few, textile, markdown, multimarkdown, reStructuredText, and more wiki languages than I can list.

All of them seem to have a similar history: somebody wanted to have a quicker and easier way to express HTML, cooked up a tool to convert some simple "markdown" conventions to HTML, and realized, "Hey, this is useful!" As a result, all of the markdown languages share two main drawbacks. The first is that they generally express only the semantics of HTML, or a subset of HTML. That rules them out for any writing or editing that requires richer semantics such as an XML language could supply. The second, more fundamental drawback is a long-recognized limitation: they're not specified. It should be obvious that "Whatever my converter tool handles" is NOT a specification, but that's the state of most of the markdown schemes. (See these comments from more than a decade ago about reStructuredText!)

For those cases where I really just want a quick and easy way to bang out HTML-like content, John Gruber's markdown seems to offer the best compromise. First, if you really need some particular piece of HTML beyond what markdown offers, you can just embed it in your text (although at the price of reintroducing some pointy brackets). But what really persuades me is the pegdown processor. Pegdown uses parboiled's "parsing expression grammars" (PEGs), so it comes closer to a separately specified definition of the language than a code library full of regular expressions emitting some kind of converted text. Pegdown will give you an abstract parse tree for your markdown content, which makes me feel much more confident using markdown from code I write.

Add to that the ever-growing number of editors and other tools that support markdown in all kinds of contexts, and I'm converted. So was the text of this post — from markdown to html.

Tuesday, July 31, 2012

In a small discipline, proxy repositories

Software builds on other software. With a build system like gradle, once you declare how your code depends on other code, the build system checks your declaration with listed repositories, and downloads appropriate packages as they are needed. If you are coding in a JVM language, you can find an enormous proportion of the libraries you might want from maven central, either directly or via a proxy.

But if you routinely work with ancient Greek, or in any similarly specialized domain, the situation is different. Hugh Cayless' epidoc transcoder package is indispensable for my routine work, for example, but for a few minutes yesterday, the one repository where it's regularly hosted was down. I was paralyzed.

The solution is as easy as it is obvious: smaller communities, like those interested in ancient Greek, need to ensure that the collections of material they depend on are proxied and available from multiple repositories.

I'm using Nexus to host material developed for the Homer Multitext project, and yesterday configured it to proxy dev.papyri.info/maven, where the epidoc transcoder is housed. The unified front to all the material hosted and proxied there is http://beta.hpcc.uh.edu/nexus/content/groups/public/.

Nexus is a "lazy" proxy: it only acquires local copies of a proxied package when it is actually requested. One way to guarantee that your favorite proxying site has all the packages you want is with a minimal build that declares dependencies on everything you might want, and then simply lists their names. The example below is a gradle build to do just this. The repository URL and version strings for packages are kept in a separate properties file, but this example is otherwise complete: running the showAll task will force the proxy server to retrieve any packages it does not already have locally stored.

repositories {
    maven {
        url "${repositoryUrl}"
    }
}
configurations {
    classics
}
dependencies {
    classics group: 'edu.harvard.chs', name: 'cite', version: "${citeVersion}"
    classics group: 'edu.harvard.chs', name: 'greekutils', version: "${greekUtilsVersion}"
    classics group: 'edu.holycross.shot', name: 'hocuspocus', version: "${hocusPocusVersion}"
    classics group: 'edu.unc.epidoc', name: 'transcoder', version: "${transcoderVersion}"
}
task showAll {
    description = "Downloads and shows a list of useful code libraries for classicists."
    doLast {
        println "Downloaded artifacts (including transitive dependencies):"
        configurations.classics.files.each { file ->
            println file.name
        }
    }
}

Monday, July 9, 2012

"Abolish the journals"

I'm appearing on a panel next spring on the subject of "publishing" at the Classical Association of the Middle West and South. Would it be too much to suggest that Walter Olson's critique of law reviews applies equally well to academic journals in the humanities?

Olson quotes Harold Havighurst:

Whereas most periodicals are published primarily in order that they may be read, the law reviews are published primarily in order that they may be written.

Sounds pretty much like the academic journals I'm familiar with in classics.

(H/T: groklaw news picks for the link to Olson's blog.)

Thursday, July 5, 2012

CC licenses for photography of manuscripts

If you're interested in manuscripts of Greek and Latin texts, this week saw a seismic shift in the scholarly landscape. The e-codices project, which has been putting high-quality digital images of manuscripts in Switzerland on the web for several years, has now standardized on a Creative Commons license for all of its images.

In this decision, they are following the lead of a growing number of projects and institutions. I greatly admire the similar work Will Noel has done at the Walters Art Gallery, where high-resolution photography of more than 250 manuscripts is on line, available under a CC license.

Photographed manuscripts now in e-codices number more than 900. Like the Digital Walters Art Gallery, manuscript photography in e-codices is accompanied by a scholarly catalog entry.

The digital archivists are doing their job. Now the only question is whether we can find the scholars of Greek, Latin and other languages to read these beautifully documented texts.

Sunday, June 3, 2012

Who owns Plato?

I attended the workshop "édition des textes et recherche interdisciplinaire" at the École Normale Supérieure last week. As I mentioned in a preceding post, I'd been thinking about Eben Moglen's talk "Innovation under Austerity," and since I expected that introducing Moglen's argument might be a bit provocative for the traditional audience I expected at the ENS, I cleverly thought I would win them over, or at least delay their criticism, by paraphrasing one of Moglen's memorable soundbites: "No one owns Plato."

Not so clever. Apparently, when you gather in the august Salle des Actes at ENS, you can meet people who believe they do own Plato, and don't care to share with others who fall short of their standards, thank you very much.

(In the foreground, keynote speaker Gregory Crane, director of the Perseus Project defensively photographs the photographer; partially masked by the screen are the plaques on the walls of the Salle bearing the names of such distinguished scholars in many fields as Louis Pasteur and Fustel de Coulanges.)

Just for fun, I googled the phrase "plato download": as the screen grab illustrates, google estimated something over 17 million hits for that phrase, including texts in Greek and translation in a variety of languages, podcasts and ebooks (as well as downloads of software packages named after the son of Ariston). I also found the Wikipedia article on Ruhollah Khomeini noting that Khomeini considered Plato's views "in the field of divinity" to be "grave and solid". (Since some of the would-be owners of Plato also object to Wikipedia, I can pass along its reference to Kashful-Asrar, p. 33 as the source of that assertion.)

So while I can appreciate highly theorized concerns about the preparation needed to appreciate Plato "properly", the Anglo-Saxon empiricist in me looks at these Google search results and still wonders — just who exactly owns Plato?

Let them hack: Eben Moglen on "disintermediation"

If you have not yet heard Eben Moglen's talk from last week's "Freedom to Connect" conference, with the title "Innovation under Austerity," it's worth listening to every minute of this audio recording including the Q&A session.

I had it on my ipod as I travelled to a conference to show off work students at Holy Cross have done over the past year for the Homer Multitext project, and was struck by how much of Moglen's main thesis is applicable to digital scholarship. He almost implied that innovation naturally happens under conditions of austerity; he unambiguously argued that the best way to promote innovation is to let young people hack on real problems, and get out of the way.

In the Homer Multitext project, we're learning how to let young people hack on real problems reading unpublished or incompletely published manuscripts. This is not an easy lesson to grasp if your traditional training, like mine, has conditioned you to believe that this kind of work was granted only to the most senior and experienced scholars who had earned the privilege of access to real problems. "Disintermediation," to use the jargon-term quoted by Moglen, may not look appealing to those of us, like museum curators or professors, who have been doing the mediating between young people and real research problems in the humanities. But in the audio recording of Moglen's talk, I think I can hear a little of the excitement I feel every single day I work with and learn from my 18- to 21-year old colleagues on the Homer Multitext project.

pull; update

Les Arènes de Lutèce (the Roman arena of Paris) is not much of an archaeological site, but it's a lovely French park, surprisingly peaceful despite its location in the bustling 5e arrondissement. A group of eight or ten men and women, mostly of a certain age, is silently practicing Tai Chi behind me; opposite us, French school children are clambering over every visible surface and cheerfully pushing, shouting and generally attempting to terrorize each other. This is not Worcester, Massachusetts.

When I last sat here to soak in the sun more than 20 years ago, the scene was visually and aurally identical, but today I have in my laptop a computer that weighs less than a kilo, connected to the internet because public parks in Paris give you two hours of free wifi. The seven busy researchers in the St. Isidore research lab at Holy Cross all use mercurial for version control of their work, so I've run hg pull; hg update, and have seen every change they've committed in the day or so since I last had time to look.

Juxtaposing geographic distance with the immediacy of electronic contact may seem like a pretty tired cliché in 2012, but working step-by-step through the progress of a team thousands of kilometers away makes me realize how little we've thought about a fundamental question: how do we make our research reproducible? Version control systems like mercurial or git are one important part of the technological puzzle, but they don't by themselves tell us how to organize our material or working practice so that others can easily replicate our work as fully automatically as possible.

I'm introducing a new tag "RR" for the theme of "reproducible research" since I think that is arguably the biggest overarching challenge of architectural design in digital scholarship today.

Wednesday, February 8, 2012

Digital scholarship must be technology-agnostic

As smart phones and tablets assume an ever-larger role in browsing the web, “responsive design” has become a hot topic among web designers. How far is it possible to design a single web site that can adapt its display depending on the characteristics of the reading device? Are there times when it’s simply necessary to maintain separate resources for phones vs. large-screen computers?

Designers of digital scholarship face even more demanding requirements. We know that we will replace our digital technologies, but it’s part of our responsibilities to preserve and transmit the scholarly record we work with. Our predecessors have not always set an ideal example for us. The work of Hellenistic scholars of the Iliad like Aristarchus of Samothrace was originally composed for papyrus scrolls. By the time of our earliest complete manuscripts of the Iliad, the tenth and eleventh century, the standard form of “publication” was the codex, or manuscript book. In a large codex, the wide margins offered invitingly convenient space to annotate the Iliadic text with selected notes from earlier scholars, as we see in the famous Venetus A manuscript.

(See interactive version)

As a consequence, virtually all ancient scholarship on the Iliad ceased to be copied as separate texts, and is today known to us only from the snippets preserved in these marginal notes, or scholia. The convenience of this early “hypertext” technology led directly to the loss of important scholarly work.

This illustrates a fundamental and somewhat paradoxical principle that should guide all our work on digital scholarship: it must be technology-agnostic. Well designed digital work will be machine-actionable, but will also be capable of expressing its content when moved to other media, even non-digital media.

One area where we must apply this principle rigorously is in our citation practice. It is tempting to yield to the convenience of using a URL to refer to on-line work: after all, with a URL we can immediately see some kind of response in a web browser.

But this convenience is as dangerous as the medieval scribes’ use of the margins of manuscripts for scholia. URLs are addresses: they will change or vanish; more fundamentally, the web that they point to will ultimately vanish (and, on a time scale that looks back to Aristarchus of Samothrace and other scholars of the library at Alexandria, it will certainly vanish sooner rather than later).

I’ve worked over the past several years with colleagues at the Center for Hellenic Studies to develop a URN notation for citing texts. (Some formal documentation is beginning to appear here.) URNs offer a formally specified notation for referring to some kind of resource, without reference to any particular technology. One of my favorite examples is the ISBN, which can be expressed with URN syntax. Many computer applications work with ISBNs: sales clerks in book stores read them with bar-code scanners, and you can search Amazon or bookfinder.com by ISBN, for example. But until a few years ago, I routinely filled out request forms at my college bookstore by hand-writing ISBNs on a paper form, and they functioned perfectly well in that analog environment.
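To make the ISBN example concrete, here is a minimal Python sketch of the two things that make such an identifier technology-independent: it is a plain string, and it can be checked by rule. The function names are hypothetical, the sample ISBN was chosen only because its check digit is valid, and the sketch handles 13-digit ISBNs only.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit, ignoring hyphens and spaces."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be a multiple of 10.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn_to_urn(isbn: str) -> str:
    """Express an ISBN-13 in URN syntax (hypothetical helper)."""
    if not isbn13_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return "urn:isbn:" + "".join(c for c in isbn if c.isdigit())


# Sample value used purely for illustration.
print(isbn_to_urn("978-0-306-40615-7"))  # urn:isbn:9780306406157
```

The same string works whether it is typed into a search box, scanned from a bar code, or written on a paper form.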

The Canonical Text Service URN (or CTS URN), like an ISBN, is a formally specified machine-parseable reference, but at the same time a simple text string that can be read by human beings and used outside of a digital environment. I have successfully disseminated URNs using chalk on blackboards, and pen on the back of a napkin. But since a CTS URN is also machine actionable, it can be passed in to a Canonical Text Service to retrieve cited passages of text. When our form of citation is not tied to a specific technology, we are free to imagine previously unforeseen re-uses of that material. Would it be handy if the printed copy of a book you want to carry with you were augmented with URNs represented as QR codes you could point your smart phone at to read a cited text? I don’t know, but it would not be difficult to implement. The QR code at the top of this blog entry represents the CTS URN

urn:cts:greekLit:tlg0012.tlg001:1.1

Here is a link passing the same URN to a Canonical Text Service.
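As a rough illustration of what "machine-parseable" means here, the following Python sketch splits the URN above into its colon-delimited parts. The class and field names are assumptions made for this example, not part of any CTS specification, and the sketch ignores features such as passage ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtsUrn:
    namespace: str   # e.g. "greekLit"
    work: tuple      # dot-separated work hierarchy, e.g. ("tlg0012", "tlg001")
    passage: tuple   # dot-separated passage reference, e.g. ("1", "1")


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a string like urn:cts:greekLit:tlg0012.tlg001:1.1 into parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else ""
    return CtsUrn(
        namespace=namespace,
        work=tuple(work.split(".")),
        passage=tuple(passage.split(".")) if passage else (),
    )


iliad_1_1 = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
print(iliad_1_1.work)     # ('tlg0012', 'tlg001')
print(iliad_1_1.passage)  # ('1', '1')
```

Nothing in the parsing depends on a web server or a URL: the same string could have been copied from a blackboard.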

Saturday, February 4, 2012

Ancient Greek is broken

It is 2012, and it is not possible to edit an original document from archaic or classical Greece digitally.

The inscriptions recording the construction of the Parthenon cannot be edited digitally; the Athenian Tribute Lists reflecting the annual payments members of the Delian League made to Athens in the fifth century B.C.E. cannot be edited digitally; votive offerings to Apollo at Delphi, dipinti on classical Greek pottery, graffiti scratched by Greek mercenaries on the colossal statues at Abu Simbel in Egypt — none can be edited digitally.

We are prevented from fully and accurately editing archaic and classical Greek by inadequate or erroneous technical standards defining the representation of languages, writing systems and digital character encodings. Unlike Claude Rains’ famously pretended reaction in Casablanca, I am genuinely shocked that most of the standards keeping us from editing classical Greek have been adopted unmodified from recommendations by professional classicists. (Think about that the next time you want to evaluate the state of digital scholarship in the humanities.)

Each of these three shortcomings is worth discussing separately, so I plan to post more detailed comments on them individually, but here is a brief summary of the problem.

1. Language

A text must identify what languages its content represents. We do that with International Organization for Standardization (ISO) codes for language. The registration authority for the ongoing work to develop a comprehensive set of three-letter codes for languages is SIL International.

While some language codes are organized in families (so that related dialects or languages can be recognized by software to process the contents appropriately), archaic and classical Greek are lumped under a single grc code. (This at least is an improvement on the previous ISO 639-2 list of codes where Mycenaean Greek written in Linear B could not be distinguished from classical Greek!)

We tell students reading Plato that the text is in the Attic dialect, and would not ask them to consider interpretations that are only possible in other dialects. The string τό, for example, might be a form of the relative pronoun in Ionic Greek, but in Attic it can only be the definite article (“the”).

We should treat our software equally kindly, by encoding explicitly the dialectal variant of ancient Greek used in a text.

2. Writing system

If we are editing an ancient Greek document, we must identify the document’s writing system, since archaic and classical Greek city states used a variety of distinct alphabets. In 403 BC, the Athenians voted to adopt as their official writing system the alphabet used in Ionia, replacing the Attic alphabet they had used up to that time. The language spoken in Athens did not change, but the writing system did.

The Ionian alphabet is the direct ancestor of the modern Greek alphabet. In this alphabet, the letter epsilon represents a short vowel that is contrasted with a long vowel represented by the letter eta. In the classic Attic alphabet, on the other hand, the two sounds that were distinct in the Ionian alphabet were represented by the single letter epsilon. A glyph essentially identical in appearance to the Ionic eta instead represented a consonant, pronounced like a modern English H (or like the “rough breathing” in modern writing of ancient Attic). Any reader (or any computer program) that tries to interpret a text written in the “old Attic” alphabet as though it were written in the modern, Ionic alphabet will fail spectacularly, even though the language is unchanged.
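A toy Python sketch can show why software needs to know the writing system. It assumes a deliberately incomplete table of Old Attic letters and their possible Ionic spellings (the table and function names are hypothetical, and real epigraphic practice is more varied). The sample word is the Old Attic spelling of the word for "people", which Ionic writes with eta.

```python
from itertools import product

# Hypothetical, simplified table: how a letter cut in the Old Attic alphabet
# might be spelled in the Ionic alphabet. Letters not listed map to themselves.
OLD_ATTIC_READINGS = {
    "\u0395": ["\u0395", "\u0397", "\u0395\u0399"],  # epsilon: short e, long e (eta), or EI
    "\u039F": ["\u039F", "\u03A9", "\u039F\u03A5"],  # omicron: short o, long o (omega), or OU
    "\u0397": ["h"],                                 # the eta-shaped glyph is the consonant /h/
}


def ionic_candidates(old_attic_word: str) -> list:
    """List every Ionic spelling an Old Attic letter sequence could represent."""
    options = [OLD_ATTIC_READINGS.get(ch, [ch]) for ch in old_attic_word]
    return ["".join(choice) for choice in product(*options)]


DEMOS_OLD_ATTIC = "\u0394\u0395\u039C\u039F\u03A3"  # delta, epsilon, mu, omicron, sigma
DEMOS_IONIC = "\u0394\u0397\u039C\u039F\u03A3"      # delta, eta, mu, omicron, sigma

candidates = ionic_candidates(DEMOS_OLD_ATTIC)
print(len(candidates))            # 9
print(DEMOS_IONIC in candidates)  # True
```

A program that assumed the Ionic alphabet would read the five letters literally and never consider the spelling with eta; one that knows the text is in the Old Attic alphabet can at least enumerate the possibilities.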

ISO standard 15924 defines codes to identify the writing system of a text. The current version includes no way to distinguish archaic and classical Greek alphabets from the alphabet of modern printed texts.

3. Digital character set

Once we have identified the language and the writing system of our text, we have to record its contents. The Unicode consortium defines the standard that is by far the most comprehensive and widely supported digital character set today.

Of the sections of the Unicode specification that I have looked at closely, few are as misconceived as the ancient Greek section. I’ll save a fuller catalog of its problems for a separate post, but can briefly contrast it with one example of clean design from the Arabic section of Unicode.

In Arabic, a single letter might have distinct forms when written separately, initially, medially or finally. A free-standing letter kaf ك looks quite different from the first letter of the word “book”

كتاب

for example. Software following the Unicode specification can represent all instances of kaf with the same code point: the different letter forms are treated as presentational variants depending on the position of the letter in relation to other letters.
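A short Python check illustrates the point, assuming the word is encoded with the standard Arabic letter code points (the variable and function names are just for this example). The initial letter of the word is the very same code point as the free-standing kaf, however differently a font draws it.

```python
import unicodedata

KAF = "\u0643"                      # free-standing kaf
BOOK = "\u0643\u062A\u0627\u0628"   # the word for "book": kaf, teh, alef, beh


def letter_names(word: str) -> list:
    """Return the Unicode character name of each code point in a word."""
    return [unicodedata.name(ch) for ch in word]


print(letter_names(BOOK)[0])   # ARABIC LETTER KAF
print(BOOK.startswith(KAF))    # True
```

Because the shaping is left to the rendering layer, a plain substring search for kaf finds it in any position.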

Now use this tool to search the Unicode specification for the term “sigma”. We have two distinct upper-case sigmas, and no fewer than three lower case sigmas, with a lunate form and a terminal sigma being given distinct code points.

While medial and terminal sigma are, like the different forms of Arabic kaf, contextually determined variant glyphs, lunate sigma is simply a font choice used by editors who do not wish to distinguish a final form of sigma from other forms (often because they are editing fragmentary texts like papyri where it might be difficult to decide where word breaks occur in a handful of isolated letters). In all cases, an editor should be able to encode a simple sigma, and searching or parsing of the digital text would work on any form of sigma, while publishers who preferred the papyrologists’ lunate form of the letter could use a font with that glyph for sigma; publishers preferring a text with the two traditional print forms could use a font with a variant form of terminal sigma.

Because of the false definition of lunate sigma as a distinct character, however, you now have to check manually for lunate forms of sigma versus other forms of sigma if you want to parse or search a text encoded in Unicode Greek. Do you want to do that? Do you want to rely on the authors of your software having to do that?
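Here is a rough sketch of the manual checking that this forces on software authors, again using only Python's standard library. It lists the five sigma code points and then folds the terminal and lunate forms to plain sigma before comparing or searching. The names fold_sigma and sigma_insensitive_find are hypothetical helpers of my own, and the folding table is one possible policy, not anything the Unicode specification prescribes.

```python
import unicodedata

# The two upper-case and three lower-case sigmas defined in the Greek block.
SIGMAS = ["\u03a3", "\u03f9", "\u03c3", "\u03c2", "\u03f2"]

for ch in SIGMAS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Assumed policy: treat terminal and lunate forms as glyph variants of one letter.
_SIGMA_FOLD = str.maketrans({
    "\u03c2": "\u03c3",  # final sigma        -> small sigma
    "\u03f2": "\u03c3",  # lunate sigma       -> small sigma
    "\u03f9": "\u03a3",  # capital lunate     -> capital sigma
})


def fold_sigma(text: str) -> str:
    """Replace every variant sigma with the plain letter of the same case."""
    return text.translate(_SIGMA_FOLD)


def sigma_insensitive_find(needle: str, haystack: str) -> int:
    """Like str.find, but ignoring which form of sigma was typed.

    The fold is one character to one character, so the returned index
    is also valid in the original, unfolded haystack.
    """
    return fold_sigma(haystack).find(fold_sigma(needle))


traditional = "\u03c3\u03bf\u03c6\u03cc\u03c2"  # sophos with medial and final sigma
lunate = "\u03f2\u03bf\u03c6\u03cc\u03f2"       # the same word in a papyrologist's text
print(traditional == lunate)                          # the raw strings differ
print(fold_sigma(traditional) == fold_sigma(lunate))  # the folded strings match
```

Every program that searches or parses Greek has to remember to do something like this, which is exactly the burden a cleaner character definition would have avoided.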

Solutions?

International standards processes are slow. While it’s reasonable for standards bodies like ISO to rely on the recommendations of professional organizations with expertise in a specific domain, in a field like classics this can be problematic. The American Philological Association is a professional organization often thought to represent the field of classics, but its role in recommendations to international standards bodies like the Unicode Consortium, and its complete absence from discussions like the ongoing revision of international language codes, suggest that, because of what I’ve called the recursive arithmetic of tenure, it institutionalizes conventional wisdom and obsolete assumptions, and helps sustain cargo-cult scholarship.

But in recent months we’ve seen example after example of traditional institutions that have been overtaken by motivated groups using the internet to organize. Can we form enough of an on-line community to move better standards through ISO and the Unicode Consortium, in alliance with or independent from existing professional groups?

Friday, February 3, 2012

Unplanned reuse

There’s really only one thing you can do with a book: read it. You can learn from it, cite it or feel that your life has been changed by it, but you can’t directly reuse it (well, apart from making it an accessory piece of furniture, but that doesn’t make use of the contents of the book). One of the distinctive differences of digital scholarship is that, if it is well designed, it can be used for purposes the original author may not have foreseen. The original author may even discover unintended reuse for digital work, as I did recently.

I had been working on an image service using a URN notation to retrieve and view images of the famous Archimedes Palimpsest. Using a URN like

urn:cite:hmt:chsimg.081v–088r_Arch03v_Sinar_pseudo_no-veil

the service lets you do things like

Retrieve a binary image at a given size. This is bifolio 81v–88r at 50 pixels wide.
Retrieve a region of interest. This extracts from the same image a region with a mathematical figure, the construction of Archimedes, Floating Bodies 1.proposition.1
Open a pannable/zoomable version of the image in a web browser, either with or without a highlighted region of interest. Try these two links to the same bifolio illustrated in the static images above:
1. with no highlighted region
2. including highlighting of the mathematical figure

For a course I taught in English translation, I put together a text service, allowing you to retrieve passages of text by canonical reference. With a URN like this

urn:cts:greekLit:tlg0552.tlg008.chs03:1.proposition.1

the service lets you retrieve archival XML source for a passage. This request gets the XML source for Archimedes, Floating Bodies, postulate 1, which is not necessarily a thing of beauty to the casual reader of Archimedes. But it’s trivial to associate an XSLT stylesheet to format the archival XML for reading in a browser, so here is the same passage associated with a stylesheet for easy reading.
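To show how much a canonical reference carries on its own, here is a minimal sketch that splits the text URN above into its colon-delimited parts. It assumes the layout urn:cts:namespace:work:passage, with the work and passage components further divided by periods. The class CtsUrn and the function parse_cts_urn are hypothetical names of my own, not the text service's API.

```python
from typing import NamedTuple, Tuple


class CtsUrn(NamedTuple):
    namespace: str
    work: Tuple[str, ...]      # period-separated hierarchy identifying the text
    passage: Tuple[str, ...]   # period-separated citation hierarchy (may be empty)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, work hierarchy and passage hierarchy."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else ""
    return CtsUrn(
        namespace=namespace,
        work=tuple(work.split(".")),
        passage=tuple(passage.split(".")) if passage else (),
    )


ref = parse_cts_urn("urn:cts:greekLit:tlg0552.tlg008.chs03:1.proposition.1")
print(ref.namespace)  # greekLit
print(ref.work)       # ('tlg0552', 'tlg008', 'chs03')
print(ref.passage)    # ('1', 'proposition', '1')
```

Because the passage is named by citation rather than by file or page, any service that understands the notation can be asked for the same unit of text.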

At some point, the penny dropped, and I realized it would also be trivial to mash up the two services. When I started work on the image service, I had not imagined that the digital images of the Greek palimpsest would be of any interest to Greekless readers of Archimedes, but the mathematical figures in the manuscript are extremely important even if you’re reading Thomas Heath’s public-domain English translation.

A minor addition to the XSLT stylesheet uses the markup indicating the presence of canonically identified figures in Heath’s translation to embed references to the image service.

Try this view of book 1, proposition 1, where any reader (Greek scholar or not) now gets to follow the text in Heath’s translation together with images in the only surviving Greek manuscript of Floating Bodies. Images of regions are embedded in the text, and are linked to the zoomable view of the whole bifolio.
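The real mashup is a few lines of XSLT, but the idea can be sketched in Python with the standard library. Everything specific here is a stand-in: the element and attribute names (figure, urn), the sample URN, the service address at example.org and its urn and w parameters are all hypothetical, chosen only to illustrate turning a canonically identified figure into a request to an image service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical address and parameter names, NOT the real image service.
IMAGE_SERVICE = "http://example.org/imageservice"

# Hypothetical markup: a passage whose figure is identified by an image URN.
SAMPLE = """<div n="1.proposition.1">
  <p>(text of the translation)</p>
  <figure urn="urn:cite:example:img.fig1"/>
</div>"""


def image_url(urn: str, width: int) -> str:
    """Build a request for the image identified by urn at the given pixel width."""
    return f"{IMAGE_SERVICE}?{urlencode({'urn': urn, 'w': width})}"


def embed_figures(xml_text: str, width: int = 300) -> str:
    """Add an img element pointing at the image service inside each figure."""
    root = ET.fromstring(xml_text)
    for fig in list(root.iter("figure")):
        urn = fig.get("urn")
        if urn is None:
            continue  # a figure without an identifier cannot be resolved
        img = ET.SubElement(fig, "img")
        img.set("src", image_url(urn, width))
        img.set("alt", urn)
    return ET.tostring(root, encoding="unicode")


print(embed_figures(SAMPLE))
```

Neither the text markup nor the image service had to change for this to work; the only new piece is the glue that reads an identifier from one and hands it to the other.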

Tuesday, January 31, 2012

A checklist for writers

Technology both shapes and reflects our values. What do we value in scholarly writing, and how well do our technological choices match those values?

I look for software that supports four necessary or possible qualities of good scholarly writing:

expository writing should be explicit and unambiguous
the writing process is iterative: good writing only comes from rewriting
academic writing in the natural sciences is often collaborative; this is becoming less rare in the humanities (although not necessarily in the cargo-cult humanities)
born-digital writing should be reusable

In a digital environment, to write explicitly and unambiguously means more than choosing our words well: it also means expressing the structure and contents of our writing explicitly and unambiguously. Our writing should embody the fundamental principle of separating concerns in our digital work: our first goal is to express our ideas clearly, not to exercise our typesetting skills, so we need a format that can explicitly and unambiguously express structure. We might choose an XML-based semantic markup system, or some semantically classed “markdown” system such as markdown or textile. What we should not choose is a “word processor.” Even if you can approximate a semantic structure using a carefully chosen set of “styles” (a tell-tale term!), you will be planting your semantic hints in a thick forest of code focused on the particulars of displaying your text visually. Note that it’s perfectly possible to express this irrelevant information using XML formats like OpenDocument. Our question is not “is this an XML format?” but “does this format express the semantics of my document?”
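One quick test of whether a format expresses the semantics of a document is to ask how hard it is for a program to recover the structure. Here is a minimal sketch using Python's standard library; the tiny vocabulary (article, section, title, p) is made up for illustration and is not a recommendation of any particular schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical semantically marked-up draft: structure only, no typesetting.
DOC = """<article>
  <title>A checklist for writers</title>
  <section>
    <title>Explicit structure</title>
    <p>Our first goal is to express our ideas clearly.</p>
  </section>
  <section>
    <title>Version control</title>
    <p>Good writing only comes from rewriting.</p>
  </section>
</article>"""


def outline(xml_text: str) -> list:
    """Return the section titles in document order."""
    root = ET.fromstring(xml_text)
    return [sec.findtext("title", default="") for sec in root.iter("section")]


print(outline(DOC))
```

With explicit structure the outline is a one-line query. Recovering the same outline from font sizes and bold runs would mean guessing at what the author intended.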

In considering how to support the remaining items in our list, we should look for examples beyond the humanities, since expository prose is not the only form of writing that shares these qualities. In particular, each is characteristic of good composition in computer programming, and computer programmers routinely use software that directly takes account of each of these qualities.

Programmers use version control systems to work with the entire history of a document to update, restore or compare versions. Version control systems also simplify collaboration, and allow multiple contributors to work simultaneously on a document. Changes can be silently integrated and shared; if two authors simultaneously make conflicting changes, version control systems can recognize that, and offer authors options to reconcile conflicts manually. There are many good, freely available version control systems. One reason that humanists are less familiar with them than they should be is that version control systems work best with textual data: the binary formats that word processors produce are a major obstacle to integrating our writing in version control, but once we have adopted a text-based semantic format, that obstacle vanishes, and we have a writer’s desktop that lets us write iteratively and collaboratively.
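To see why plain text matters here, consider the comparison step on its own. This is only a sketch using Python's standard difflib module, not a version control system: a real system adds history, branching and merging on top of this kind of line-by-line comparison. The two drafts and the helper name show_changes are invented for illustration.

```python
import difflib

# Two hypothetical drafts of the same passage, one sentence per line.
draft_1 = [
    "Technology shapes our values.\n",
    "Good writing only comes from rewriting.\n",
]
draft_2 = [
    "Technology both shapes and reflects our values.\n",
    "Good writing only comes from rewriting.\n",
]


def show_changes(old: list, new: list) -> str:
    """Return a unified diff of two drafts given as lists of lines."""
    return "".join(
        difflib.unified_diff(old, new, fromfile="draft_1", tofile="draft_2")
    )


print(show_changes(draft_1, draft_2))
```

The revised sentence shows up as one removed line and one added line, while the untouched sentence is shown only as context. The same comparison on a word processor's binary file would report little more than "the files differ."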

Programmers also provide a model for reusing our writing. Units of code are often packaged in libraries that other programs use. Programmers working on large projects manage the potentially complex interrelations and dependencies of different libraries and programs using build systems. We are not yet accustomed to thinking about automating the reuse of our writing, but there is no technical obstacle to doing so. We could use build systems to assemble chapters into a book, incorporate common navigational headers into all the pages on a web site, or automatically update an index if one section of a text changes, to name just a few obvious examples.
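The core idea of a build system is small enough to sketch: rebuild a target only when one of the sources it depends on has changed. The following Python sketch assembles chapter files into a book on that rule. It is a toy under stated assumptions (plain-text chapters, modification times as the change signal), and the function names needs_rebuild and build_book are my own; real build systems handle chains of dependencies that this does not.

```python
from pathlib import Path


def needs_rebuild(target: Path, sources: list) -> bool:
    """True if the target is missing or any source is newer than it."""
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(src.stat().st_mtime > built for src in sources)


def build_book(target: Path, chapters: list) -> bool:
    """Concatenate the chapters into the target file, but only when needed.

    Returns True if the book was rebuilt and False if it was already up to date.
    """
    if not needs_rebuild(target, chapters):
        return False
    text = "\n\n".join(
        ch.read_text(encoding="utf-8").rstrip() for ch in chapters
    )
    target.write_text(text + "\n", encoding="utf-8")
    return True
```

Called as build_book(Path("book.txt"), [Path("ch1.txt"), Path("ch2.txt")]) with hypothetical file names, it does nothing on a second run unless a chapter has been edited in between. An index or a set of shared page headers could be kept current by the same rule.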

So our checklist of required tools for writers includes:

an editor that works comfortably with semantically structured text
a version control system
a build system

I plan to add a series of posts with the tag writing to look at how we can work with tools like these to write more effectively in a digital setting. Meanwhile, take the checklist to your college or university IT department, and ask what specific software they support for semantic editors, version control systems and build systems. I would love to learn of an academic institution that is not just pressing commercial word processing software on its students and faculty, but I don’t know of one.

Monday, January 30, 2012

The recursive arithmetic of tenure

The long career path from college student to a tenured academic job is designed to be conservative. A student in the humanities who discovers a passion for an academic subject in his or her first year of college can expect that four years of college will be followed by, say, six years of graduate school that not only provide training in a discipline, but initiate the student in its culture. The (increasingly rare) PhD who then immediately walks into a tenure-track job typically faces seven years of scrutiny before a tenure decision. Newly tenured professors have proven that their work meets the professional standards of their colleagues — seventeen years after entering college.

Like many professors, I hope that a college education is a formative experience in the lives of my students. Imagine that the newly tenured professor was inspired, seventeen years ago, by an exciting teacher and scholar. That person of course would have climbed the rungs of the same professional ladder, so the youngest tenured professor who could have inspired today’s youngest tenured professor might in turn have first been inspired as a new college student … 34 years ago.

In 1977, the late Steve Jobs was just starting a company he had formed the previous year to sell the computers he and Steve Wozniak were building in his father’s garage.

We’re trying to cross an ocean by standing at the shore and waiting for continental drift to carry us to the other side.

The humanities-that-must-not-be-named

I’m not thrilled with the term “digital humanities.” When people refer to the “humanities,” I think I know what they mean: those disciplines that are concerned with human activity and everything it produces, and take as their task both to preserve and transmit that culture on the one hand, and to understand and interpret it on the other. But what is the sense of qualifying that noun with the adjective “digital”?

In the twenty-first century, the phrase can’t really stand in opposition to an implied “analog humanities”: no such thing exists. (When was the last time anyone submitted a hand-written or manually typed manuscript to be edited with grease pencil before being manually typeset with hot lead?) “Digital humanities” refers instead to scholarship in the humanities that consciously takes account of the fact that we all work digitally now.

What troubles me is that our use of the marked term “digital humanities” implies that the unmarked term, “humanities,” is being used to refer to scholarship that does not reflect on the media we all work in (a usage that is sadly accurate in the academy today). I am particularly disturbed because I would like to imagine that an education in the humanities encourages the kind of critical self-awareness that would enable us to think more meaningfully about our relation to the environment we live and work in, including our technological environment and the ways it is interwoven with our institutions and values.

By using “digital humanities,” we’re allowing the term “humanities” to stand for an uncritical scholarly practice that is at odds with the goals of a humanistic education.

I can understand why there is not a spontaneous groundswell of support in academic departments around the world for a term meaning “work that unthinkingly perpetuates obsolete forms of scholarly practice,” or “scholarship that is oblivious to the media we use today,” but rather than accept without reservation the marginalizing label “digital humanities,” I’ll offer my own suggestion. We could extend Richard Feynman’s “cargo-cult science” to “cargo-cult scholarship” more generally, and refer to the “cargo-cult humanities.”

Sunday, January 29, 2012

“Digital natives”

I recently attended a workshop at my home institution where I heard teachers confidently assert that today’s students are so adept at technological tasks that we can rely on them to help their older teachers develop important technological skills.

Really?

For more than 15 years, I’ve introduced Classics students at Holy Cross to XML markup. To build on any prior experience they might have, I routinely begin by asking who has ever peeked behind a web page to view its HTML source. Fifteen years ago, I would usually find anywhere from a quarter to a half of the students would say yes. Today, if I ask a group of 20–25 students, I will get one or two “yes” answers.

I do not know if my students were telling me the truth fifteen years ago (or today), but that doesn’t much matter for my present point. Fifteen years ago, far more students either had seen HTML or felt some kind of pressure to pretend that they had.

What does it mean? I suspect that the “digital natives” I teach have indeed grown up so familiar with information technology that they are more oblivious to it than their elders. I worry that they are also incurious, or at least need to learn to be curious about it.

My personal experience makes up only a limited sample, of students in Classics at a small liberal-arts college, but the trend among those students is very clear. Unless someone can show me better evidence, I’ll remain very sceptical about a priori assertions concerning the skills that “digital natives” will confer on their teachers.

Note: new tags

I’m using a couple of new tags on this post: “sceptical” (for obvious reasons), and “yam” [yet another meeting] to help me find posts responding to ideas I’ve gathered from yammering at meetings. I hope to post soon on a couple of additional “sceptical” topics, and several “yam” topics (since January is a big month for meetings in the academic world).