Vitruvian design for scholarship in the humanities: Epidoc transcoding transformer bats 1.000

Wednesday, August 6, 2008

Epidoc transcoding transformer bats 1.000

Hugh Cayless's transcoding transformer library (available from the Epidoc project's SourceForge site) is indispensable for anyone working with ancient Greek texts in Java or Groovy. How reliable is it?

I decided to test it against two significant lists of unique Greek strings. For each word in each list, I converted the TLG's beta code form to UTF-8, then converted the resulting UTF-8 back to beta code, and compared that result to the original. (For an overview of the TLG's beta code conventions, see this guide.)
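The round-trip check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used for the test: `round_trip_failures` is a hypothetical helper name, and the two toy converters are stand-ins covering four unaccented letters so the harness can run on its own. In a real run, the two callables would wrap the Epidoc transcoder instead.

```python
from typing import Callable, Iterable, List, Tuple


def round_trip_failures(
    words: Iterable[str],
    to_unicode: Callable[[str], str],
    to_beta: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (original, round_tripped) pairs where the two differ."""
    failures = []
    for word in words:
        result = to_beta(to_unicode(word))
        if result != word:
            failures.append((word, result))
    return failures


# Toy stand-in converters (hypothetical, four unaccented letters only).
# They are NOT the Epidoc transcoder; they only let the harness execute.
_TOY_TABLE = {"L": "λ", "O": "ο", "G": "γ", "N": "ν"}
_TOY_REVERSE = {greek: beta for beta, greek in _TOY_TABLE.items()}


def toy_to_unicode(beta: str) -> str:
    return "".join(_TOY_TABLE[ch] for ch in beta)


def toy_to_beta(text: str) -> str:
    return "".join(_TOY_REVERSE[ch] for ch in text)


if __name__ == "__main__":
    sample = ["LOGON", "GONON"]
    # An empty list means every word came back to its starting point.
    print(round_trip_failures(sample, toy_to_unicode, toy_to_beta))
```

Collecting the failing pairs, rather than just counting them, is what makes it possible to inspect each failure by hand afterwards.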

The first list was composed of 858715 words, excluding proper names. The transcoder round-tripped to its starting point in 858709 cases. Six failures doesn't sound bad (a 99.999% success rate). But look more closely: in five of the six failures, the TLG entry in fact breaks the TLG's own encoding rules about the order of accents, breathings and iota subscripts. The transcoder correctly follows the rules, so its conversion back to beta code actually corrects a data entry error in the TLG! The sixth case is a sequence found only in a papyrus fragment: the beta code series o(= would represent an omicron with rough breathing and circumflex, an accentuation that is not possible in Greek.
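To make the ordering point concrete, here is a small Python sketch of how a misordered diacritic sequence differs from its normalized form. It assumes the canonical order is breathing, then accent, then iota subscript, and it handles only those marks. The function names are hypothetical, and this is not how the transcoder itself is implemented.

```python
import re

# Assumed canonical order after a letter: breathing, accent, iota subscript.
_RANK = {")": 0, "(": 0, "/": 1, "\\": 1, "=": 1, "|": 2}
_DIACRITIC_RUN = re.compile(r"[()/\\=|]+")


def normalize_diacritics(beta: str) -> str:
    """Reorder each run of diacritics into the assumed canonical order."""
    def reorder(match: "re.Match[str]") -> str:
        # sorted() is stable, so marks of equal rank keep their order.
        return "".join(sorted(match.group(0), key=_RANK.__getitem__))
    return _DIACRITIC_RUN.sub(reorder, beta)


def is_well_formed(beta: str) -> bool:
    """True if the diacritics are already in the assumed order."""
    return beta == normalize_diacritics(beta)


if __name__ == "__main__":
    print(normalize_diacritics("W=(|"))   # accent entered before breathing
    print(is_well_formed("A)/NQRWPOS"))
```

A word that fails `is_well_formed` would fail a strict round-trip comparison even if the converter is behaving correctly, which is the situation described for five of the six failures.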

The second word list I tried was composed of proper names, including the tricky sequences beta code introduces in its conventions for capitalization. Out of 53167 capitalized words, the transcoder round-tripped perfectly in all but one: again, an error in the TLG data entry that the transcoder corrected!

That's a total of 911882 unique strings. (That's going way beyond carefully chosen unit tests!) Remarkably, the transcoder had a 100% success rate on correctly formed words.
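For anyone who wants to check the arithmetic, the counts reported above combine as follows. This is just a quick Python tally of the figures in this post; the variable names are mine.

```python
# Counts as reported in the post.
common_words, common_failures = 858715, 6
proper_names, name_failures = 53167, 1

total = common_words + proper_names
total_failures = common_failures + name_failures

# Raw rate counts every mismatch, including the malformed TLG entries.
raw_rate = (total - total_failures) / total

print(total)            # 911882
print(total_failures)   # 7
print(f"{raw_rate:.4%}")
```

The raw figure treats all seven mismatches as failures; the 100% figure applies once the malformed TLG entries are set aside.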

Comments:

Helma said...: Hi Neel,

We've just started using Hugh's transcoder here in Chicago as well. We requested two tweaks, one to deal with non-capitalized beta code as in use by Perseus, and one to use the 'tonos' Unicode characters rather than the deprecated 'oxia' characters where tonos characters exist. I think that in both cases, Hugh got back to us in under ten minutes with the re-write! A public thank-you, therefore.; August 10, 2008 at 11:54 AM
Jack Mitchell said...: Truly impressive and very reassuring.; October 7, 2011 at 10:22 AM
