Saturday, July 3, 2010

To scan or to type...

...that is the question, as I noted in a previous post.

This morning I typed in four pages.

Then I scanned the same four pages with OmniPage 14 (the OCR program I have).

There's no question that typing 1100 pages by hand would be a tedious endeavor.

But the OCR is tedious, too, and much more prone to errors than simply typing it out.

Several things lead to problems. First, the OCR program has trouble distinguishing between "f" and the 17th-century elongated "s" (the long s, "ſ"), among some other letter combinations. Second, there's a fair amount of 'noise' (stray marks on the page), which the OCR program wants to read as commas, periods, etc. So even if the letter recognition were nearly 100% perfect, I'd still have to go through the whole thing again and fix the punctuation.
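For anyone curious whether some of that cleanup could be automated, here is a minimal Python sketch of the idea. It is not anything OmniPage does; it is a separate post-processing pass over the OCR text, and it assumes a Latin word list is available. The names `LEXICON`, `repair_word`, `repair_text`, and `drop_stray_marks` are made up for illustration, and the tiny word list is a stand-in for a real one.

```python
import re
from itertools import product

# Hypothetical mini-lexicon; a real run would load a full Latin word list.
LEXICON = {"spiritus", "sanctus", "est", "fides", "et", "spes"}

def long_s_variants(word):
    """Yield every spelling obtained by reading some lowercase 'f' as 's'."""
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        chars = list(word)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)

def repair_word(word, lexicon):
    """Fix a word only when exactly one f/s variant is in the lexicon."""
    if word.lower() in lexicon:
        return word  # already a known word; leave it alone
    matches = {v for v in long_s_variants(word) if v.lower() in lexicon}
    if len(matches) == 1:
        return matches.pop()
    return word  # unknown or ambiguous: leave for a human to check

def repair_text(text, lexicon):
    """Apply repair_word to every alphabetic token in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: repair_word(m.group(0), lexicon), text)

def drop_stray_marks(text):
    """Remove punctuation that floats free of any word (likely scan noise)."""
    return re.sub(r"\s+[.,;:]+(?=\s|$)", "", text)

print(repair_text("Spiritus fanctus eft", LEXICON))
print(drop_stray_marks("fides , et spes"))
```

This only catches misreadings that produce a non-word. Pairs like "fit" and "sit" are both good Latin, so those would still need a human eye, and punctuation noise that lands right next to a word would slip through as well.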

Maybe I should forget about trying to produce a machine-readable Latin text, and simply translate from the scans? Or separate the two endeavors, working mainly on the translation and doing the transcription as I have time?

Your input is welcome as always...

4 comments:

  1. My input isn't worth much, but why not just translate from the scans? What is the purpose of having the Latin hardcopy?

  2. I always like to have an original language version available--like the Loebs. But it doesn't have to happen; at least not at first.

  3. Enter deep into the woods of the UP with only a jar of ink and a quill pen. God will do the rest!

  4. Hey Fr. Gregory. You could ask Roger Pearse (http://www.roger-pearse.com/weblog/). From what I have gathered from his blog, he has done a lot of work with OCR and the like. Hope that helps!
