Bleg: OCR for German Fraktur script

Our favorite OCR programs, Adobe’s Acrobat and Nuance’s PDF Converter for Mac, choke on the “Fraktur” or Old German type script that was widely used pre-1945. (Indeed, even though early-20th-Century German typographers were among the pioneers of clean, highly readable sans serif Roman fonts, the Third Reich’s national nostalgia jag seemed to spawn a resurgence of this medieval-looking stuff).

[Image: Fraktur sample]

If you don’t know what we’re talking about, we want a program that will convert documents written in Old German lettering into editable text. The kind of lettering we mean is the Germanic script used in newspaper mastheads, or on the signs at your local German restaurant, if you’re lucky enough to have one. We want to OCR that stuff, but, “Wait!” as Ron Popeil would say, that’s not all. We want to do it on a Mac. (We only use a PC when absolutely necessary, or when the application program is so cool that it makes the platform irrelevant: check out SpaceClaim).

You might wonder who was ever sadistic enough to typeset tons of documents in this stuff (documents meant for instruction of average-IQ people, no less). We are looking at a 1940-vintage document now, so that probably answers the question. (There are times when you can go there, Mike Godwin be damned).

That document sample looks like it was copied by hand by a gang of medieval monks, but it’s actually from a privately printed manual for German submachine guns.

10 thoughts on “Bleg: OCR for German Fraktur script”

  1. LFMayor

    Macs are for hippies.
    But, but, they have great graphics!
    (See if I can get you a big injun/little injun war going, a la .223 vs .308) :)

      1. LFMayor

        You must be one of those 6mm rounds are superior h8ers. And I shave between my eyes, when I shave before my annual “diversity and inclusion” training.

  2. Stefan van der Borght

    Try Tesseract. Open source, so you only need to pay me an exorbitant finder’s fee. Works on Mac, even.

    https://code.google.com/p/tesseract-ocr/

    If you ask me to help you decipher German handwriting from that period I’d have to find someone still alive that learnt it, that can still read it, and tell you what it says. They’re pushing 80, so hurry.
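
    A minimal sketch of how the Tesseract suggestion might be tried from a script. The function names, the file names, and the choice of the "frk" language pack are assumptions for illustration (older installs may ship a "deu_frak" pack instead), and the Tesseract binary with a Fraktur-capable language file has to be installed separately.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="frk"):
    """Return the argv list for a Tesseract run; nothing is executed here.

    "frk" is an assumed Fraktur language pack; swap in whatever pack
    is actually installed (for example "deu_frak" on older setups).
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_fraktur(image_path, output_base="fraktur_out", lang="frk"):
    """Run Tesseract on one scanned page and return the recognized text.

    Raises RuntimeError if the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # "scan.png" is a placeholder file name, not a file from the post.
    print(build_tesseract_command("scan.png", "fraktur_out"))
```

    Keeping the command builder separate from the runner makes it easy to check the arguments before pointing it at a real scan.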

  3. Stefan van der Borght

    My neanderthal tech comp got enough of the study downloaded to read the abstract on page 9, which suggests another OCR offering in addition to ABBYY, plus a footnote explaining that they looked at open source software and thought it might offer additional functionality.

  4. Y.

    IIRC, the Nazis did away with Fraktur at one point...

    (goes to check)

    This radically changed on January 3, 1941, when Martin Bormann issued a circular to all public offices which declared Fraktur (and its corollary, the Sütterlin-based handwriting) to be Judenlettern (Jewish letters) and prohibited their further use.

    (from wikipedia)

    1. Wise Cave Owl

      Correct, though it took a while to take. Third Reich-published books from 1943-44 are mostly in the Roman alphabet, with only a few still being issued in Gothic. The Gothic letters aren’t that hard to deal with once you get used to them; only a few are radically different, e.g., “B” = ss, etc.

  5. Alex Lund

    Now I’ll make everybody angry. I could read it. OK, I am not a PC/Mac but a human.

    I even replaced the letters that exist only in German and not in English:

    MP18 I , MP28 II, MP “Erma”
    Die Schliessfeder wirft die durch den Rueckstoss zurueckgeworfene Kammer (Schliessfeder) wieder nach vorne.
    Sie ersetzt gleichzeitig auch die Schlagbolzenfeder.

    I admit: I am a German.
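
    The letter replacement described above (ß to ss, ü to ue, and so on) can be sketched in a few lines. This is only an illustration under assumptions: the function name and the mapping table are hypothetical, and the long s (ſ) entry is added because Fraktur text commonly uses it.

```python
# Mapping from German-specific letters to plain ASCII spellings.
# The table is an illustrative assumption, not taken from the post.
GERMAN_TO_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
    "ſ": "s",  # long s, common in Fraktur typesetting
})


def to_ascii_german(text):
    """Replace umlauts, eszett, and long s with ASCII equivalents."""
    return text.translate(GERMAN_TO_ASCII)


if __name__ == "__main__":
    print(to_ascii_german("Die Schließfeder wirft die durch den Rückstoß "
                          "zurückgeworfene Kammer wieder nach vorne."))
```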

Comments are closed.