Saturday, September 19, 2009

Adventures in OCR

As you know, ICE, WIND AND FIRE is going through the scanner at this time (see my previous post about why, and how), and the book will be back out in paperback and ebook form, in time for its own Twentieth Anniversary...

So, the IWF experience is all about OCR.

Now, the OCR process is actually easy; and the software today is virtually foolproof, so long as you scan at a high-enough dpi for the system to know what it's looking at...

The problem is in the hardware.

Hands up, all those people who paid $300 for a scanner in 2002, and having paid that much, don't really want to consign it to the bin?

Now, hands up, all those people who were running Win98 back in those days; and since then they've wandered through XP Home, XP Pro, Service Packs 1, 2, 3, Vista, Service Packs 1 and ... the dreaded SP2.

I have a computer that NASA would have drooled over. This Quad Core, with its 4x 2.4G processors and its 6G of RAM, and its unlimited storage space, would have run Project Apollo and sent Voyager to the planets -- at the same time.

And all it took to knock this paragon of cybernetics flat on its can was --

Vista, Service Pack 2. God alone knows what happened, but as soon as the "update" went through and the system forced a restart for the third time in a row, the computer went haywire. Wouldn't stay online. Wouldn't run the big, complex applications properly. The browser would lock up, and kill the whole thing. Restart (which takes 15 minutes or more). Over and over.

To top it off, Microsoft's security system had, in its infinite wisdom, decided that Lenovo's automatic update facility was a spam generator, and demanded that it be shut off ... so Lenovo was not even able to fix the problem on the fly.

After a week of cussing, bitching and getting very little done, I yelled for Dave from DreamCraft, who went into the system and worked magic on it. Turns out, Vista Service Pack 2 is so bad, Lenovo had to issue a patch to put right the damage; but with the auto-updates turned off at Microsquash's insistence, you had to go to Lenovo and *get* the patch...

Which got the computer back up on its four processors, so I could start to look at the OCR job.

Which brings me back to the point where I started ... if you can remember that far back.

My scanner is a few years old. It worked perfectly under XP Service Pack 1, and even 2. SP3? Nope. Vista? Wellll ... it's sudden death up to a point. To get scans, you have to reinitialize the scanner after every scan.

So here's the routine for the OCR work on ICE, WIND AND FIRE:

Place Page 1 on the scanner
Acquire ... scan ...
20 seconds to get the page into text...
select all and copy;
paste into the open Word document...
UNPLUG the USB cable to the scanner!
Plug it back in.
Twiddle thumbs for 30 seconds...
Place Page 2 on the scanner
Acquire ... scan ...

And by far the most critical part of the process is not the scanning or remembering to turn the page. It's UNPLUG the USB cable, then plug it back in, give it half a minute to reinitialize the scanner, and *then* start the software and acquire the scanner.

Because if you don't, the scanner crashes; it crashes the software; and sometimes the whole thing won't come back up without a reboot on the computer.
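For anyone who wants the routine pinned up where they can't forget the unplug step, here is a minimal sketch in Python that just prints the checklist, page by page. It doesn't drive the scanner or the OCR program; the function names (`steps_for_page`, `checklist`) are made up for illustration, and the 20- and 30-second figures are simply the ones quoted above.

```python
REINIT_WAIT_SECONDS = 30  # "give it half a minute", as described in the post
OCR_WAIT_SECONDS = 20     # time quoted for getting a page into text


def steps_for_page(page_number):
    """Return the manual steps for one page, in the order the post lists them."""
    return [
        f"Place page {page_number} on the scanner",
        "Acquire and scan",
        f"Wait about {OCR_WAIT_SECONDS} seconds for the page to become text",
        "Select all and copy",
        "Paste into the open Word document",
        "UNPLUG the USB cable to the scanner",
        "Plug it back in",
        f"Wait {REINIT_WAIT_SECONDS} seconds for the scanner to reinitialize",
    ]


def checklist(first_page, last_page):
    """Yield (page, step) pairs for a run of pages, inclusive."""
    for page in range(first_page, last_page + 1):
        for step in steps_for_page(page):
            yield page, step


if __name__ == "__main__":
    # Print the routine for the first two pages as a reminder sheet.
    for page, step in checklist(1, 2):
        print(f"[page {page}] {step}")
```

The unplug and replug sit at the end of each page's steps on purpose, so the scanner has always been reinitialized before the next "Acquire".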

Repeat process 242 times.
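To put a number on that, here is a rough back-of-envelope sum in Python. It counts only the two waits mentioned above (about 20 seconds for the OCR and about 30 seconds for the scanner to reinitialize) and assumes one pass of the routine per repeat, so treat it as a floor: the scan itself, the page handling and the cable wrangling all come on top.

```python
PAGES = 242          # "Repeat process 242 times"
OCR_SECONDS = 20     # quoted time to get a page into text
REINIT_SECONDS = 30  # quoted thumb-twiddling time after replugging


def minimum_wait_seconds(pages, ocr_seconds, reinit_seconds):
    """Total of the two fixed waits across all pages (a lower bound only)."""
    return pages * (ocr_seconds + reinit_seconds)


def as_h_m_s(total_seconds):
    """Split a number of seconds into whole hours, minutes and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


total = minimum_wait_seconds(PAGES, OCR_SECONDS, REINIT_SECONDS)
hours, minutes, seconds = as_h_m_s(total)
print(f"{total} seconds, or {hours}h {minutes}m {seconds}s of waiting")
# prints: 12100 seconds, or 3h 21m 40s of waiting
```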

But it beats the hell out of typing! And I have nothing but good things to say about a little program called TOP OCR, which is free, and phenomenally accurate. I'm absolutely delighted to give them a plug here ... they do a range of software, too. Imagine being able to capture text with your digital camera, or your phone --! What won't they think of next?

So ... ICE, WIND AND FIRE is going through the scanner, and the process is simple. It's just a little bizarre.

