Digitizing Text, Part 2: Scanning & Saving Multi-page Documents

by JL Beeken on 6-13-2007

If you’ve digitized all the paper you can with OCR, the pile that’s left needs a different solution.

This is the do-it-yourself method for scanning and saving multi-page documents.

Unless you want to type it all by hand, the other solution is to scan it and save it as graphics. If it’s only a one-page document, you can use Transcript to type it from the graphic, or just type it from the original.

If you have a 10-page or 40-page document that you want to scan and compile into a single file, it takes a bit more.
I don’t want to be sharing documents that look like they’ve been through a tornado, so I put in the extra effort. We’re creating a whole generation of digitized history and it’s a matter of respect, I guess.

First, the scanning. I take all the pages and scan them in order into my photo editor. That happens to be Adobe Photoshop Elements, although I’m sure any photo editor will have a similar option. In Elements it starts with File/Import Scanner (turn the scanner on first), and the two hook up automatically.

After scanning, I straighten and crop, tidy things up a bit with the Eraser tool, and re-size each page. Any or all of these steps might be appropriate for cleaning up your pages before joining them together.

Straightening is a no-brainer. Pages are easier to read if they’re not tilted this way and that. With a little experience, you’ll know how many degrees a page is tilted and needs fixing. In the meantime, just guess. It’s how you’ll learn. Sometimes the auto-straightener works and sometimes it doesn’t, so you may need to rotate the canvas by hand.

[Screenshot: Rotate Canvas, Adobe Photoshop Elements]

I find the easiest way to crop pages for a consistent outcome is to use a “Fixed Aspect Ratio” such as 8 x 10. Experiment to find a proportion that makes sense for that particular set of pages. For instance, if there’s too much white, crop some of it off. It doesn’t need a 3-inch margin all the way around.

[Screenshot: Fixed Aspect Ratio, Adobe Photoshop Elements]
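If you would rather work out the crop numbers yourself than drag a box, the arithmetic is simple. Here is a minimal sketch in Python, not anything Elements does internally as far as I know. The function name `centered_crop_box` and the sample page size (an 8.5 x 11 inch page scanned at 300 dpi) are my own assumptions for illustration.

```python
def centered_crop_box(width, height, ratio_w=8, ratio_h=10):
    """Return (left, top, right, bottom) for the largest centered crop
    with the proportion ratio_w : ratio_h that fits inside the page."""
    # Cross-multiply to compare proportions without floating point.
    if width * ratio_h > height * ratio_w:
        # The page is too wide for the ratio: keep the height, trim the sides.
        new_h = height
        new_w = height * ratio_w // ratio_h
    else:
        # The page is too tall for the ratio: keep the width, trim top and bottom.
        new_w = width
        new_h = width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# Hypothetical scan: 2550 x 3300 pixels.
print(centered_crop_box(2550, 3300))
```

The box that comes back is in pixels, measured from the top-left corner, which is the form most photo editors and imaging libraries expect for a crop.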

Sometimes it works best to crop close to the text, and then add a canvas back in to re-create the margins. That way the text is aligned virtually the same all the way through. The text on the paper pages should be aligned in the first place but sometimes this is not the case.

[Screenshot: Canvas Size, Adobe Photoshop Elements]
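The crop-close-then-add-canvas idea can also be put into numbers. This is only a sketch under my own assumptions: the function name `common_canvas`, the choice to center each text block left to right with the same top margin, and the sample sizes are all made up for illustration.

```python
def common_canvas(blocks, margin):
    """Given (width, height) sizes of tightly cropped text blocks, return
    one canvas size that fits them all plus the (x, y) offset at which
    each block should be placed on that canvas."""
    max_w = max(w for w, _ in blocks)
    max_h = max(h for _, h in blocks)
    # The canvas is the largest block plus a margin on every side.
    canvas = (max_w + 2 * margin, max_h + 2 * margin)
    # Center each block horizontally; start every block at the same top margin.
    offsets = [((canvas[0] - w) // 2, margin) for w, _ in blocks]
    return canvas, offsets


# Three hypothetical cropped pages, sizes in pixels.
canvas, offsets = common_canvas([(1200, 1600), (1180, 1590), (1210, 1400)], margin=150)
print(canvas)
print(offsets)
```

Because every page ends up on the same size canvas with the same top margin, the text sits in virtually the same place from page to page.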

Regardless of what else you’ve done, the pages still have to be re-sized to the same pixel width, so that when you string them together in the next step they all line up. They’re also easier to read if you don’t have to zoom in and out on different size pages with different size text. I find that for the purposes of creating a PDF, a width of about 1500 pixels gives a good size. Experiment with this yourself. Sometimes a scan won’t have enough pixels to go to 1500, and enlarging it would pixelate the text. You don’t want your text pixelated any more than you would want a photograph of your grandmother looking that way.

[Screenshot: Image Size, Adobe Photoshop Elements]
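Here is the re-sizing arithmetic as a small sketch, assuming you know each page's size in pixels. The function names and sample sizes are hypothetical. The one design choice worth noting is that the shared width is never larger than the narrowest page, so no page gets enlarged and pixelated.

```python
def common_width(widths, preferred=1500):
    """Pick one width for every page: the preferred width, unless the
    narrowest page is smaller, in which case use that instead."""
    return min(preferred, min(widths))


def scaled_height(width, height, target_width):
    """Height that keeps the page's proportions at the target width."""
    return round(height * target_width / width)


# Three hypothetical scans, (width, height) in pixels.
pages = [(2550, 3300), (2480, 3508), (1400, 1800)]
target = common_width([w for w, _ in pages])
new_sizes = [(target, scaled_height(w, h, target)) for w, h in pages]
print(target)
print(new_sizes)
```

The heights can differ from page to page. Only the width has to match for the pages to line up.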

Once your pages are all prepared, the next step is to join them together. Before I had a PDF editor, I would insert each page (graphics file) into my word-processor, fit each one to the page and then save the result as a PDF. Sometimes, for no known reason, the pages would go out of order or disappear while I was trying to arrange them and I’d have to start all over again. With some patience it’s do-able, though, and if you don’t have a PDF editor it’s a way to go. You may be more adept with your word-processor than I am.

I find my PDF editor much easier. I click the “create a new PDF from multiple files” button, browse for the files, click “Go” and it gets done. If pages need to be added, deleted or re-arranged that’s all possible too at the click of a button.
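If you have neither a PDF editor nor the patience for the word-processor, a short script is a third way to go. This is a sketch, not what my PDF editor does. It assumes the Pillow imaging library is installed, and the file names and the function names `natural_key`, `pages_in_order` and `build_pdf` are made up for illustration. The sorting helper is there because a plain alphabetical sort puts page10 before page2, which is one way pages end up out of order.

```python
import re


def natural_key(name):
    """Sort key that compares the numbers inside a file name as numbers."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pages_in_order(filenames):
    """Return the file names sorted so page2 comes before page10."""
    return sorted(filenames, key=natural_key)


def build_pdf(image_paths, output_path):
    """Join the page images, in the order given, into one PDF file."""
    # Assumption: Pillow is installed (pip install Pillow). It is imported
    # here so the sorting helpers above work without it.
    from PIL import Image

    pages = [Image.open(path).convert("RGB") for path in image_paths]
    if not pages:
        raise ValueError("no pages to join")
    # Pillow writes a multi-page PDF when save_all is set and the
    # remaining pages are passed as append_images.
    pages[0].save(output_path, save_all=True, append_images=pages[1:])


# Hypothetical file names, sorted into page order.
print(pages_in_order(["page10.png", "page2.png", "page1.png"]))
```

To make the PDF you would call something like build_pdf(pages_in_order(your_files), "document.pdf"), with your own file names in place of mine.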

When you’re finished, this document receives a number and goes into your digital Source Library, ready for future reference or emailing to a lucky recipient.
