I was looking for an open source OCR program online and came across the tip that you can use Google Docs to OCR documents. Follow-up research proved this was right, but it takes some serious tweaking to set up, so I decided to write a mini-tutorial for anyone who wants to act on this tip.
1. Log on to Google Docs. If you already have a Gmail address, you can use that username and password to log on to Google Docs.
2. There is a Docs link on the left of the screen. Click it and choose the Drive option at the bottom of the menu, or use the “Drive” option in the top menu.
3. When you’re in Drive, there is a gear-shaped icon on the left of the screen. Click it, choose the upload settings from the resulting menu, and select “convert uploaded files to Google Docs format.”
4. Upload a PDF of the document you want to OCR. Right-click its link and choose the “open in Google Docs” option. Once it’s open in Google Docs, go to File on the menu bar and choose the Download as option. You will be offered several formats: docx, odt, rich text, plain text, web page, and pdf. Choose a text format and you will get the OCR’d pages. The original PDF page shows up as an image, with the OCR’d text on the next page. From what I’ve seen, PDF pages with lots of graphics keep the image in the output, while pages of simple text just produce the OCR’d text.
Some things to keep in mind are:
-The output may need some cleaning up, but that’s no worse than what you get with most OCR programs.
-I already have an all-in-one printer that can create PDFs, and its scans were good enough for Google Docs to work with.
-If you work completely within Google Docs, the OCR’d text will show up in Google Docs and you can simply save it to your Drive.
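If you downloaded the result as plain text, a short script can take care of some of the cleanup mentioned above. The following is only a rough sketch in Python, not anything Google Docs provides: the function name clean_ocr_text is a placeholder, and the rules it applies (rejoining words split by a hyphen at the end of a line, merging wrapped lines, collapsing extra spaces) are assumptions about what typical OCR output looks like.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Tidy up plain-text OCR output (hypothetical helper, simple heuristics)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words that were hyphenated across a line break ("exam-\nple").
    # Caveat: this also merges real compounds that happen to break at a
    # hyphen, so check the result by eye.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Inside a paragraph, turn line breaks and runs of spaces into
        # a single space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)

    # Separate paragraphs with exactly one blank line.
    return "\n\n".join(cleaned)
```

To use it, read the downloaded text file into a string, pass it through clean_ocr_text, and write the result to a new file. You will still want to proofread the text, since a script like this cannot fix misread letters.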
This trick is good for simple jobs, and it saves you from having to type the text yourself or buy an OCR program.