Reading Pictures: Optical Character Recognition (OCR)

By: Amaan Goyal

Submitted 2009-12-28 07:31:25

In 2004, Google launched an ambitious project called Google Print, which was later renamed Google Book Search and is now known simply as Google Books. The goal: to make the books of the entire world available on the internet. If you think this was not a scientific task but merely a matter of many people scanning many books and uploading them, think again. A scanned page is stored as an image file, and an image cannot be searched for a keyword the way a text document can. Yet when you type a keyword into Google Books, it finds the word and shows you where it occurs in the scanned book. How do they do it?

The answer is Optical Character Recognition, commonly called OCR. OCR is the process of locating typewritten, handwritten, or printed text in an image and converting it into a machine-readable, editable text format. The input documents are typically scanned files, either in a common image format or as PDFs. OCR in its modern form traces its origins to the 1950s, when a cryptanalyst at the US Armed Forces Security Agency built a machine that could convert printed documents into machine-readable form for computer processing. Since then, the field has advanced alongside information technology in general. OCR remains a challenging research area today, with widespread commercial applications such as book search and indexing, postal address recognition, and the conversion of government documents into electronic archives.
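The article does not say how Google Books indexes its pages. As a hedged illustration only, assuming each scanned page has already been run through OCR and is available as plain text, a simple inverted index (a map from each word to the pages it appears on) is one common way to make recognized text searchable by keyword. The function names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the sorted list of page numbers containing it.

    `pages` is assumed to hold the OCR output of each scanned page, in order.
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        # \w+ splits the text into runs of letters, digits, and underscores.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


def search(index, keyword):
    """Return the pages on which `keyword` occurs (case-insensitive)."""
    return index.get(keyword.lower(), [])


if __name__ == "__main__":
    ocr_pages = [
        "Call me Ishmael.",
        "The whale surfaced.",
        "Ishmael watched the whale.",
    ]
    index = build_index(ocr_pages)
    print(search(index, "whale"))    # prints [2, 3]
    print(search(index, "Ishmael"))  # prints [1, 3]
```

This sketch only does exact, case-insensitive word lookup; a production search engine would also need ranking and some tolerance for words the OCR step misread.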

The main approach to OCR is structural analysis combined with pattern matching: the shapes found in the image are compared statistically against the letters of the language, and the closest candidate for each shape is output as machine-readable text. Early OCR systems were limited to a single font, but modern OCR software can recognize characters in most of the fonts available for a language.
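As a rough illustration of the pattern-matching idea, the following Python sketch uses tiny hypothetical 5-by-3 bitmaps for three letters; real OCR engines such as Tesseract use far more sophisticated features and classifiers. The sketch splits a line image into glyphs wherever a column contains no ink (a simple projection-based segmentation) and labels each glyph with the template whose pixels agree with it most, i.e. the closest candidate. The templates and all function names are assumptions made for this example.

```python
# Toy template-matching OCR: hypothetical 5x3 glyph bitmaps, not a real OCR engine.
# '#' marks an ink pixel and '.' marks background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def render(text):
    """Draw a line of known letters, separated by one blank column (for testing)."""
    return [".".join(TEMPLATES[ch][row] for ch in text) for row in range(5)]


def segment(line):
    """Split a line image into glyphs at columns that contain no ink."""
    width = len(line[0])
    glyphs, start = [], None
    for x in range(width + 1):  # one extra step flushes the last glyph
        inked = x < width and any(row[x] == "#" for row in line)
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            glyphs.append([row[start:x] for row in line])
            start = None
    return glyphs


def similarity(glyph, template):
    """Fraction of pixels that agree; assumes glyph and template are the same size."""
    pairs = [(g, t) for g_row, t_row in zip(glyph, template) for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def classify(glyph):
    """Return the letter whose template is the closest match to the glyph."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))


def recognize(line):
    """Segment a line image and classify each glyph in left-to-right order."""
    return "".join(classify(glyph) for glyph in segment(line))


if __name__ == "__main__":
    print(recognize(render("TILT")))  # prints TILT
    # A smudged "I" (one extra ink pixel) still matches "I" best.
    print(classify(["###", ".#.", "##.", ".#.", "###"]))  # prints I
```

Scoring by the fraction of agreeing pixels is the simplest possible similarity measure; it is what lets the smudged glyph still resolve to the nearest letter, but it breaks down as soon as glyphs vary in size, slant, or font, which is why practical systems normalize and extract features first.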

Popular OCR programs today include Ocrad, ABBYY FineReader, Brainware, and Tesseract; of these, ABBYY FineReader and Tesseract offer multi-language support. Commercial packages such as ABBYY FineReader must be purchased, while Ocrad and Tesseract are free, open-source software. Depending on the program, these tools accept image formats such as JPEG, TIFF, and GIF, and sometimes PDF files, and output the result as a standard text document.

Even so, most OCR programs are tailor-made for one language (or a few related languages), and more often than not that language is English. Online OCR services can offer recent multi-language recognition technology without requiring you to download and install licensed software on your PC. Such a service can also be entirely free, with the output file ready for download immediately, so you do not have to submit your email address and wait for the result to arrive in your inbox.

Author Resource:

Online OCR is the process of locating typewritten, handwritten, or printed text in an image and converting it into a machine-readable text format. A common application is converting PDF files to text online for editing.

Copyright LOOK 4 ARTICLES FREE DIRECTORY - 2005-2012 - Powered By: HYIP