Dynamsoft Document Capture, an online document capture and OCR service, was launched some time ago, and the corresponding RESTful service is now available too. In this article, I will show how to use the REST APIs to work with image files and perform OCR.
Dynamsoft OCR SDK is implemented entirely in C++, which makes it easy to wrap for high-level programming languages such as C#, Java, and Python. As a proprietary SDK, it is so far commercially available only as a .NET OCR library. Because some developers and users have asked Dynamsoft for a Java OCR library, I wrapped the C++ OCR library as a test. Feel free to use the sample; I’d welcome your feedback.
Previously, I wrote the article Free Online OCR Service with Dynamsoft TWAIN SDKs, which showed step by step how to create an online OCR application with Dynamic .NET TWAIN SDK. Since then, several readers have asked how to convert the OCR results to Microsoft Office documents. So today, I’d like to share how to use the Open XML SDK to convert OCR results to a Word document, with a few small changes to the sample code I shared last time.
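The Open XML SDK is a .NET library, so the following is not the author’s code. It is a minimal sketch, in Python with only the standard library, of what such a conversion produces: a .docx file is a ZIP package whose smallest valid form holds `[Content_Types].xml`, `_rels/.rels`, and `word/document.xml`. The function names are hypothetical, and the sketch assumes each OCR text line should become one Word paragraph.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_document_xml(lines):
    """Return WordprocessingML with one paragraph (w:p) per OCR text line."""
    paragraphs = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(line)}</w:t></w:r></w:p>'
        for line in lines
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{paragraphs}</w:body></w:document>'
    )


def write_ocr_text_to_docx(ocr_text, target):
    """Write OCR text to `target` (a path or binary file object) as a .docx."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", build_document_xml(ocr_text.splitlines()))
```

For example, `write_ocr_text_to_docx(ocr_result, "ocr_result.docx")` writes a document that Word can open. Writing the package parts by hand keeps the sketch dependency-free; for styles, tables, or images, a dedicated library such as the Open XML SDK is the more practical choice.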
Dynamic .NET TWAIN 5.0 was released some time ago. To help users get up to speed with the APIs quickly, it includes a brand-new API demo, a WPF application available in both C# and VB.NET. In this tutorial, we walk through the anatomy of this application. Let’s start with a glance at the screenshot. As you can see, the demo covers scanner control, image loading, barcode recognition, OCR, and image manipulation and processing.
How do I install Dynamic .NET TWAIN 5.0?
Visit the Dynamic .NET TWAIN page and click the “Download” button to get the installation package. Then follow the InstallShield Wizard step by step.
Where is the API demo?
The demo source code is located at “…\Dynamsoft\Dynamic .NET TWAIN 5.0 Trial\Samples\C# Samples\VS 12” and “…\Dynamsoft\Dynamic .NET TWAIN 5.0 Trial\Samples\VB .NET Samples\VS 12\WpfControlsDemo”. You can choose your preferred programming language, C# or VB.NET.
What does the project look like in Visual Studio?
The referenced assembly, DynamicDotNetTWAIN.Wpf.dll, is located at “…\Dynamsoft\Dynamic .NET TWAIN 5.0 Trial\Samples\Bin”. It is copied to your project folder when you run the Visual Studio solution file “WpfControlsDemo.sln”. If you create your own project, don’t forget to add this reference. The project contains three windows; the main window is “Window1.xaml”.
How do I use the relevant APIs to implement the following functions?
The implementation logic is the same no matter which language you choose, so let’s illustrate with C#.
1. Introduction to OCR add-on
a. What is OCR
OCR (optical character recognition) is software that lets a computer recognize text in an image and convert it into machine-encoded text that can be read and edited normally. For example, one might take a picture of a car’s license plate and then use OCR software to read the text from the picture into a Word document. OCR is implemented through a complex system of trained pattern recognition, which can also recognize fonts and formatting. Modern OCR is highly accurate, which makes it practical in a wide variety of areas, and it is constantly improving through training and artificial intelligence.
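Real OCR engines use trained classifiers over many features, but the core idea of pattern recognition can be shown with its simplest variant, template matching. In this toy sketch (the 3x5 glyph bitmaps, `TEMPLATES`, and the function names are all invented for illustration), each character image is compared pixel by pixel with known templates and labeled with the closest one.

```python
# Toy 3x5 glyph templates: "#" is ink, "." is background (invented for illustration).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def pixel_distance(glyph, template):
    """Count pixels that differ between two equally sized bitmaps (Hamming distance)."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Label a glyph with the character whose template is closest to it."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Recognize a sequence of already segmented glyphs as a string."""
    return "".join(recognize_glyph(glyph) for glyph in glyphs)


# A "T" with one noisy pixel in its fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognize_glyph(noisy_t))  # T
```

The noisy glyph is still read correctly because it differs from the “T” template by one pixel and from every other template by more. Raw pixel comparison breaks down with different fonts, sizes, and skew, which is why production engines learn features from training data instead.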
b. The power of modern OCR applications
Computer OCR has been in development for over 60 years. In its most primitive form, it could recognize most letters of the English alphabet. Today, OCR is far more powerful: software is available that supports almost all languages in use with reasonable accuracy, and it keeps getting better.
In many cases, recognition quality depends on image quality. The ideal image has a plain background with a minimum of spots and artifacts. However, modern OCR applications are also powerful enough to detect such anomalies and ignore them during processing. Wordlist data is also used to reduce mistakes, since recognized words can be compared against dictionary words.
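The wordlist step can be sketched with Python’s standard `difflib` module. Assuming a small hypothetical domain wordlist (`WORDLIST` and the function names are illustrative, not part of any OCR product), each recognized token that is not in the list is replaced by its closest dictionary word when the similarity reaches a cutoff; otherwise it is kept as recognized.

```python
import difflib
import re

# Hypothetical domain wordlist; a real system would load a full dictionary.
WORDLIST = ["invoice", "total", "amount", "customer", "address", "date"]


def correct_word(word, wordlist=WORDLIST, cutoff=0.75):
    """Replace an OCR token with its closest dictionary word, if close enough."""
    lower = word.lower()
    if lower in wordlist:
        return word  # already a known word; keep original casing
    matches = difflib.get_close_matches(lower, wordlist, n=1, cutoff=cutoff)
    if not matches:
        return word  # numbers, names, etc. pass through unchanged
    best = matches[0]
    # Preserve a leading capital, e.g. "T0tal" -> "Total".
    return best.capitalize() if word[:1].isupper() else best


def correct_text(text, wordlist=WORDLIST):
    """Apply correct_word to every alphanumeric token in the OCR text."""
    return re.sub(r"\w+", lambda m: correct_word(m.group(0), wordlist), text)
```

For example, `correct_text("lnvoice T0tal: 42")` returns `"invoice Total: 42"`, fixing the classic “l”/“i” and “0”/“o” confusions. The cutoff is the key design choice: set it too low and valid words that are missing from the list get “corrected” into dictionary words.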
The Tesseract OCR engine is an example of a powerful modern OCR engine: it supports over 40 languages and is flexible enough to be trained to improve accuracy and add new languages. Tesseract is a mature engine that has existed since 1985; it was originally developed at HP Labs and later developed by Google. As an “engine”, it is the lowest-level component of an OCR system, meaning its job is recognition and recognition only. To take full advantage of OCR technology and implement features such as output to complex formats, text formatting, and graphical interfaces, a more complete software package is required.
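To show the engine as one component inside a larger system, here is a minimal sketch that calls the `tesseract` command-line program from Python. It assumes Tesseract is installed and on `PATH` with the requested language data; the output base name `stdout` makes Tesseract print the recognized text instead of writing a file. The wrapper function names are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("eng",)):
    """Build argv for the tesseract CLI; "stdout" as output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_with_tesseract(image_path, languages=("eng",)):
    """Run the Tesseract engine on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout
```

For example, `ocr_with_tesseract("scan.png", ("eng", "deu"))` runs `tesseract scan.png stdout -l eng+deu` and returns plain text. Everything a complete OCR package adds, such as layout retention or export to Word, happens outside this call.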
c. How can OCR be used
While the past was a world where documents were all physical, and the future is a world where documents may all be digital, the present is in a state of transition. In this transition state, physical and digital documents coexist, and it is important to have technologies like OCR to allow for conversion back and forth.
OCR is useful for a great variety of purposes, including document recovery, data entry, and accessibility. Most OCR applications work on scanned documents, but in some cases photos are used as well. OCR is an essential time saver, as in many cases the only alternative is retyping the document. Some of the ways OCR can be used include:
- Recovering editable text files from scanned documents, including faxes
- Categorizing forms based on an approximation of their handwritten contents
- Creating searchable and editable eBooks from book scans
- Searching and editing text from screenshot images
- Computerized reading of books for visually impaired individuals through text-to-speech
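Searchable eBooks and searchable screenshots both come down to indexing the recognized text. A minimal sketch, assuming the OCR step has already produced one text string per page (the `pages` mapping and the function names are hypothetical), builds an inverted index that maps each word to the pages containing it and answers multi-word queries by intersecting those page sets.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the sorted page numbers whose OCR text contains it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```

With `pages = {1: "Chapter One: The Scanner", 2: "The scanner feeds paper"}`, `search(build_index(pages), "the scanner")` returns `[1, 2]`. Because OCR output may contain recognition errors, a real system would typically combine this with the wordlist correction described earlier or with fuzzy matching.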
While these are just some of the ways OCR can be used, they show how flexible the technology is across a great variety of fields. Employees in almost every business rely heavily on documents every day, so business usage is also an important focus in the development of OCR systems.
d. Business applications of OCR
Business usage of OCR generally falls within the field of data organization and input. Many businesses receive documents in traditional printed form, such as forms that are mailed or faxed in. In other cases, some documents may only be available on paper, such as manuals or printed documents whose original files have long since been lost. Processing these documents is much more expensive than processing digital ones, because a person must read each document and manually categorize it or record its data.
OCR eliminates this manual process; the document only needs to be scanned. After a document has been processed by OCR, the computer can use its data to categorize it automatically, and employees can edit and search the information. OCR is used by post offices, libraries, and offices of every kind.
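Once OCR has produced plain text, categorization and data entry can be automated with ordinary text processing. The sketch below assumes hypothetical categories, keyword sets, and an invoice-number pattern (none of them come from a real product); it scores each category by keyword overlap and pulls one field out with a regular expression.

```python
import re

# Hypothetical document categories and the keywords that suggest them.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due"},
    "purchase_order": {"purchase", "order", "po"},
    "resume": {"experience", "education", "skills"},
}

# Hypothetical pattern for lines such as "Invoice No: INV-2041" or "Invoice # 77".
INVOICE_NUMBER = re.compile(
    r"invoice\s*(?:no\b\.?|number\b|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
)


def categorize(text):
    """Return the category whose keywords overlap most with the OCR text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & keywords) for name, keywords in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def extract_invoice_number(text):
    """Return the first invoice number found in the OCR text, or None."""
    match = INVOICE_NUMBER.search(text)
    return match.group(1) if match else None
```

For a scanned page whose OCR text is `"INVOICE\nInvoice No: INV-2041\nAmount due: 120.00"`, `categorize` returns `"invoice"` and `extract_invoice_number` returns `"INV-2041"`. Keyword rules are easy to audit but brittle; production systems typically use trained classifiers and layout-aware field extraction instead.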