Wrapping C++ OCR Library in Java

Dynamsoft OCR SDK is implemented entirely in C++, so it can be wrapped for high-level programming languages such as C#, Java, and Python. It is a proprietary SDK, and so far only the .NET OCR library is available for commercial use. Because some developers and users have asked whether Dynamsoft could provide a Java OCR library, I wrapped the C++ OCR library as a test. Feel free to use the sample; I would be glad to receive your feedback.

Java OCR Demo

Java developers usually use JNI (Java Native Interface) to access native shared libraries. For convenience, I chose JNA (Java Native Access) instead, which invokes native code without hand-written C glue code.

Declare the native library:

package com.dynamsoft;

import com.sun.jna.Native;
import com.sun.jna.Library;
import com.sun.jna.NativeLong;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.PointerByReference;

public interface DynamsoftOCR extends Library {
	// Put jna.jar and DynamicOCR.dll (or DynamicOCRx64.dll on a 64-bit JVM) in
	// the same folder; otherwise, set jna.library.path to the folder that
	// contains the DLL.
	DynamsoftOCR INSTANCE = (DynamsoftOCR) Native
			.loadLibrary(
					System.getProperty("sun.arch.data.model").equals("64") ? "DynamicOCRx64.dll"
							: "DynamicOCR.dll", DynamsoftOCR.class);

	public Pointer OCRFileEx5(NativeLong imageCount, String[] imagePaths,
			Pointer /* (NativeLong *) */resultSize,
			PointerByReference/* byte[] */resultDetails,
			Pointer /* (NativeLong *) */resultDetailsSize, int toPlainTextPDF,
			String tessDataPath, String language, int pageMode,
			String unicodeFontName, int useDetectedFont,
			int minFontSizeDoMoreOCR, double thresholdRate, String license,
			int wordsType, int pdfFontSize);
}
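
The interface above picks the DLL name from the "sun.arch.data.model" system property, which is not guaranteed to be set on every JVM, and calling equals() on a missing property throws a NullPointerException. As a minimal sketch of a more defensive choice, the helper below falls back to "os.arch" when the property is missing or unrecognized. The class OcrLibraryName and its methods are hypothetical names introduced for illustration and are not part of the Dynamsoft SDK or JNA; only the two DLL names come from the listing above.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Dynamsoft SDK or JNA.
final class OcrLibraryName {
	private OcrLibraryName() {
	}

	// Chooses between the two DLL names used in this article.
	// dataModel is the value of "sun.arch.data.model" and may be null.
	// osArch is the value of "os.arch" and may be null.
	static String forJvm(String dataModel, String osArch) {
		if ("64".equals(dataModel)) {
			return "DynamicOCRx64.dll";
		}
		if ("32".equals(dataModel)) {
			return "DynamicOCR.dll";
		}
		// Property missing or unrecognized: guess from os.arch
		// (for example "amd64" or "x86_64" on 64-bit systems).
		String arch = osArch == null ? "" : osArch.toLowerCase(Locale.ROOT);
		return arch.contains("64") ? "DynamicOCRx64.dll" : "DynamicOCR.dll";
	}

	// Convenience overload that reads both properties from the running JVM.
	static String forCurrentJvm() {
		return forJvm(System.getProperty("sun.arch.data.model"),
				System.getProperty("os.arch"));
	}

	public static void main(String[] args) {
		System.out.println(forCurrentJvm());
	}
}
```

With this helper, the value returned by OcrLibraryName.forCurrentJvm() could be passed as the first argument to Native.loadLibrary in place of the inline conditional expression.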

Invoke the native library for OCR:

import java.io.File;
import java.io.FileOutputStream;

import com.dynamsoft.DynamsoftOCR;
import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.NativeLong;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.PointerByReference;

public class OCRDemo {
	public static void main(String[] args)
	{
		NativeLong imageCount = new NativeLong(1);
		String[] images = new String[1];
		String currentWorkDir = System.getProperty("user.dir");
		if (!currentWorkDir.endsWith(File.separator))
			currentWorkDir += File.separator;
		images[0] = currentWorkDir + "DNTImage7.tif";
		Pointer resultSize = new Memory(Native.getNativeSize(NativeLong.class));
		PointerByReference resultDetails = null;
		Pointer resultDetailsSize = new Memory(Native.getNativeSize(NativeLong.class));
		int resultFormat = 2; // 0: pure ASCII text string, 1: PDF plain text, 2: PDF image over text
		String tessDataPath = currentWorkDir;
		String language = "eng";
		int pageMode = 3;
		String unicodeFontName = "Arial";
		int useDetectedFont = 1;
		int minFontSizeDoMoreOCR = 0;
		double thresholdRate = 1.0;
		String license = "4374D06B2115F4B2F220166355214F65"; //2015-03-15 expire
		int wordsType = 0;
		int pdfFontSize = 0;//12;
		Pointer pResult = DynamsoftOCR.INSTANCE.OCRFileEx5(imageCount, images, resultSize, resultDetails, resultDetailsSize,
				resultFormat, tessDataPath, language, pageMode, unicodeFontName, useDetectedFont,
				minFontSizeDoMoreOCR, thresholdRate, license, wordsType, pdfFontSize);		
		if (pResult != null && resultSize.getNativeLong(0).intValue() > 0)
		{
			// Copy the OCR output from native memory into a Java byte array.
			byte[] result = pResult.getByteArray(0, resultSize.getNativeLong(0).intValue());

			// The "result" folder must already exist under the working directory;
			// FileOutputStream does not create missing parent folders.
			try (FileOutputStream out = new FileOutputStream(currentWorkDir + "result" + File.separator + "result.pdf")) {
				out.write(result);
			} catch (Exception e) {
				e.printStackTrace();
			}
		}

	}
}
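
The demo writes its output to result/result.pdf under the working directory, and that write fails if the result folder does not exist. As a minimal sketch, assuming the same folder and file names as the demo, the helper below creates the folder before writing the bytes. ResultWriter is a hypothetical class name introduced for illustration; it uses only java.nio.file from the standard library and does not touch the OCR SDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Dynamsoft SDK.
final class ResultWriter {
	private ResultWriter() {
	}

	// Writes data to <baseDir>/result/result.pdf, creating the "result"
	// folder if needed and overwriting any existing file.
	static Path write(String baseDir, byte[] data) throws IOException {
		Path target = Paths.get(baseDir, "result", "result.pdf");
		Files.createDirectories(target.getParent());
		return Files.write(target, data);
	}

	public static void main(String[] args) throws IOException {
		// Placeholder bytes stand in for the OCR output; a temporary folder
		// stands in for the working directory.
		Path dir = Files.createTempDirectory("ocr-demo");
		byte[] placeholder = "not a real PDF".getBytes(StandardCharsets.US_ASCII);
		System.out.println(write(dir.toString(), placeholder));
	}
}
```

In OCRDemo, a call such as ResultWriter.write(currentWorkDir, result) could replace the FileOutputStream block, with the IOException handled by the caller.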

Note: the license will expire on 03/15/2015.

Now you can try integrating the Java OCR module into your J2EE projects.

Download

JavaOCR