How to Use Tesseract OCR to Assist Barcode Scanning

When scanning barcodes, the recognition rate depends on image quality. If a barcode image is severely damaged, the barcode algorithm may fail to decode it. Fortunately, most linear (1D) barcodes are printed with their encoded text in human-readable form. In such a scenario, an OCR (optical character recognition) algorithm can complement the barcode algorithm. In this article, I will share how to use Tesseract OCR to assist barcode scanning.

Getting Started with Tesseract OCR on Windows

Install the pre-built binary package of Tesseract for Windows.

Here is the image for the test.

[Image: Codabar barcode used for the test]

Add the path C:\Program Files\Tesseract-OCR to the system PATH environment variable, and then run the following command in cmd.exe:

tesseract codabar.jpg out
[Screenshot: Tesseract OCR output]

The result, which Tesseract writes to out.txt, contains both letters and digits, but the expected result is digits only. We can restrict the output to digits by appending the digits config to the command:

tesseract codabar.jpg out digits
[Screenshot: Tesseract OCR output with the digits config]

The result looks better.
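The same digits-only command can also be driven from a script. The following is a minimal sketch rather than part of the original walkthrough: the helper names build_command and run_tesseract are hypothetical, and it assumes that tesseract is on PATH and that passing stdout as the output base makes Tesseract print the text instead of writing out.txt.

```python
import shutil
import subprocess


def build_command(image_path, digits_only=True, executable="tesseract"):
    """Build the Tesseract command line as an argument list (hypothetical helper)."""
    # "stdout" as the output base prints the text instead of writing out.txt.
    command = [executable, image_path, "stdout"]
    if digits_only:
        # Same "digits" config as in the command above.
        command.append("digits")
    return command


def run_tesseract(image_path, digits_only=True):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    completed = subprocess.run(
        build_command(image_path, digits_only),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Calling run_tesseract("codabar.jpg") should return the same text as the digits command, provided the PATH setup described earlier is in place.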

Reading Barcode and Recognizing Accompanying Text in Python

OCR is ready. What about barcode detection? We can quickly create a simple program in Python.

Install Dynamsoft Barcode Reader and PyTesseract:

pip install dbr pytesseract

Get a free trial license. With it, we can read barcodes using a few lines of code:

from dbr import DynamsoftBarcodeReader

# Path to the image to scan
image = 'codabar.jpg'

dbr = DynamsoftBarcodeReader()
# Replace LICENSE-KEY with your trial license key
dbr.initLicense('LICENSE-KEY')

try:
    results = dbr.DecodeFile(image)
    textResults = results["TextResults"]
    resultsLength = len(textResults)
    print("count: " + str(resultsLength))
    if resultsLength != 0:
        # Print the format and decoded text of every barcode found
        for textResult in textResults:
            print('Barcode Type: %s' % (textResult["BarcodeFormatString"]))
            print('Barcode Result: %s' % (textResult["BarcodeText"]))
    else:
        print("No barcode detected")
except Exception as err:
    print(err)

Recognize text using pytesseract:

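import pytesseract
from PIL import Image

# Path to the image to scan
image = 'codabar.jpg'

# Same "digits" config as in the command-line example
custom_oem_psm_config = r'digits'

result = pytesseract.image_to_string(Image.open(image), config=custom_oem_psm_config)
print('OCR Result:     %s' % (result))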

[Screenshot: barcode and OCR results printed by the Python program]

The barcode recognition result and the OCR result are identical.

Now, make some changes to the image and save it as damaged.png:

[Image: damaged Codabar barcode]

Rerun the app:

[Screenshot: program output for the damaged barcode]

In this scenario, the barcode SDK fails to decode the image, but OCR still reads the text correctly. This shows the value of OCR as an assist for barcode scanning.
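The "barcode first, OCR second" idea can be written down as a small fallback routine. This is only a sketch under assumptions: read_with_fallback, decode_barcode and run_ocr are hypothetical names, and the two callables stand in for thin wrappers around the barcode and pytesseract calls shown earlier.

```python
def read_with_fallback(image_path, decode_barcode, run_ocr):
    """Return (source, text): try the barcode decoder first, then OCR.

    decode_barcode(image_path) -> list of decoded strings (hypothetical wrapper)
    run_ocr(image_path) -> recognized text (hypothetical wrapper)
    """
    try:
        barcode_values = decode_barcode(image_path)
    except Exception:
        # Treat a decoder error the same as "nothing found".
        barcode_values = []

    if barcode_values:
        return "barcode", barcode_values[0]

    # The decoder found nothing, so fall back to the printed text.
    ocr_text = run_ocr(image_path).strip()
    if ocr_text:
        return "ocr", ocr_text
    return "none", ""
```

Returning the source alongside the text lets the caller treat an OCR-only reading with more caution than a decoded barcode.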

In my test case, the OCR result is 100% correct. Most of the time, however, OCR cannot produce perfect results because of image quality, so it cannot replace the barcode algorithm for 1D barcode scanning.
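Because OCR output is not always reliable, it may help to sanity-check it before trusting it. The sketch below is one possible approach, not something from the original program: the function names and the expected_length parameter are hypothetical, and it assumes the barcode payload is purely numeric, as in the test image.

```python
import re


def extract_digits(text):
    """Keep only the digit characters of a string."""
    return re.sub(r"\D", "", text)


def check_ocr_result(ocr_text, barcode_text=None, expected_length=None):
    """Classify an OCR reading as 'confirmed', 'mismatch', 'plausible' or 'rejected'."""
    digits = extract_digits(ocr_text)
    if not digits:
        # Nothing usable was recognized.
        return "rejected"
    if expected_length is not None and len(digits) != expected_length:
        # A wrong digit count suggests a dropped or extra character.
        return "rejected"
    if barcode_text is not None:
        # When the barcode was decoded too, the two readings should agree.
        return "confirmed" if digits == extract_digits(barcode_text) else "mismatch"
    return "plausible"
```

A "plausible" reading has passed only the format checks, so it is still weaker evidence than a decoded barcode.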

Source Code

https://gist.github.com/yushulx/32566858fc799b7d2e59899f0712c735