How to Build a Flet Chat App with Barcode and Gemini APIs

Gemini is Google’s latest AI model. It can be used for free, with a limit of 60 queries per minute, and it can recognize text in images. 1D barcodes are usually accompanied by human-readable text, which can be used to verify barcode recognition results. In this article, we use the Flet Python API to build a desktop chat app that integrates both the barcode and Gemini APIs. The app reads barcodes from images with Dynamsoft Barcode Reader and performs OCR on the text in those images with Gemini.

Installation

pip install -U google-generativeai dbr flet 

Prerequisites

Flet Python API for Desktop Applications

Flet empowers developers to create desktop applications using Python. It offers a crash course for constructing a real-time chat application, which serves as an excellent starting point.

Our application features a list view for displaying chat messages, a text input field, a button for uploading images, a button for sending messages, and a button to clear the chat history.

Flet chat app UI

  • Chat messages:

        
      chat = ft.ListView(
          expand=True,
          spacing=10,
          auto_scroll=True,
      )
    
  • Text input field:

      new_message = ft.TextField(
          hint_text="Write a message...",
          autofocus=True,
          shift_enter=True,
          min_lines=1,
          max_lines=5,
          filled=True,
          expand=True,
          on_submit=send_message_click,
      )
    
  • Button to load an image:

        
      def pick_files_result(e: ft.FilePickerResultEvent):
          global image_path
          image_path = None
          if e.files is not None:
              image_path = e.files[0].path
              # TODO
    
      def pick_file(e):
          pick_files_dialog.pick_files()
    
      pick_files_dialog = ft.FilePicker(on_result=pick_files_result)
      page.overlay.append(pick_files_dialog)
    
      ft.IconButton(
          icon=ft.icons.UPLOAD_FILE,
          tooltip="Pick an image",
          on_click=pick_file,
      )
    
  • Button to send a message:

      def on_message(message: Message):
          if message.message_type == "chat_message":
              m = ChatMessage(message)
    
              chat.controls.append(m)
              page.update()
    
      page.pubsub.subscribe(on_message)
    
      def send_message_click(e):
          global image_path
          if new_message.value != "":
              page.pubsub.send_all(
                  Message("Me", new_message.value, message_type="chat_message"))
    
              question = new_message.value
    
              new_message.value = ""
              new_message.focus()
              page.update()
    
              page.pubsub.send_all(
                  Message("Gemini", "Thinking...", message_type="chat_message"))
    
              # TODO
    
      ft.IconButton(
          icon=ft.icons.SEND_ROUNDED,
          tooltip="Send message",
          on_click=send_message_click,
      )
    

    PubSub facilitates asynchronous communication across page sessions. The subscribe method enables the receipt of broadcast messages from other sessions, while the send_all method allows for sending messages to all active sessions. Whenever a new message is received, the list view is automatically updated to display this new message.

  • Button to clear the chat history:

      def clear_message(e):
          global image_path
          image_path = None
          chat.controls.clear()
          page.update()
    
      ft.IconButton(
          icon=ft.icons.CLEAR_ALL,
          tooltip="Clear all messages",
          on_click=clear_message,
      )
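
The snippets above use a Message class and rely on PubSub broadcasting, but neither is shown in full. The following is a minimal sketch of both ideas in plain Python. The field names of Message are assumptions inferred from the calls in this article, and ToyPubSub is a hypothetical stand-in that only mimics the subscribe and send_all behavior described above; it is not Flet's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical message record; field names are inferred from the calls above."""
    user_name: str
    text: str
    message_type: str
    is_image: bool = False  # assumed flag: True when `text` holds an image path


class ToyPubSub:
    """Toy in-memory hub that mimics subscribe/send_all; not Flet's implementation."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Message], None]] = []

    def subscribe(self, handler: Callable[[Message], None]) -> None:
        # Each session registers one callback.
        self._handlers.append(handler)

    def send_all(self, message: Message) -> None:
        # Broadcast to every subscriber, including the sender.
        for handler in self._handlers:
            handler(message)


hub = ToyPubSub()
session_a: List[str] = []
session_b: List[str] = []
hub.subscribe(lambda m: session_a.append(f"{m.user_name}: {m.text}"))
hub.subscribe(lambda m: session_b.append(f"{m.user_name}: {m.text}"))
hub.send_all(Message("Me", "Hello", message_type="chat_message"))
print(session_a, session_b)
```

In the real app, the subscribed callback is on_message, which appends a ChatMessage control to the list view instead of appending a string to a list.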
    

Integrating the Dynamsoft Barcode Reader

Dynamsoft Barcode Reader is a library for barcode scanning. To add barcode scanning to the app, integrate it as follows:

  1. Import the Dynamsoft Barcode Reader library and initialize a barcode reader instance using your license key.

     from dbr import *
     license_key = "LICENSE-KEY"
     BarcodeReader.init_license(license_key)
     reader = BarcodeReader()
    
  2. Decode the barcode from the uploaded image and send the result to the chat.

     def pick_files_result(e: ft.FilePickerResultEvent):
         global image_path, barcode_text
         barcode_text = None
         image_path = None
         if e.files is not None:
             image_path = e.files[0].path
             page.pubsub.send_all(
                 Message("Me", image_path, message_type="chat_message", is_image=True))
    
             text_results = None
             try:
                 text_results = reader.decode_file(image_path)
             except BarcodeReaderError as bre:
                 print(bre)
    
             if text_results is not None:
                 barcode_text = text_results[0].barcode_text
                 page.pubsub.send_all(
                     Message("DBR", barcode_text, message_type="chat_message"))
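
The handler above posts only the first decoding result, although an image may contain several barcodes. As a rough sketch, a small helper could collect the text of every result. The helper name collect_barcode_texts is hypothetical, and the only assumption about the result objects is the barcode_text attribute already used above. Stand-in objects replace real decoding results so the example runs without the dbr package or a license key.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def collect_barcode_texts(results: Optional[Iterable]) -> List[str]:
    """Return the barcode_text of every result, or [] when nothing was decoded."""
    if not results:
        return []
    return [r.barcode_text for r in results]


# Stand-in objects so the sketch runs without the dbr package or a license.
fake_results = [
    SimpleNamespace(barcode_text="9780201379624"),
    SimpleNamespace(barcode_text="012345678905"),
]
print(collect_barcode_texts(fake_results))
print(collect_barcode_texts(None))
```

Each collected string could then be sent to the chat as its own message, in the same way as the single result above.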
    

Utilizing Google’s Gemini AI for Text Recognition

Gemini can extract text from images. Once you’ve decoded a barcode, you can employ Gemini to verify the accuracy of the text decoded from the barcode. Here are the steps to use Gemini:

  1. Set up the API key for Gemini.

     import pathlib  # used later to read the image bytes

     import google.generativeai as genai
     import google.ai.generativelanguage as glm
    
     genai.configure(api_key='API-KEY')
    
  2. Initialize the text and vision models. The vision model takes both text and images as input.

     model_text = genai.GenerativeModel('gemini-pro')
     chat_text = model_text.start_chat(history=[])
     model_vision = genai.GenerativeModel('gemini-pro-vision')
     chat_vision = model_vision.start_chat(history=[])
    
  3. Define a custom :verify command that asks the vision model to recognize the text around the barcode in the uploaded image.

     def send_message_click(e):
         global image_path
         if new_message.value != "":
             ...
    
             if question == ":verify":
                 question = "recognize text around the barcode"
                 response = model_vision.generate_content(
                     glm.Content(
                         parts=[
                             glm.Part(
                                 text=question),
                             glm.Part(
                                 inline_data=glm.Blob(
                                     mime_type='image/jpeg',
                                     data=pathlib.Path(
                                         image_path).read_bytes()
                                 )
                             ),
                         ],
                     ))
    
                 text = response.text
                 page.pubsub.send_all(
                     Message("Gemini", text, message_type="chat_message"))
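
The request above hardcodes the MIME type as image/jpeg, which does not match a PNG or other non-JPEG upload. One possible refinement, sketched here with the standard mimetypes module, is to guess the type from the file extension and fall back to image/jpeg. The helper name guess_image_mime is hypothetical, and which image types the vision model accepts is not covered here.

```python
import mimetypes


def guess_image_mime(path: str, default: str = "image/jpeg") -> str:
    """Guess an image MIME type from the file extension, with a fallback."""
    mime, _ = mimetypes.guess_type(path)
    # Only accept image types; anything else falls back to the default.
    if mime is not None and mime.startswith("image/"):
        return mime
    return default


print(guess_image_mime("barcode.png"))
print(guess_image_mime("barcode.jpg"))
print(guess_image_mime("barcode.unknownext"))
```

The returned string could be passed as mime_type when building the glm.Blob.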
    
    

Verifying the Barcode Decoding Results with the Accompanying Text

Now, we can check whether the text read from the barcode exists in the text recognized from the image. Since the text extracted by Gemini might include spaces, it’s essential to eliminate these spaces prior to comparison.

if barcode_text is None:
    return

text = text.replace(" ", "")
if barcode_text in text:
    page.pubsub.send_all(
        Message("Gemini", barcode_text + " is correct ✓", message_type="chat_message"))
else:
    page.pubsub.send_all(
        Message("Gemini", barcode_text + " may not be correct", message_type="chat_message"))
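
The comparison can also be factored into a small pure function, which makes it easy to test without the UI. This is a sketch rather than the code from the repository: the name barcode_matches_text is hypothetical, and unlike the snippet above it strips all whitespace (including newlines and tabs), not only spaces, from both strings.

```python
from typing import Optional


def barcode_matches_text(barcode_text: Optional[str], recognized_text: str) -> bool:
    """Check whether the decoded barcode text appears in the OCR output."""
    if not barcode_text:
        return False
    # split() with no argument splits on any whitespace, so joining the
    # pieces removes spaces, tabs, and line breaks alike.
    normalized = "".join(recognized_text.split())
    target = "".join(barcode_text.split())
    return target in normalized


print(barcode_matches_text("0123456789012", "0 123456 789012\nMade in X"))
print(barcode_matches_text("0123456789012", "0 123456 789013"))
```

A True result corresponds to the "is correct" message above, and False to "may not be correct".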

Launch the desktop application and test it with some images that contain 1D barcodes:

flet run chatbot.py

Flet chat app with barcode and Gemini APIs

Source Code

https://github.com/yushulx/flet-chat-app-gemini-barcode