Tag: tesseract

  • tesseract Unable to load unicharset file /usr/share/tesseract/tessdata/eng.unicharset

    Cause: the English language data package is not installed.

    On CentOS:

    sudo yum install tesseract-en

    On Ubuntu:

    sudo apt-get install tesseract-ocr-eng
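
    To confirm the diagnosis before or after installing, it can help to check whether any English data files exist in the tessdata directory named in the error. The following is a minimal sketch; has_lang_data is a hypothetical helper, and the directory is the one from the error message above, which may differ on your system.

```shell
#!/bin/sh
# has_lang_data DIR LANG: succeed if DIR contains at least one LANG.* file.
# Hypothetical helper, not part of tesseract.
has_lang_data() {
    for f in "$1/$2".*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (path taken from the error message; adjust for your install):
#   has_lang_data /usr/share/tesseract/tessdata eng || echo "eng data missing"
```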

  • tesseract check_legal_image_size:Error:Only 1,2,4,5,6,8 bpp are supported:16

    Error:

    tesseract test.tif output.txt
    Tesseract Open Source OCR Engine
    check_legal_image_size:Error:Only 1,2,4,5,6,8 bpp are supported:16

    Cause: the TIFF's Bits/Sample value is not one of 1, 2, 4, 5, 6, 8, or the image has a transparency (alpha) channel, which tiffinfo reports as Extra Samples: 1<unassoc-alpha>:

    tiffinfo  test.tif
      TIFF Directory at offset 0x7b6 (1974)
      Image Width: 145 Image Length: 39
      Resolution: 72, 72 (unitless)
      Bits/Sample: 8
      Compression Scheme: AdobeDeflate
      Photometric Interpretation: min-is-black
      Extra Samples: 1<unassoc-alpha>
      FillOrder: msb-to-lsb
      Orientation: row 0 top, col 0 lhs
      Samples/Pixel: 2
      Rows/Strip: 39
      Planar Configuration: single image plane
      DocumentName: a.tif
      Software: ImageMagick 6.2.8 05/07/12 Q16
      Predictor: horizontal differencing 2 (0x2)

    Solution: convert the TIFF to a supported bit depth and remove the alpha channel:

    convert test.tif -background white -alpha remove -flatten -alpha off test.tif

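    Alternatively, convert to JPEG and back to TIFF. JPEG has no alpha channel, so the round trip drops it (JPEG compression is lossy, so this can slightly degrade the image):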

    convert test.tif test.jpg
    convert test.jpg test.tif
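
    The check described under Cause can be automated by scanning tiffinfo output for the two conditions. This is a rough sketch, assuming the output format shown above; tiff_needs_fix is a hypothetical helper, and the -depth 8 option in the usage comment is an addition not in the original command, intended for the 16-bit case.

```shell
#!/bin/sh
# tiff_needs_fix: read `tiffinfo` output on stdin. Succeeds (exit 0) when the
# image has an alpha channel or a Bits/Sample value outside 1,2,4,5,6,8.
# Hypothetical helper; it relies on the tiffinfo field names shown above.
tiff_needs_fix() {
    awk '
        /Extra Samples:.*alpha/ { bad = 1 }
        /Bits\/Sample:/ {
            # $2 is the number that follows "Bits/Sample:"
            if ($2 !~ /^(1|2|4|5|6|8)$/) bad = 1
        }
        END { exit (bad ? 0 : 1) }
    '
}

# Usage (writes to a new file instead of overwriting the input):
#   if tiffinfo test.tif | tiff_needs_fix; then
#       convert test.tif -background white -alpha remove -flatten -alpha off -depth 8 fixed.tif
#   fi
```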

  • tesseract name_to_image_type:Error:Unrecognized image type

    Error:

    tesseract test.png output.txt
    Tesseract Open Source OCR Engine
    name_to_image_type:Error:Unrecognized image type:test.png
    IMAGE::read_header:Error:Can't read this image type:test.png
    tesseract:Error:Read of file failed:test.png

    Cause: you are running Tesseract 2.04 or earlier, which does not recognize PNG files.

    Solution:

    1. Convert the PNG to TIFF:

    convert test.png -background white -alpha remove -flatten -alpha off test.tif

    2. Upgrade Tesseract to the latest version.
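
    If upgrading is not an option and there are many PNG files, the conversion in step 1 can be wrapped in a small loop. A minimal sketch, assuming ImageMagick's convert is on the PATH; tif_name and png_to_tif are hypothetical helper names.

```shell
#!/bin/sh
# tif_name FILE: print FILE with its .png extension replaced by .tif.
tif_name() {
    printf '%s\n' "${1%.png}.tif"
}

# png_to_tif FILE: flatten FILE onto a white background and write a .tif
# next to it, using the same ImageMagick options as step 1.
png_to_tif() {
    convert "$1" -background white -alpha remove -flatten -alpha off "$(tif_name "$1")"
}

# Usage: convert every PNG in the current directory, then run OCR on each.
# The second tesseract argument is an output base name; .txt is appended.
#   for f in *.png; do
#       png_to_tif "$f" && tesseract "${f%.png}.tif" "${f%.png}"
#   done
```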