Getting the bounding box of the recognized words using python-tesseract
Solution 1
Use pytesseract.image_to_data()
import pytesseract
from pytesseract import Output
import cv2
img = cv2.imread('image.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
Among the data returned by pytesseract.image_to_data():
- left is the distance from the upper-left corner of the bounding box to the left border of the image.
- top is the distance from the upper-left corner of the bounding box to the top border of the image.
- width and height are the width and height of the bounding box.
- conf is the model's confidence in its prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than just a single word.
The bounding boxes returned by pytesseract.image_to_boxes() enclose letters, so I believe pytesseract.image_to_data() is what you're looking for.
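Since entries with a conf of -1 describe blocks rather than single words, the dictionary can be filtered down to word-level boxes before drawing. The following is a minimal sketch, not part of pytesseract: word_boxes and min_conf are hypothetical names, and it assumes the dictionary also carries a 'text' list alongside the keys described above. The conf values are passed through float() because their type can differ between versions.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, x, y, w, h, conf) tuples for word-level entries only.

    `data` is assumed to be the dict returned by
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    words = []
    for i in range(len(data['text'])):
        conf = float(data['conf'][i])  # conf may arrive as int, float, or str
        text = data['text'][i].strip()
        # Skip block-level entries (conf == -1), low-confidence words,
        # and entries with no recognized text.
        if conf < min_conf or not text:
            continue
        words.append((text, data['left'][i], data['top'][i],
                      data['width'][i], data['height'][i], conf))
    return words


# Hypothetical usage with the dictionary `d` from the code above:
# for text, x, y, w, h, conf in word_boxes(d, min_conf=60):
#     cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```

Raising min_conf drops uncertain words; leaving it at 0 removes only the block-level entries.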
Solution 2
The tesseract.GetBoxText() method returns the exact position of each character in an array.
Besides, there is a command line option, tesseract test.jpg result hocr, that will generate a result.html file with each recognized word's coordinates in it. But I'm not sure whether it can be called from a Python script.
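Once the hOCR file exists, the word coordinates can be pulled out of it with the standard library. This is a rough sketch under the assumption that each word is written as a span with class ocrx_word whose title attribute begins with "bbox x1 y1 x2 y2" (the usual hOCR layout); hocr_words and WORD_RE are hypothetical names, and a real HTML parser would be more robust than a regular expression.

```python
import re
from html import unescape

# Matches one word span and captures its four bbox numbers and inner markup.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL)


def hocr_words(hocr_text):
    """Return (text, x1, y1, x2, y2) for every word span in an hOCR string."""
    words = []
    for x1, y1, x2, y2, inner in WORD_RE.findall(hocr_text):
        # Drop nested tags such as <strong> and decode HTML entities.
        text = unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        words.append((text, int(x1), int(y1), int(x2), int(y2)))
    return words


# Hypothetical usage; the output file name depends on the tesseract version:
# with open('result.hocr', encoding='utf-8') as f:
#     print(hocr_words(f.read()))
```

The bbox values in hOCR are pixel coordinates of the top-left and bottom-right corners, so they can be passed to cv2.rectangle directly.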
Solution 3
Python tesseract can do this without writing to a file, using the image_to_boxes function:
import cv2
import pytesseract
filename = 'image.png'
# read the image and get the dimensions
img = cv2.imread(filename)
h, w, _ = img.shape # assumes color image
# run tesseract, returning the bounding boxes
boxes = pytesseract.image_to_boxes(img) # also include any config options you use
# draw the bounding boxes on the image
for b in boxes.splitlines():
    b = b.split(' ')
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
# show annotated image and wait for keypress
cv2.imshow(filename, img)
cv2.waitKey(0)
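The h - ... terms in the loop above are there because each line returned by image_to_boxes has the form "char left bottom right top page", measured from the bottom-left corner of the image, while OpenCV measures y from the top. A small sketch of that conversion on its own, where box_line_to_rect is a hypothetical helper name:

```python
def box_line_to_rect(line, img_height):
    """Convert one image_to_boxes line to (char, x1, y1, x2, y2).

    The returned corners use a top-left origin, as OpenCV expects:
    (x1, y1) is the top-left corner and (x2, y2) the bottom-right one.
    """
    parts = line.split(' ')
    char = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])
    # Flip the y axis: a distance from the bottom edge becomes
    # a distance from the top edge.
    return char, left, img_height - top, right, img_height - bottom


# Hypothetical usage with `boxes`, `h`, and `img` from the code above:
# for line in boxes.splitlines():
#     ch, x1, y1, x2, y2 = box_line_to_rect(line, h)
#     cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```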
Solution 4
Using the below code you can get the bounding box corresponding to each character.
import csv
import cv2
from pytesseract import pytesseract as pt
pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")
# To read the coordinates
boxes = []
with open('output.box', 'rb') as f:
    reader = csv.reader(f, delimiter = ' ')
    for row in reader:
        if(len(row)==6):
            boxes.append(row)
# Draw the bounding box
img = cv2.imread('bw.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img,(int(b[1]),h-int(b[2])),(int(b[3]),h-int(b[4])),(255,0,0),2)
cv2.imshow('output',img)
cv2.waitKey(0)  # keep the window open until a key is pressed
Solution 5
Would comment under lennon310 but don't have enough reputation to comment...
To run his command line command tesseract test.jpg result hocr in a Python script:
from subprocess import check_call
tesseractParams = ['tesseract', 'test.jpg', 'result', 'hocr']
check_call(tesseractParams)
Abtin Rasoulian
Updated on July 09, 2022
Comments
-
Abtin Rasoulian almost 2 years
I am using python-tesseract to extract words from an image. This is a python wrapper for tesseract which is an OCR code.
I am using the following code for getting the words:
import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "test.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result
This returns only the words and not their location, size, or orientation in the image (in other words, a bounding box containing them). I was wondering if there is any way to get that as well.
-
Henry over 7 years
I get a result.hocr file with the command, though the file is in HTML format.
-
Stepan Yakovenko over 5 years
Doesn't work; boxes is an unknown parameter in the latest pytesseract.
-
Parikshit Chalke over 5 years
This is actually the correct answer for this question, but it might be ignored by people due to the complexity of this method.
-
Atinesh about 5 years
Why is the y-coordinate subtracted from the height of the image?
jtbr about 5 years
I believe pytesseract and opencv have different notions of the origin of the image (top left or bottom left), or at least that's what I seemed to experience when I wrote the answer. If it works without the h there, great.
-
Eswar RDS about 4 years
Do you know the meaning of the other columns (level, page_num, block_num, par_num, line_num, word_num) in the output generated by image_to_data?
-
Bùi Nhật Duy almost 4 years
This works only for tesseract >= 3.05. I need a solution for a lower version.