How to deskew a scanned text page with ImageMagick?

Solution 1

I would try a larger threshold, such as 80%. If that does not help, an ImageMagick forum member has a bash script that may work better: http://www.fmwconcepts.com/imagemagick/textdeskew/index.php
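As a rough sketch of that suggestion, the script below reruns the asker's command with an 80% threshold and prints the rotation ImageMagick estimated, which helps show whether -deskew detected any skew at all. The helper names deskewed_name and deskew_page are made up for illustration. The script assumes the ImageMagick 6 "convert" command (ImageMagick 7 uses "magick") and a build that sets the deskew:angle property.

```shell
#!/usr/bin/env bash
# Sketch only: deskew one scan with a higher threshold and report the angle.
# Assumes ImageMagick 6 syntax ("convert"); with ImageMagick 7 use "magick".

# Hypothetical helper: derive an output name,
# e.g. skewed_1500.jpeg -> skewed_1500_deskewed.jpeg
deskewed_name() {
    local input="$1"
    printf '%s_deskewed.%s\n' "${input%.*}" "${input##*.}"
}

# Hypothetical helper. Usage: deskew_page INPUT [THRESHOLD]
# THRESHOLD defaults to 80%, the value suggested in this answer.
deskew_page() {
    local input="$1" threshold="${2:-80%}"
    local output
    output="$(deskewed_name "$input")"
    # Print the rotation angle that -deskew estimated (assumes the
    # deskew:angle property is available in the installed version).
    convert "$input" -deskew "$threshold" -format '%[deskew:angle]\n' info:
    # Write the deskewed image next to the original.
    convert "$input" -deskew "$threshold" "$output"
}

# Run only when a file name is passed on the command line.
if [ "$#" -gt 0 ]; then
    deskew_page "$@"
fi
```

Saved under any name, for example deskew.sh, it could be called as: bash deskew.sh skewed_1500.jpeg 80%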

Solution 2

with OCRmyPDF

You can also straighten the pages after first having ImageMagick convert your JPG to PDF (convert input.jpg input.pdf) and then letting OCRmyPDF rectify the PDF:

ocrmypdf --deskew --tesseract-timeout=0 input.pdf output.pdf

Using your example page, I'd say the resulting text is straight:

[Image: straightened page, after running OCRmyPDF]

As noted in the OCRmyPDF documentation, --tesseract-timeout=0 disables optical character recognition, so the page is only deskewed.

Of course you can also deskew the PDF and make it searchable in one go:

ocrmypdf --deskew -l fra input.pdf output.pdf

Make sure the French language pack for Tesseract is installed before running this.
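The two steps above (JPG to PDF, then deskew with OCRmyPDF) can be chained in a small script. This is only a sketch: the helper names pdf_name and deskew_scan are made up, the output naming is an arbitrary choice, and it assumes both convert and ocrmypdf are on the PATH. With no language argument it deskews without OCR. With one (for example fra) it also makes the PDF searchable.

```shell
#!/usr/bin/env bash
# Sketch only: JPG -> PDF -> deskewed PDF, using the commands from this answer.

# Hypothetical helper: input.jpg -> input.pdf
pdf_name() {
    printf '%s.pdf\n' "${1%.*}"
}

# Hypothetical helper. Usage: deskew_scan INPUT.jpg [TESSERACT_LANG]
deskew_scan() {
    local input="$1" lang="${2:-}"
    local pdf out
    pdf="$(pdf_name "$input")"
    out="${pdf%.pdf}_deskewed.pdf"

    # Step 1: let ImageMagick wrap the image in a PDF.
    convert "$input" "$pdf" || return 1

    # Step 2: deskew, with OCR only if a language was given.
    if [ -n "$lang" ]; then
        ocrmypdf --deskew -l "$lang" "$pdf" "$out"
    else
        ocrmypdf --deskew --tesseract-timeout=0 "$pdf" "$out"
    fi
}

# Run only when a file name is passed on the command line.
if [ "$#" -gt 0 ]; then
    deskew_scan "$@"
fi
```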

Crop the PDF

To get rid of the black parts on the sides and the white part on the bottom of the PDF, you can use pdfcrop (commonly part of TeX Live):

# Remove margins at left, top, right, and bottom
pdfcrop --margins '-60 0 -50 -430' output.pdf cropped_output.pdf

The cropped and deskewed PDF:

[Image: PDF cropped with pdfcrop]

Author: carbontracking

Still messing with stuff, 26 years into my professional life

Updated on July 17, 2022

Comments

  • carbontracking
    carbontracking almost 2 years

    I have scanned documents that weren't scanned perfectly straight so the text is not orientated perfectly horizontally, i.e. perhaps 10° of a slope on each line.

    My understanding is that the deskew option in ImageMagick should solve this, for example

    convert skewed_1500.jpeg -deskew 40% skewed_1500_not.jpg
    

    but it doesn't have any noticeable effect on the output file.

    I've attached the skewed and deskewed images for comparison.

    First the original image: skewed image

    Then the purportedly deskewed image: deskewed image

  • carbontracking
    carbontracking over 7 years
    Excellent, your 80% suggestion did the job perfectly. I also tried the script that you linked to, and the bare script, without playing with parameters, did deskew somewhat, but not as perfectly as your 80% suggestion. Many thanks, this one has gone into the toolbox.
  • polemon
    polemon over 5 years
    What exactly does the percentage mean? I can understand an angle, but a percentage makes no sense to me. Also, no matter how high I set the value, convert doesn't do anything for me.