How to convert a Persian PDF file to Microsoft Word format?

Solution 1

PDF files in non-Latin scripts (especially RTL scripts such as Arabic, Hebrew and Farsi) are very often generated by software that effectively LTR-ifies the text at the word or sentence-fragment level, or that manages to display the right glyphs while the underlying 'logical' text is gibberish. In these cases there is very little you can do except write a custom back-converter, which is effectively not an option.
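To illustrate what such a back-converter involves, here is a minimal Python sketch for the simplest case only: each copied line is stored in visual (left-to-right display) order, but the characters themselves are correct. The function names `is_rtl`, `is_ltr` and `visual_to_logical` are made up for this example. It does not implement the full Unicode bidirectional algorithm, and it cannot help when the text layer itself is gibberish.

```python
import unicodedata


def is_rtl(ch: str) -> bool:
    """True for characters with a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_ltr(ch: str) -> bool:
    """True for Latin-style letters and digits, which are displayed left to right."""
    return unicodedata.bidirectional(ch) in ("L", "EN", "AN")


def visual_to_logical(line: str) -> str:
    """Naively convert one visually ordered, mostly-RTL line to logical order.

    The line is read backwards; runs of left-to-right characters (numbers,
    Latin words and the spaces between them) are flipped back so they still
    read normally. Lines without any RTL character are returned unchanged.
    """
    if not any(is_rtl(ch) for ch in line):
        return line

    out = []      # characters already in logical order
    run = []      # current left-to-right run, still reversed
    pending = []  # neutral characters (e.g. spaces) seen after the run

    for ch in reversed(line):
        if is_ltr(ch):
            # Neutrals between two LTR pieces belong to the same run.
            run.extend(pending)
            pending.clear()
            run.append(ch)
        elif run and not is_rtl(ch):
            pending.append(ch)
        else:
            # An RTL character (or a neutral outside any run) ends the run.
            out.extend(reversed(run))
            out.extend(pending)
            run.clear()
            pending.clear()
            out.append(ch)

    out.extend(reversed(run))
    out.extend(pending)
    return "".join(out)
```

You would apply it line by line to the pasted text. Real files often mix reversed and non-reversed fragments, so treat this as a starting point for experiments rather than a general fix.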

However, if you can figure out how the file was created, which is often indicated in the metadata accessible from common PDF readers, you might be able to open the file in the application that generated it, or at least make your question more specific.
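If you prefer a script to a PDF reader's properties dialog, the generating application can sometimes be read straight from the file. This is a rough Python sketch using only the standard library; the function name `pdf_generator_info` is made up for this example. It assumes the `/Producer` and `/Creator` entries are stored as plain, uncompressed literal strings, which is not true of every PDF, so an empty result does not mean the metadata is missing.

```python
import re

# Matches "/Producer (...)" or "/Creator (...)" with a literal-string value.
# Escaped characters such as "\)" inside the string are skipped over.
_INFO_KEY = re.compile(
    rb"/(Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL
)


def pdf_generator_info(data: bytes) -> dict[str, str]:
    """Return the Producer/Creator entries found in raw PDF bytes.

    Assumption: values are uncompressed literal strings in a single-byte
    encoding. Hex strings, UTF-16 strings and metadata stored inside
    compressed object streams are not handled.
    """
    info = {}
    for key, raw in _INFO_KEY.findall(data):
        info[key.decode("ascii")] = raw.decode("latin-1")
    return info


# Usage (the file name is hypothetical):
#   from pathlib import Path
#   print(pdf_generator_info(Path("document.pdf").read_bytes()))
```

A value naming a word processor or a PDF printer driver tells you which application to try reopening the source in.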

Solution 2

I had the same problem converting PDF files to Word. After copying and pasting into Word, the formatting changed and caused trouble. I tried several online converters, but they also failed.
The only method that worked was as follows:

  1. Open the PDF file with Adobe Acrobat Reader, then choose Print from the File menu. From the list of printers, choose Adobe Acrobat. Yes, you are about to create a PDF from a PDF!
  2. Open the new PDF file with Google Chrome (drag and drop the file onto Chrome).
  3. Now simply select all the text (Ctrl+A) and copy and paste it into a blank Word file.

Author: Mehdi

Updated on September 18, 2022

Comments

  • Mehdi
    Mehdi over 1 year

    I have a PDF file in Persian script, which is written right-to-left. Since Persian uses UTF-8, I can't convert it to plain text in Microsoft Word, and I can't copy and paste the text either, because the result is unreadable characters. I have tried a lot of software, such as UniPDF and e-PDF Converter, but after the conversion the characters are still not displayed properly. I even tried OCR, but the same problem appeared. The PDF doesn't have a password or any restrictions.

    Does anyone have any other ideas?

    Edit: I actually tried creating a file in MS Word and converting it to a PDF, and I had the same problem with that PDF file (even though the encoding was known).

    • Ramhound
      Ramhound almost 9 years
      Microsoft Word supports UTF-8 format. It also supports right to left languages. So why exactly can't you convert it to a Word document?
    • Mehdi
      Mehdi almost 9 years
      Hey, thanks for your consideration. The source of my file is a PDF, so I don't know exactly what happens when I try to copy and paste it into Microsoft Word, but it doesn't show the proper characters. The same thing happens when I try to convert it using third-party tools.
    • user5249203
      user5249203 almost 9 years
    • Mehdi
      Mehdi almost 9 years
      @RedGrittyBrick I read your answer, but in my case I actually tried creating a file in MS Word and converting it to a PDF, and I had the same problem with that PDF file (even though the encoding was known). Thanks
    • Austin T French
      Austin T French almost 9 years
      How was the PDF created? Electronically, or was it scanned and you are hoping for OCR to take over?
    • user5249203
      user5249203 almost 9 years
      Can you create an example PDF and post it somewhere public so that people can download it from there using a URL?
    • Mehdi
      Mehdi almost 9 years
      @AthomSfere The PDF was created automatically by converting an MS Word file into a PDF. Thanks
    • Mehdi
      Mehdi almost 9 years
      @RedGrittyBrick Here is an example PDF: drive.google.com/…
    • user5249203
      user5249203 almost 9 years
      I can cut and paste text from that using Chrome's built-in PDF viewer. There is no obvious garbling of the characters, but the direction of the text is mostly reversed. I don't read Persian, so I can't tell whether the actual characters are all OK, but they look superficially OK. With a different PDF viewer, Evince, the main problem is selecting contiguous text. Unfortunately I don't think I can help with your problem.
  • user5730
    user5730 over 3 years
    Thank you. This worked for me, but I didn't open the file with Adobe Acrobat Reader. I opened it in IE and then used Print Screen and Paint to capture the PDF.