Copying Chinese Characters from PDF


From what I can see in your sample text, the PDF file does not contain the Unicode representation of the Chinese text. Rather, it probably uses a special embedded font to represent the Chinese characters and normal fonts to represent the accented Pinyin:

,ô t¯ıng wˇo shu¯o

should be

聽我說 tīng wǒ shuō

As you can see, each accented Pinyin letter (tīng wǒ shuō) is just built from several characters placed on top of each other, whereas the ideograms are not preserved at all. Instead of U+01D2 for ǒ you get U+02C7 (ˇ) followed by a plain o.
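If you want to verify this yourself, you can list the code points of whatever ends up in your clipboard. Here is a minimal Python sketch; the helper name `describe` is my own, and the two sample strings are retyped from the example above using escapes, so treat them as an illustration rather than the exact bytes in your PDF:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

copied = "w\u02c7o"    # what the PDF gives you: w, a spacing caron, o
expected = "w\u01d2"   # what it should be: w, precomposed o with caron

for label, sample in (("copied", copied), ("expected", expected)):
    print(label)
    for code_point, name in describe(sample):
        print("  ", code_point, name)
```

If the output for your pasted text shows a separate CARON or MACRON next to a plain letter, the PDF stores the accents as independent glyphs and not as proper Unicode letters.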

You can easily check whether you can copy and paste Chinese characters stored as Unicode by downloading and playing around with this file. It is a correct Unicode PDF, and I have no problems copying and pasting its characters between Unicode-aware applications on my Windows 7 Professional system (German MUI).

If this doesn't work, then you have to install support for Chinese fonts on your system.

If it works, then you can probably find a workaround by inspecting your PDF file and looking for the embedded fonts. Try to find these fonts on the Internet and install them on your system; then you'll see more than just some meaningless characters. However, this will only help with the original Chinese characters; the Pinyin cannot be rescued this way.
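Installing fonts will not repair the Pinyin, but the pasted text can sometimes be cleaned up afterwards. The following is only a rough Python sketch under assumptions of my own: that the detached accent always comes immediately before the letter it belongs to (as in the sample), that only the macron and the caron occur, and that a dotless ı stands for a regular i. The function name `repair_pinyin` is hypothetical.

```python
import re
import unicodedata

# Spacing accent as it appears in the pasted text -> combining equivalent.
SPACING_TO_COMBINING = {
    "\u00af": "\u0304",  # MACRON -> COMBINING MACRON
    "\u02c7": "\u030c",  # CARON  -> COMBINING CARON
}

# A spacing accent directly followed by a Latin letter or a dotless i.
_ACCENT_THEN_LETTER = re.compile("([\u00af\u02c7])([A-Za-z\u0131])")

def repair_pinyin(text):
    """Reattach detached tone marks and compose them into single letters."""
    def fix(match):
        mark, letter = match.groups()
        if letter == "\u0131":  # dotless i was used to make room for the accent
            letter = "i"
        return letter + SPACING_TO_COMBINING[mark]

    # NFC turns letter + combining mark into the precomposed character.
    return unicodedata.normalize("NFC", _ACCENT_THEN_LETTER.sub(fix, text))

print(repair_pinyin("t\u00af\u0131ng w\u02c7o shu\u00afo"))  # tīng wǒ shuō
```

This does nothing for the ideograms themselves, and it will need extending if your PDF also uses acute or grave accents or places the accent after the letter.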


Author: Gaff

Updated on September 18, 2022

Comments

  • Gaff
    Gaff almost 2 years

    I am on a Windows 7 laptop, which I believe comes preinstalled with all the language packs. I can see Chinese characters with no issues and can normally copy them fine (copying from a browser to Microsoft Office works great). I have many PDFs with Chinese characters in them, but whenever I try to copy and paste them into another program (such as a browser or Microsoft Office) I just get strange characters such as:

    ,ô t¯ıng wˇo shu¯o listen to me

    This is the result of copying one line. It is in this order: Chinese characters, Pinyin (Chinese in the Latin alphabet), and then the English translation.

    The Pinyin is also getting messed up, as the tonal marks (accents) above each letter are moving to a space of their own.

    Any ideas how to fix this?

    • wizlog
      wizlog over 12 years
      What are the permissions on the PDF? Can you edit text for example?
    • skibulk
      skibulk almost 12 years
      I had a Chinese PDF from which I could not copy and paste the characters using Adobe Reader. I found success however by opening the PDF in Adobe Acrobat Professional, rather than in Reader.