Copying Chinese Characters from PDF
From what I can see in your sample text, the PDF file does not contain the Unicode representation of the Chinese text. Rather, it probably uses a special embedded font to represent the Chinese characters and normal fonts to represent the accented Pinyin:
,ô t¯ıng wˇo shu¯o
should be
聽我說 tīng wǒ shuō
I hope you see that the diacritics (tīng wǒ shuō) are just built from several characters placed on top of each other, whereas the ideograms are not shown at all. Instead of U+01D2 for ǒ you get U+02C7 for ˇ and o.
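If you want to verify this yourself, a small Python sketch like the following lists the code points of a pasted string. The strings `pasted` and `expected` and the helper `describe` are only illustrative names of mine; the snippet assumes nothing beyond the standard `unicodedata` module. It also shows that ordinary Unicode normalization does not merge the spacing caron U+02C7 with the following letter, which is why the pasted Pinyin stays broken.

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Illustrative strings: what the copy produced vs. what it should have been.
pasted = "w\u02C7o"    # w, spacing caron, o
expected = "w\u01D2"   # w, o with caron

for label, s in (("pasted", pasted), ("expected", expected)):
    print(label, describe(s))

# U+02C7 is a spacing modifier letter, not a combining mark,
# so NFC normalization leaves the pasted text unchanged.
nfc_repairs_pasted = unicodedata.normalize("NFC", pasted) == expected

# A real combining caron (U+030C) placed after the base letter does compose.
nfc_composes_combining = unicodedata.normalize("NFC", "wo\u030C") == expected

print(nfc_repairs_pasted, nfc_composes_combining)
```

Running `describe` on text copied from your own PDF should quickly tell you whether you are getting real precomposed letters or separate accent characters.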
You can easily check whether you can copy and paste Chinese characters stored as Unicode by downloading and playing around with this file. This is a correct Unicode PDF, and I have no problems copying and pasting the characters between Unicode-aware applications on my Windows 7 Professional system (German MUI).
If this doesn't work, then you have to install support for Chinese fonts on your system.
If it works, then you can probably find a workaround by inspecting your PDF file and looking for the embedded fonts. Try to find these fonts on the Internet and install them on your system; then you'll see more than just some meaningless symbols. However, this will only apply to the original Chinese characters; the Pinyin cannot be rescued.
Gaff
Updated on September 18, 2022
Comments
-
Gaff almost 2 years
I am on a Windows 7 laptop, which I believe comes preinstalled with all the language packs. I can see Chinese characters with no issues and can normally copy them fine. (Copying from a browser to Microsoft Office works great.) I have many PDFs with Chinese characters in them. Whenever I try to copy and paste them into another program (such as a browser, Microsoft Office, etc.), I just get strange foreign characters such as:
,ô t¯ıng wˇo shu¯o listen to me
This is copying a line. It is in this order: Chinese characters, Pinyin (Chinese in the Latin alphabet), and then the English translation.
The Pinyin is also getting messed up, as the tonal marks (accents) above each letter are moving to a space of their own.
Any ideas how to fix this?
-
wizlog over 12 years
What are the permissions on the PDF? Can you edit text, for example?
-
skibulk almost 12 years
I had a Chinese PDF from which I could not copy and paste the characters using Adobe Reader. I found success, however, by opening the PDF in Adobe Acrobat Professional rather than in Reader.