How can I copy this quote from PDF?

Solution 1

The most reliable way to do this is to run OCR on the PDF.

As a quick and dirty alternative, search Google for the PDF's URL and open the Quick View link in the search result. In Quick View, choose View > Plain HTML.

The HTML version still contains some garbled text and is hard to read, but a large amount of the text is correct and can be copied. Search works there, so you can use it to locate the target text and copy it without the garbled parts.


Detailed example:
Search Google for the PDF's URL. The search result includes a Quick View link.
In Quick View, choose View > Plain HTML to see the document as HTML.
In Google's HTML version, search for the relevant quote and select it.
Pasting the selection into Notepad produces this output:

A difference between sta-tisticians and computer scientists in this field seems tobe that when a statistician has an idea he or she writesa paper; a computer scientist starts a company.

Not exactly as displayed, but close enough that you can work with it.
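
If you want to tidy the pasted text afterwards, the remaining defects (a stray line-break hyphen and a few fused words) can be repaired with a small script. The following is only a minimal sketch: the names `VOCABULARY`, `repair_token` and `clean_pasted_text` are made up for illustration, and the tiny word set is an assumption standing in for a real dictionary. A token is changed only when the repaired form is found in the vocabulary, so anything the script is unsure about is left alone.

```python
import re

# Hypothetical mini-vocabulary, just large enough for this example.
# A real run would load a full word list instead.
VOCABULARY = {"a", "be", "statisticians", "to", "writes"}


def repair_token(token, vocabulary):
    """Repair one whitespace-delimited token; return it unchanged when unsure."""
    # Split the token into the word itself and any trailing punctuation.
    match = re.fullmatch(r"(.*?)([.,;:!?]*)", token)
    core, tail = match.group(1), match.group(2)

    # Known words are left alone.
    if core.lower() in vocabulary:
        return token

    # Case 1: a leftover line-break hyphen, e.g. "sta-tisticians".
    if "-" in core:
        joined = core.replace("-", "")
        if joined.lower() in vocabulary:
            return joined + tail

    # Case 2: two words fused together, e.g. "tobe" or "writesa".
    for i in range(1, len(core)):
        left, right = core[:i], core[i:]
        if left.lower() in vocabulary and right.lower() in vocabulary:
            return left + " " + right + tail

    # Nothing matched: keep the original token.
    return token


def clean_pasted_text(text, vocabulary=VOCABULARY):
    """Apply repair_token to every token and rejoin with single spaces."""
    return " ".join(repair_token(t, vocabulary) for t in text.split())


if __name__ == "__main__":
    pasted = (
        "A difference between sta-tisticians and computer scientists in this "
        "field seems tobe that when a statistician has an idea he or she "
        "writesa paper; a computer scientist starts a company."
    )
    print(clean_pasted_text(pasted))
```

With a larger vocabulary the fused-word rule can produce false splits, so check the result against the PDF before quoting it.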

Solution 2

You'll have to discard the corrupted text that's already associated with the PDF before you can re-OCR it. The easiest way to do that is to save it in TIFF format, then open it with Acrobat and re-OCR it. When I did that, it worked for me.

Solution 3

Looks like a PDF with incorrect encoding. See the following threads:

Try printing the PDF using CutePDF, then see if the resulting PDF is any better.

Author by

kaifi

Updated on September 18, 2022

Comments

  • kaifi
    kaifi over 1 year

    I'm fairly new to Python. I know docstrings are used for documentation and that I can use help() to display them. My question is, when I write my own function in a .py file with documentation, for example:

    file - foo.py

    def foo():
        """
        foo documentation
        """
        some code here
    

    How can I print out the "foo documentation" from the terminal or interactive session? Thanks.

  • Iain Samuel McLean Elder
    Iain Samuel McLean Elder over 11 years
    I can't verify this solution because I don't have Adobe Acrobat.
  • Iain Samuel McLean Elder
    Iain Samuel McLean Elder over 11 years
    I forgot that Google cached PDFs that way! Very useful. I edited your answer to include a step-by-step guide with pictures to make this answer more generally useful.
  • Nicole Hamilton
    Nicole Hamilton over 11 years
    I've re-OCRed it for you with Acrobat X and uploaded it to a free site at pdfcast.org/pdf/… The OCR isn't (by any means!) perfect, but hopefully will be usable.
  • geepee
    geepee about 10 years
    In the version of Google Search that Google serves to my machine, instead of this "Quick View" link, I have to click the down arrow and then choose "Cached". But that works too.