Opening pdf urls with pyPdf
Solution 1
I think urllib2 will get you what you want.
from urllib2 import Request, urlopen
from pyPdf import PdfFileWriter, PdfFileReader
from StringIO import StringIO
url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
writer = PdfFileWriter()
remoteFile = urlopen(Request(url)).read()
memoryFile = StringIO(remoteFile)
pdfFile = PdfFileReader(memoryFile)
for pageNum in xrange(pdfFile.getNumPages()):
    currentPage = pdfFile.getPage(pageNum)
    #currentPage.mergePage(watermark.getPage(0))
    writer.addPage(currentPage)
outputStream = open("output.pdf","wb")
writer.write(outputStream)
outputStream.close()
Solution 2
Well, you can first download the PDF separately and then use pyPdf to read it:
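import os
import urllib
from pyPdf import PdfFileReader

url = 'http://example.com/a.pdf'
pdfFile = url.split('/')[-1]  # local file name taken from the URL
webFile = urllib.urlopen(url)
localFile = open(pdfFile, 'wb')  # binary mode, so the PDF bytes are written unchanged
localFile.write(webFile.read())
webFile.close()
localFile.close()
# Make sure the downloaded file carries a .pdf extension
base = os.path.splitext(pdfFile)[0]
os.rename(pdfFile, base + ".pdf")
input1 = PdfFileReader(open(base + ".pdf", "rb"))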
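On Python 3 the same download-then-read idea could look roughly like the sketch below. It only covers the download step. The helper name download_pdf, the dest_dir and opener parameters, and the "download" fallback name are assumptions made for illustration, not part of the answer above.

```python
import os
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen


def download_pdf(url, dest_dir=".", opener=urlopen):
    """Save the resource at url into dest_dir and return the local path.

    Hypothetical helper: opener defaults to urlopen and can be swapped
    for a stand-in when no network is available.
    """
    # Take the file name from the URL path, ignoring any query string.
    name = os.path.basename(urlsplit(url).path) or "download"
    base = os.path.splitext(name)[0]
    path = os.path.join(dest_dir, base + ".pdf")
    # Stream the response to disk in binary mode.
    with opener(url) as response, open(path, "wb") as local_file:
        shutil.copyfileobj(response, local_file)
    return path
```

The returned path can then be opened with open(path, "rb") and handed to the PDF reader, as in the last line of the answer.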
Solution 3
For Python 3.8:
import io
from urllib.request import Request, urlopen
from PyPDF2 import PdfFileReader
class GetPdfFromUrlMixin:
    def get_pdf_from_url(self, url):
        """
        :param url: url to get pdf file
        :return: PdfFileReader object
        """
        remote_file = urlopen(Request(url)).read()
        memory_file = io.BytesIO(remote_file)
        pdf_file = PdfFileReader(memory_file)
        return pdf_file
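The question also asks for a merge of several remote files. Here is a rough sketch of how the in-memory approach could be extended to do that, assuming Python 3 and a PyPDF2 release older than 3.0 (where PdfFileReader and PdfFileWriter are still available). The names fetch_pdf and merge_pdfs_from_urls, the opener parameter, and the header check are illustrative assumptions, not part of any answer here.

```python
import io
from urllib.request import Request, urlopen


def fetch_pdf(url, opener=urlopen):
    """Download url and return its content as a seekable in-memory file."""
    with opener(Request(url)) as response:
        data = response.read()
    # Loose sanity check (an assumption of this sketch): a PDF normally
    # carries a "%PDF-" header near the start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("response does not look like a PDF: %s" % url)
    return io.BytesIO(data)


def merge_pdfs_from_urls(urls, output_path, opener=urlopen):
    """Append every page of every PDF in urls and write one merged file."""
    # Imported here so fetch_pdf stays usable without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter  # PyPDF2 < 3.0 names

    writer = PdfFileWriter()
    for url in urls:
        reader = PdfFileReader(fetch_pdf(url, opener))
        for page_num in range(reader.getNumPages()):
            writer.addPage(reader.getPage(page_num))
    with open(output_path, "wb") as output_stream:
        writer.write(output_stream)
```

A call such as merge_pdfs_from_urls(["http://example.com/a.pdf", "http://example.com/b.pdf"], "merged.pdf") would then write the combined document to merged.pdf; the second URL is made up for the example.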
Author by
meadhikari
Updated on August 06, 2021
Comments
-
meadhikari almost 3 years
How would I open a PDF from a URL instead of from disk?
Something like:
input1 = PdfFileReader(file("http://example.com/a.pdf", "rb"))
I want to open several files from the web and download a merge of all of them.
-
meadhikari about 12 years
Hey, what is thisFile in the line base = os.path.splitext(thisFile)[0]?
-
meadhikari about 12 years
I get AttributeError: 'str' object has no attribute 'seek'
-
Switch about 12 years
Oh sorry, it was a mistake. It should be pdfFile (the absolute path for the downloaded file).
-
John about 12 years
@meadhikari, sorry about that, it's fixed now.
-
meadhikari about 12 years
When I try to write the file with outputStream = file("output.pdf","wb") I keep getting "AttributeError: addinfourl instance has no __call__ method". Any help would be much appreciated.
-
John about 12 years
@meadhikari Your code is good, my fault again.
outputStream = file("output.pdf","wb")
needs to be
outputStream = open("output.pdf","wb")
-
Shriganesh Kolhe about 4 years
Use urllib.request instead of urllib2 for Python 3.5 and higher.
-
Shriganesh Kolhe about 4 years
For "StringIO", use from io import StringIO in Python 3.
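A note on the 'seek' error and the Python 3 remarks in the comments: the PDF reader expects a seekable file-like object, so passing a plain string or raw bytes fails. In Python 3 the downloaded data is bytes, and io.BytesIO (as used in Solution 3) is the in-memory wrapper that accepts it, while io.StringIO only takes str. A small sketch with made-up sample bytes standing in for the download:

```python
import io

# Stand-in for urlopen(...).read(); real PDF data would be much longer.
data = b"%PDF-1.4\n"

# Raw bytes are not a file-like object: there is no seek() to call.
print(hasattr(data, "seek"))  # False

# io.StringIO only accepts str, so bytes are rejected with a TypeError.
try:
    io.StringIO(data)
    string_io_ok = True
except TypeError:
    string_io_ok = False
print(string_io_ok)  # False

# io.BytesIO wraps the bytes in a seekable in-memory file.
memory_file = io.BytesIO(data)
memory_file.seek(0)
print(memory_file.read(5))  # b'%PDF-'
```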