Save the HTML of a website in a txt file with Python


Solution 1

The easiest way is to use urlretrieve. In Python 2:

import urllib

urllib.urlretrieve("http://www.example.com/test.html", "test.txt")

For Python 3.x the code is as follows:

import urllib.request
urllib.request.urlretrieve("http://www.example.com/test.html", "test.txt")
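
urlretrieve copies the response body to disk unchanged. If you would rather fetch the page yourself with urlopen, a minimal Python 3 sketch might look like the following. The helper name save_html is illustrative only and does not come from the original answer. The file is opened in binary mode ("wb") because read() returns bytes in Python 3, and writing bytes to a file opened in text mode raises a TypeError.

```python
import urllib.request


def save_html(url, path):
    """Download `url` and write the raw response bytes to `path`."""
    # urlopen returns a response object usable as a context manager
    with urllib.request.urlopen(url) as page:
        data = page.read()  # bytes, not str, in Python 3
    # "wb" writes the bytes as received, so no decoding is needed
    with open(path, "wb") as f:
        f.write(data)
    return len(data)  # number of bytes written
```

It would be called the same way as urlretrieve above, for example save_html("http://www.example.com/test.html", "test.txt").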

Solution 2

I use Python 3.
Install the requests library with pip install requests. After that you can save a web page to a txt file:

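import requests

url = "https://stackoverflow.com/questions/24297257/save-html-of-some-website-in-a-txt-file-with-python"

r = requests.get(url)
# r.text is the decoded page; an explicit encoding avoids platform-dependent write errors
with open('file.txt', 'w', encoding='utf-8') as file:
    file.write(r.text)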
Author by

thecatbehindthemask

Updated on September 26, 2020

Comments

  • thecatbehindthemask
    thecatbehindthemask over 3 years

    I need to save the HTML code of any website in a txt file. It is a very easy exercise, but I have doubts about it because I have a function that does this:

    import urllib.request
    
    def get_html(url):
        f=open('htmlcode.txt','w')
        page=urllib.request.urlopen(url)
        pagetext=page.read() ## Save the html and later save in the file
        f.write(pagetext)
        f.close()
    

    But this doesn't work.

  • thecatbehindthemask
    thecatbehindthemask almost 10 years
    Thanks! I did it the following way, and it works:

    import urllib2

    def Obtener_Html(url):
        file("my_file.txt", "w").write(urllib2.urlopen(url).read())

    if __name__ == '__main__':
        url = raw_input("Say me a website: ")
        Obtener_Html("http://" + url)
  • Leonid
    Leonid almost 3 years
    You might also want to check status_code to make sure you are not running into an HTTP 404 or a server error. It should be HTTP 200, in which case r.ok is True.
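
Leonid's suggestion can be sketched roughly as follows. This is a minimal illustration rather than code from the answers: save_if_ok is a hypothetical helper name, and it relies only on the status_code and text attributes that a requests response provides.

```python
def save_if_ok(response, path):
    """Write response.text to `path` only when the server answered 200 OK.

    `response` is assumed to be a requests response (any object with
    `status_code` and `text` attributes works). Returns True if saved.
    """
    if response.status_code != 200:
        # e.g. 404 Not Found or a 5xx server error: do not save the error page
        print("Request failed with HTTP status", response.status_code)
        return False
    # An explicit encoding avoids depending on the platform default
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
    return True
```

With requests installed, this would be called as save_if_ok(requests.get(url), "file.txt"). requests also offers r.raise_for_status(), which raises an exception for 4xx and 5xx responses instead of returning a flag.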