urllib2 error no host given


Solution 1

In the while loop, you're setting results to something which is not a url:

results = 'myurl+str(mystring)'

It should probably be results = myurl+str(mystring)
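Here is a minimal illustration of the difference. It uses Python 3 syntax (print as a function) so it runs today. myurl and mystring are placeholder values, not the asker's real ones:

```python
# Placeholder values standing in for the question's variables.
myurl = "http://www.myurl.com/"
mystring = "test"

# Inside quotes, the "+" and str() are just characters in a literal.
wrong = 'myurl+str(mystring)'
# Outside quotes, the two strings are actually concatenated.
right = myurl + mystring

print(wrong)  # myurl+str(mystring)
print(right)  # http://www.myurl.com/test
```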

By the way, it appears there's no need for most of the casting to string (str()) you do (expanded on request):

  • print str(foo): in such a case, str() is never necessary. Python will always print foo's string representation.
  • results = 'http://www.myurl.com/'+str(mystring). This is also unnecessary; mystring is already a string, so 'http://www.myurl.com/' + mystring would suffice.
  • print "Pausing script for " + str(i) + " Seconds". Here you would get an error without str(), since you can't add a string and a number (i is a float here). However, print "foo", 1, "bar" does work, as do print "foo %i bar" % 1 and print "foo {0} bar".format(1).
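Here is a quick sketch of the alternatives from the last bullet. It is written in Python 3 syntax, where print is a function. The value of i is a made-up stand-in for the random pause. It uses %s rather than %i because %i converts a float to an integer and drops the fraction ("%i" % 2.5 gives "2"):

```python
# Stand-in for the random pause length; in the question it is a float (n * 5).
i = 2.5

# Adding a string and a number fails in both Python 2 and Python 3.
try:
    message = "Pausing script for " + i + " Seconds"
except TypeError:
    message = None

# Three equivalent ways to build the same text.
via_str = "Pausing script for " + str(i) + " Seconds"
via_percent = "Pausing script for %s Seconds" % i
via_format = "Pausing script for {0} Seconds".format(i)

# print() with several arguments inserts the separating spaces itself.
print("Pausing script for", i, "Seconds")
```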

Solution 2

I found the answer. It's as follows:

The values for mystring were read in from a file. In the script I wrote to create that file, I opened it with "w" instead of "wb".

Each line in the file ended with a newline character ("\n"), and next(f) keeps that newline on every value it returns.

When mystring was concatenated into the request URL, the newline ended up in the middle of the URL string.[1]
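A minimal reproduction, as a Python 3 sketch. io.StringIO stands in for "myfile", and the values "test" and "example" are made up. Reading a line keeps its line ending, and stripping it before concatenation keeps the URL clean:

```python
import io

# Stand-in for "myfile": one value per line, as the writing script produced it.
f = io.StringIO("test\nexample\n")

raw = next(f)                # 'test\n': the line ending is kept
clean = raw.rstrip("\r\n")   # 'test': trailing "\r" and "\n" removed

base = "http://www.myurl.com/"
print(repr(base + raw))      # 'http://www.myurl.com/test\n'
print(repr(base + clean))    # 'http://www.myurl.com/test'
```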

This would never have been apparent from my code, because I changed it before posting here to hide the real URL I use to get my results.[2]

My actual URL looks more like this:

Myurl.com/mystring/otherstuff/page_counter/morestuff.htm

The \n read from the file split my URL and gave urllib2 the "no host given" error.
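With a URL of that shape, the value sits between path segments, so the newline lands in the middle of the URL rather than at its end. That may be why the simplified code in the question did not show the same failure. The sketch below is Python 3 and uses a hypothetical build_url helper. The "otherstuff" and "morestuff" segments are placeholders taken from the description above:

```python
def build_url(mystring, page_counter):
    # Hypothetical template shaped like the real URL described above;
    # "otherstuff" and "morestuff" are placeholders.
    return "http://www.myurl.com/%s/otherstuff/%d/morestuff.htm" % (mystring, page_counter)

value = "test\n"  # what next(f) returns for the line "test"

bad = build_url(value, 3)
good = build_url(value.rstrip("\r\n"), 3)

print(repr(bad))   # 'http://www.myurl.com/test\n/otherstuff/3/morestuff.htm'
print(repr(good))  # 'http://www.myurl.com/test/otherstuff/3/morestuff.htm'
```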

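[1] I use Windows, where opening a file with "w" (text mode) writes each "\n" as "\r\n". Opening it with "wb" instead would not have removed the newline, though: the newline is what separates one value from the next, and next(f) keeps it on every line it returns. The fix is to strip each value (e.g. mystring.strip()) before building the URL.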

[2] Always post your full code, kids. The good people of Stack Overflow can't help you unless they can see what you're doing.
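Regarding footnote [1], here is a quick check in Python 3. The file name and values are made up. It suggests that writing with "wb" does not remove the newline on its own, and that stripping the value after reading is what removes it:

```python
import os
import tempfile

# Write two values in binary mode, one per line, to a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "myfile")
with open(path, "wb") as out:
    out.write(b"test\nexample\n")

# Read them back the way the question's script does.
with open(path, "r") as f:
    first = next(f)

print(repr(first))          # 'test\n': "wb" did not remove the newline
print(repr(first.strip()))  # 'test'
```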

Many thanks, all. I hope this helps someone out at some point.

Paul.

Author: Paul Tricklebank

Updated on June 04, 2022

Comments

  • Paul Tricklebank, almost 2 years

    EDIT (SOLVED): When I read the values in from my file, each one has a newline character (\n) on the end. This splits my request string at that point. I think it's to do with how I saved the values to the file in the first place. Many thanks.

    I have the following code:

    results = 'http://www.myurl.com/'+str(mystring)
    print str(results)
    request = urllib2.Request(results)
    request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
    opener = urllib2.build_opener()
    text = opener.open(request).read()
    

    This is in a loop. After the loop has run a few times, mystring changes to give a different set of results. I can loop the script as many times as I like while mystring stays the same, but every time its value changes I get a "no host given" error when the code tries to build the opener:

    opener = urllib2.build_opener()
    

    Can anyone help please?

    TIA,

    Paul.

    EDIT:

    More code here:

    import sys
    import string
    import httplib
    import urllib2
    import re
    import random
    import time
    
    
    def StripTags(text):
        finished = 0
        while not finished:
            finished = 1
            start = text.find("<")
            if start >= 0:
                stop = text[start:].find(">")
                if stop >= 0:
                    text = text[:start] + text[start+stop+1:]
                    finished = 0
        return text
    mystring="test"
    
    d={}
    
    with open("myfile","r") as f:
        while True:
            page_counter=0
            print str(mystring)
    
            try:
                while page_counter <20:
                    results = 'http://www.myurl.com/'+str(mystring)
                    print str(results)
                    request = urllib2.Request(results)
                    request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                    opener = urllib2.build_opener()
                    text = opener.open(request).read()
                    finds = (re.findall('([\w\.\-]+'+mystring+')',StripTags(text)))
                    for find in finds:
                        d[find]=1
                        uniq_emails=d.keys()
                    page_counter = page_counter +1
                    print "found this " +str(finds)
                    random.seed()
                    n = random.random()
                    i = n * 5
                    print "Pausing script for " + str(i) + " Seconds" + ""
                    time.sleep(i)
                # next(f) returns the line including its trailing "\n"
                mystring=next(f)
            except IOError:
                print "No result found!"+""
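One way to restructure the input side of that loop is sketched below in Python 3, leaving out the network calls. It iterates over the file with a for loop and strips each line before it goes into the URL. BASE_URL, PAGES_PER_VALUE and request_urls are illustrative names, not from the original. Unlike the posted code, it does not start from a hard-coded mystring. The actual fetch (urllib2 in Python 2, urllib.request in Python 3) would happen wherever each URL is consumed:

```python
import io

BASE_URL = "http://www.myurl.com/"  # placeholder, as in the question
PAGES_PER_VALUE = 20                # mirrors `while page_counter < 20`


def request_urls(f):
    """Yield (value, page_counter, url) for each non-blank line of f."""
    for line in f:                   # a for loop replaces the manual next(f)
        value = line.strip()         # drop "\n" (and any "\r" or spaces)
        if not value:
            continue                 # skip blank lines
        for page_counter in range(PAGES_PER_VALUE):
            # The posted code does not put page_counter into the URL;
            # the real URL apparently does, so adapt this line to match.
            yield value, page_counter, BASE_URL + value


urls = list(request_urls(io.StringIO("test\n\nexample\n")))
print(len(urls))   # 40
print(urls[0])     # ('test', 0, 'http://www.myurl.com/test')
```

Using a for loop also ends cleanly at the end of the file, whereas a bare next(f) inside while True raises StopIteration once the lines run out.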
    
  • Paul Tricklebank, over 11 years
    myurl is edited; it's not the actual URL I'm using. I don't really want to give away the real URL. That part of the code works fine.
  • Junuxx, over 11 years
    @Paul: Still, you have the +str(..) stuff inside of the quotes. That won't do a string concatenation.
  • Paul Tricklebank, over 11 years
    Yeah, I'm aware of that; the real code doesn't. I'll edit it now. I'm not sure what you mean by all the casting to string. I'd be grateful if you could show me a better way. TIA.
  • Paul Tricklebank, over 11 years
    I've found the problem, and it's my own fault. The file I'm reading the values for mystring from has \n at the end of every line when I read it in. This is splitting my request string at that point.
  • Markus Unterwaditzer, over 11 years
    @PaulTricklebank Can you post this as an answer?
  • Paul Tricklebank, about 11 years
    @Markus Many thanks. I'm not great with Python (as you can probably tell), and to be honest what I'm trying to achieve is above my station. I appreciate you pointing me in the right direction. Posting my answer now.
  • Iratzar Carrasson Bores, almost 6 years
    I have the same problem, but the difference is that my string parameter contains \n. How can I use this character in this case?
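If the server is meant to receive the newline, one common approach is to percent-encode the value so "\n" travels as %0A instead of as a raw character that breaks the URL. Here is a Python 3 sketch with a made-up value (in Python 2 the same function is urllib.quote). Whether the server decodes it as intended depends on that server:

```python
from urllib.parse import quote, unquote

value = "line one\nline two"   # a parameter that really contains "\n"
encoded = quote(value)          # "\n" -> "%0A", " " -> "%20"
url = "http://www.myurl.com/" + encoded

print(url)  # http://www.myurl.com/line%20one%0Aline%20two
```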