urllib2 error no host given
Solution 1
In the while loop, you're setting results to something which is not a url:
results = 'myurl+str(mystring)'
It should probably be results = myurl+str(mystring)
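To illustrate why the quoted version fails, here is a minimal sketch using the placeholder URL from the question. It only parses the two strings, so no request is sent. The variable names quoted and joined are made up for this example, and the import fallback is there only so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

myurl = 'http://www.myurl.com/'  # placeholder URL from the question
mystring = 'test'

# Everything inside the quotes is literal text, so nothing is concatenated.
quoted = 'myurl+str(mystring)'
# Real concatenation: the two variables are joined into one URL.
joined = myurl + mystring

# The literal string has no scheme and no host part at all.
print(repr(urlparse(quoted).netloc))
# The concatenated string parses with a proper host.
print(repr(urlparse(joined).netloc))
```

The empty host part of the first string is consistent with a "no host given" complaint once it is passed to urllib2.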
By the way, it appears there's no need for all the casting to string (str()) you do:
(expanded on request)
- print str(foo): in such a case, str() is never necessary. Python will always print foo's string representation.
- results = 'http://www.myurl.com/'+str(mystring). This is also unnecessary; mystring is already a string, so 'http://www.myurl.com/' + mystring would suffice.
- print "Pausing script for " + str(i) + " Seconds". Here you would get an error without str() since you can't do string + int. However, print "foo", 1, "bar" does work. As do print "foo %i bar" % 1 and print "foo {0} bar".format(1).
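The points above can be checked with a short sketch. The value of i is a hypothetical whole number of seconds (in the question it is a random float), and the variable names are chosen only for this example. The snippet avoids the print statement so it runs unchanged on Python 2 and Python 3.

```python
i = 3  # hypothetical pause length in seconds

# Concatenation needs an explicit str(), because str + int is not allowed.
concatenated = "Pausing script for " + str(i) + " Seconds"

# Both formatting styles convert the integer themselves.
percent_style = "Pausing script for %i Seconds" % i
format_style = "Pausing script for {0} Seconds".format(i)

# Leaving out str() raises TypeError instead of converting silently.
try:
    "Pausing script for " + i
    raised = False
except TypeError:
    raised = True
```

All three strings come out identical, so the choice between them is mostly a matter of readability.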
Solution 2
I found the answer. It's as follows....
The values for mystring were read in from a file. In the script I wrote to write the file, I opened it with "w" instead of "wb".
Each line in the file ended with a newline character, "\n".
When mystring was added to the request string, the newline ended up in the middle of the request string.[1]
This would never have been apparent from my code because I changed it to post here in an effort to hide the real url I am using to get my results.[2]
My actual url looks more like this.....
Myurl.com/mystring/otherstuff/page_counter/morestuff.htm
The \n being read from the file split my url and gave urllib problems.
[1] I use Windows. It adds lots of unseen things to text files. If I'd opened the file to write to with "wb" instead of "w", the contents would have been written without the unseen \n.
[2] Always post your full code, kids. The good people of Stack Overflow can't help you unless they can see what you are doing.
Many thanks all, I hope this helps someone out at some point.
Paul.
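A minimal sketch of one way to guard against this on the reading side: a line read from a file keeps its line ending, so stripping each line before building the URL removes a trailing "\n" or "\r\n" however the file was written. The helper build_url and the in-memory file are hypothetical stand-ins, and the URL is the placeholder from the question.

```python
import io


def build_url(base, token):
    # strip() removes leading and trailing whitespace, including line endings.
    return base + token.strip()


# io.StringIO stands in for the real file of values (an assumption for this sketch).
fake_file = io.StringIO(u"first\nsecond\r\nthird")

# Iterating yields each line with its line ending still attached.
urls = [build_url(u"http://www.myurl.com/", line) for line in fake_file]
```

In the original loop, the equivalent change would be to strip the value returned by next(f) before it is used in the request string.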
Paul Tricklebank
Updated on June 04, 2022
Comments
-
Paul Tricklebank almost 2 years
EDIT: (SOLVED) When I read the values in from my file, a newline character (\n) is added onto the end of each one. This splits my request string at that point. I think it's to do with how I saved the values to the file in the first place. Many thanks.
I have the following code:
results = 'http://www.myurl.com/'+str(mystring)
print str(results)
request = urllib2.Request(results)
request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
opener = urllib2.build_opener()
text = opener.open(request).read()
This is in a loop. After the loop has run a few times, str(mystring) changes to give a different set of results. I can loop the script as many times as I like while keeping the value of str(mystring) constant, but every time I change the value of str(mystring) I get an error saying "no host given" when the code tries to build the opener:
opener = urllib2.build_opener()
Can anyone help please?
TIA,
Paul.
EDIT:
More code here.....
import sys
import string
import httplib
import urllib2
import re
import random
import time

def StripTags(text):
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text

mystring="test"
d={}

with open("myfile","r") as f:
    while True:
        page_counter=0
        print str(mystring)
        try:
            while page_counter <20:
                results = 'http://www.myurl.com/'+str(mystring)
                print str(results)
                request = urllib2.Request(results)
                request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                opener = urllib2.build_opener()
                text = opener.open(request).read()
                finds = (re.findall('([\w\.\-]+'+mystring+')',StripTags(text)))
                for find in finds:
                    d[find]=1
                uniq_emails=d.keys()
                page_counter = page_counter +1
                print "found this " +str(finds)+""
                random.seed()
                n = random.random()
                i = n * 5
                print "Pausing script for " + str(i) + " Seconds" + ""
                time.sleep(i)
            mystring=next(f)
        except IOError:
            print "No result found!"+""
-
Paul Tricklebank over 11 years
myurl is edited. It's not the actual url I am using. I don't really want to give away the real url. That part of the code works fine.
-
Junuxx over 11 years
@Paul: Still, you have the +str(..) stuff inside of the quotes. That won't do a string concatenation.
-
Paul Tricklebank over 11 years
Yeah, I'm aware of that. The real code doesn't. I'll edit it now. I'm not sure what you mean by all the casting to string. I'd be grateful if you could show me a better way. TIA.
-
Paul Tricklebank over 11 years
I've found the problem and it's my own fault. The file which I'm getting the values for mystring from has \n at the end of every line when I'm reading it in. This is splitting my request string at that point.
-
Markus Unterwaditzer over 11 years
@PaulTricklebank Can you post this as an answer?
-
Paul Tricklebank about 11 years
@Markus many thanks. I'm not great with Python (as you can probably tell) and to be honest what I'm trying to achieve is above my station. I appreciate you pointing me in the right direction. Posting my answer now.
-
Iratzar Carrasson Bores almost 6 years
I have the same problem, but the difference is that in my string parameter I use \n. How can I use this character in this case?
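One possible approach to the question in the last comment, assuming the newline is meant to be part of the value sent in the URL, is to percent-encode the value before concatenating it. The sketch below uses the standard library quote function; the parameter value is hypothetical and the URL is the placeholder from the question. The import fallback is there only so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

base = "http://www.myurl.com/"  # placeholder URL from the question
value = "line one\nline two"    # hypothetical parameter containing a newline

# quote() replaces the newline with %0A and the spaces with %20.
url = base + quote(value)
```

Whether the server decodes %0A back into a newline depends on the site being queried, so this is a suggestion to try rather than a guaranteed fix.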