I get InvalidURL: URL can't contain control characters when I try to send a request using urllib


Solution 1

If the problem is whitespace in the URL, replace each space with its percent-encoded form:

url = url.replace(" ", "%20")
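As a minimal sketch of that replacement, the snippet below applies it to a made-up URL (example.com and the q parameter are assumptions for illustration, not part of the question):

```python
# Hypothetical URL used only for illustration.
url = "https://example.com/search?q=hello world"

# Percent-encode each literal space so the request line stays valid.
encoded = url.replace(" ", "%20")
print(encoded)  # https://example.com/search?q=hello%20world
```

This only handles the space character; tabs, newlines, and other control characters would still need separate treatment.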

Solution 2

Spaces are not allowed in a URL. I removed them and it seems to work now:

import urllib.request
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
url = start_url.replace(" ","")
source = urllib.request.urlopen(url).read()
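If the URL was pasted across several lines, it may also contain tabs or newlines, which replace(" ", "") would miss. A possible variant, assuming all whitespace in the URL is accidental (strip_whitespace and the sample URL are hypothetical names for illustration):

```python
def strip_whitespace(url: str) -> str:
    """Return url with every whitespace character removed."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines alike.
    return "".join(url.split())


# Hypothetical URL with a stray space, and a space plus newline.
messy = "https://example.com/api?first =1&second= \n&third=3"
print(strip_whitespace(messy))
```

This is only appropriate when the whitespace is a typo, as in the question. If a space is meant to be part of a value, it should be encoded instead of removed.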

Solution 3

Solr search strings can get pretty weird. It is better to use the quote function to encode characters before making the request. By default quote also escapes ':', '?', '=' and '&', so list them in safe when encoding a whole URL. See the example below:

import urllib.request
from urllib.parse import quote

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="

# Keep the URL delimiters intact; only the stray spaces become %20.
source = urllib.request.urlopen(quote(start_url, safe=":/?&=")).read()

Better late than never...
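One way to apply quote without touching the scheme or host is to split the URL first and encode only the path and query. The following is a sketch under that assumption; encode_url and the example URL are hypothetical, and "%" is kept in safe so that sequences that are already percent-encoded are not encoded twice:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode unsafe characters in the path and query only."""
    parts = urlsplit(url)
    # "/" stays as the path separator; "%" avoids double-encoding.
    path = quote(parts.path, safe="/%")
    # "=" and "&" keep the key=value&key=value structure intact.
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Hypothetical URL with spaces in both the path and the query.
example = "https://example.com/some path/search?name =a b&empty="
print(encode_url(example))
```

Note that encoding keeps the space as part of the parameter name (name%20), so the server still sees a different key than name. If the space is a typo, removing it is the better fix.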

Solution 4

You probably already found out by now, but let's get it written here.

There can't be any space characters in the URL, and this one has three: after bundle_fq, after sm_vid_Sectors_fq=, and after dm_field_deadlineTo_fq.

Remove those and you're good to go.
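Spotting stray spaces by eye in a long query string is error-prone. Here is a small sketch that reports their positions, assuming the rejected characters are the ASCII control characters, the space, and DEL (my understanding of what the standard library checks; find_disallowed and the shortened sample URL are hypothetical):

```python
import re

# Assumption: characters in \x00-\x20 and \x7f are the ones rejected.
_DISALLOWED = re.compile(r"[\x00-\x20\x7f]")


def find_disallowed(url: str) -> list:
    """Return (position, character) pairs for each disallowed character."""
    return [(m.start(), m.group()) for m in _DISALLOWED.finditer(url)]


# Shortened, hypothetical version of the URL from the question.
sample = "https://example.com/desc?bundle_fq =notice&deadlineTo_fq =&x=1"
for pos, ch in find_disallowed(sample):
    # Show a little context before each hit to make it easy to locate.
    print(pos, repr(ch), "after", repr(sample[max(0, pos - 12):pos]))
```

An empty result means the URL contains none of these characters.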


Author: Talib Daryabi

I am a new Android developer with a good knowledge of Java. I am very interested in the newest technologies and trends. I am pursuing my B.Tech degree in CSE at Lovely Professional University. During my studies I have acquired a good knowledge of various programming languages, including C, C++, Python, Java, JavaScript, and HTML, and lately I have been concentrating on software development for the Android platform. I am eager to constantly learn new things. At the moment, my focus is on the Java 11 OCJA and OCJP exams and Android.

Updated on June 04, 2022

Comments

  • Talib Daryabi almost 2 years

    I am trying to get a JSON response from the URL passed to urllib.request.urlopen, but it raises an error saying the URL can't contain control characters.

    How can I solve this?

    import urllib.request

    start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="

    source = urllib.request.urlopen(start_url).read()

    The error I get is:

    URL can't contain control characters. '/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq=' (found at least ' ')
    
    • Александр over 3 years
      Is it a valid URL? It doesn't work in a browser and has a strange part, /10//0/. Normally a double slash // can be replaced with a single slash, but then the URL returns page not found.
    • Talib Daryabi over 3 years
      You are right, I provided the wrong URL. I fixed it, thank you. Small mistakes are always big headaches.
    • Doz Parp over 3 years
      Please make sure you encode your URLs: urlencoder.io/python. In a URL, " " should actually be "%20" or "+".
    • Klaus D. over 3 years
      Looks like an unprintable character slipped into the URL somehow.
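The comment about "%20" versus "+" corresponds to two functions in urllib.parse. A short sketch of the difference, using made-up values (the parameter names q and lang are assumptions for illustration):

```python
from urllib.parse import quote, quote_plus, urlencode

value = "procurement notice"

# quote() percent-encodes the space, which is valid anywhere in a URL.
print(quote(value))       # procurement%20notice

# quote_plus() uses "+", the form-style encoding for query strings.
print(quote_plus(value))  # procurement+notice

# urlencode() builds a whole query string and applies quote_plus()
# to each key and value by default.
params = {"q": "hello world", "lang": "English"}
print(urlencode(params))  # q=hello+world&lang=English
```

Building the query from a dict with urlencode sidesteps the problem in the question, since spaces never reach the request line unencoded.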