Parse custom URIs with urlparse (Python)
Solution 1
I think the problem is that URIs don't all share a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.
I would use the results of the first parse, then synthesize an http URL and parse it again:
import urlparse
parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
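The same round trip can be sketched for Python 3, where the module lives at urllib.parse. This is a variation rather than the exact code above: it swaps the scheme on the raw string instead of reusing parts[2], because newer versions already split the netloc on the first parse. The helper name parse_as_http is hypothetical.

```python
from urllib.parse import urlparse


def parse_as_http(uri):
    """Parse `uri` as if its scheme were http, then restore the real scheme.

    Hypothetical helper, not part of the standard library.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("no scheme found in %r" % uri)
    # Parse a synthesized http URL so the http rules are applied.
    parts = urlparse("http:" + rest)
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    return parts._replace(scheme=scheme)


parts = parse_as_http("qqqq://base/id#hint")
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
```

The custom scheme is put back with _replace, so callers do not have to track it separately.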
Solution 2
You can also register a custom scheme with urlparse:
import urlparse

def register_scheme(scheme):
    # Append the scheme to every uses_* list in the urlparse module.
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')
This will append your URL scheme to the lists:
uses_fragment
uses_netloc
uses_params
uses_query
uses_relative
The URI will then be treated as http-like, and urlparse will correctly return the path, fragment, username/password, etc.
urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
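For readers on Python 3, here is a hedged sketch of the same registration against urllib.parse, which still exposes the uses_* lists. It assumes, as in current CPython, that every uses_* attribute is a plain list. In Python 3 the netloc and fragment of an unknown scheme are parsed without registration, so the sketch demonstrates the effect with urljoin, which does consult uses_relative and uses_netloc.

```python
import urllib.parse
from urllib.parse import urljoin


def register_scheme(scheme):
    """Append `scheme` to every uses_* list exposed by urllib.parse."""
    for name in dir(urllib.parse):
        if not name.startswith("uses_"):
            continue
        registry = getattr(urllib.parse, name)
        # Skip anything that is not a list, and avoid duplicate entries.
        if isinstance(registry, list) and scheme not in registry:
            registry.append(scheme)


before = urljoin("moose://host/a/b", "c")   # scheme not registered yet
register_scheme("moose")
after = urljoin("moose://host/a/b", "c")    # now resolved like an http URL

print(before)  # c
print(after)   # moose://host/a/c
```

The lists are module-level state, so the registration affects every caller in the process.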
Solution 3
There is also a library called furl that gives you the result you want:
>>> import furl
>>> f = furl.furl("qqqq://base/id#hint")
>>>f.scheme
'qqqq'
>>> f.host
'base'
>>> f.path
Path('/id')
>>> f.path.segments
['id']
>>> f.fragment
Fragment('hint')
>>> f.fragmentstr
'hint'
Solution 4
The question appears to be out of date. Since at least Python 2.7, urlparse handles unknown schemes without any workaround.
Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
>>> import urlparse
>>> urlparse.urlparse("qqqq://base/id#hint")
ParseResult(scheme='qqqq', netloc='base', path='/id', params='', query='', fragment='hint')
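A short Python 3 sketch of the same check, assuming the module is imported from urllib.parse. The numeric port 8080 and the second example URI are illustrative assumptions, not taken from the session above. It also shows parse_qs for turning the query string into a dict.

```python
from urllib.parse import urlparse, parse_qs

# The URI from the question, parsed with no registration step.
simple = urlparse("qqqq://base/id#hint")
print(simple.netloc, simple.path, simple.fragment)

# A fuller example; the port must be numeric for .port to work.
full = urlparse("moose://username:password@hostname:8080/path?query=value#fragment")
print(full.username, full.password, full.hostname, full.port)

# urlparse leaves the query as a raw string; parse_qs splits it further.
print(parse_qs(full.query))
```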
Solution 5
Try removing the scheme entirely, and start with //netloc, i.e.:
>>> SCHEME="qqqq"
>>> url="qqqq://base/id#hint"[len(SCHEME)+1:]
>>> url
'//base/id#hint'
>>> urlparse.urlparse(url)
('', 'base', '/id', '', '', 'hint')
You won't have the scheme in the urlparse result, but you know the scheme anyway.
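As a minimal sketch, the slicing above can be wrapped in a small function that also checks the prefix before cutting it off. It is written for Python 3 (urllib.parse); the name parse_without_scheme is hypothetical.

```python
from urllib.parse import urlparse

SCHEME = "qqqq"  # the application's own scheme, as in the example above


def parse_without_scheme(uri, scheme=SCHEME):
    """Drop the known 'scheme:' prefix and parse the '//netloc/...' remainder."""
    prefix = scheme + ":"
    if not uri.startswith(prefix):
        raise ValueError("expected a %r URI, got %r" % (scheme, uri))
    return urlparse(uri[len(prefix):])


parts = parse_without_scheme("qqqq://base/id#hint")
print(parts.netloc, parts.path, parts.fragment)
```

The returned result has an empty scheme field, so the caller has to carry the scheme alongside it.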
Also note that Python 2.6 seems to handle this URL just fine (aside from the fragment):
$ python2.6 -c 'import urlparse; print urlparse.urlparse("qqqq://base/id#hint")'
ParseResult(scheme='qqqq', netloc='base', path='/id#hint', params='', query='', fragment='')
Updated on March 19, 2020
Comments
-
u0b34a0f6ae about 4 years
My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.
If I do not adjust urlparse's uses_* lists I get this:
>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')
Here is what I do, and I wonder if there is a better way to do it:
import urlparse
SCHEME = "qqqq"
# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)
Why is there no better way to do this?
-
u0b34a0f6ae over 14 years
I prefer my own workaround to this one; otherwise I would have to do this round trip all the time in my custom URL module.
-
yantrab over 14 years
Fair enough: I didn't like relying on the internals of the module, but reasonable engineers can differ!
-
Vladimir Mihailenco almost 13 years
But the query is still not parsed properly... Thanks anyway.
-
knickum about 8 years
Wanted to back this up further, as of 04/26/2016; it also parses beyond the basics shown above:
weird_scheme = 'qqq://username:[email protected]/some/path?params=key#frag_ment'
Then parse and show the username:
urlparse(weird_scheme).username  # 'username'
or show the query:
urlparse(weird_scheme).query  # 'params=key'