Getting the final redirected URL

14,141

This might be overkill for what you want, but it is an alternative to using regular expressions. This answer uses Selenium's Python bindings to drive a real browser, which follows the redirects for you. It will also open the PDF file in a browser window. The code below uses Firefox, but you can use another browser by swapping in its driver class, e.g. webdriver.Chrome() or webdriver.Ie().

To install selenium: pip install selenium

The code:

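from selenium import webdriver

# Start a Firefox session (needs Firefox and its driver installed)
driver = webdriver.Firefox()
link = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'

# The browser follows the redirects itself while loading the page
driver.get(link)
print(driver.current_url)

# Close the browser when finished
driver.quit()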

It is also possible to run the browser in the background (headless mode) so no window pops up. An added benefit of this solution is that if the site changes how the redirection works, you will not need to update any regular expressions in your code.
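As a rough sketch of the background option, the snippet below starts Firefox with its `-headless` flag and waits until the address stops changing before reading it. The helper names `make_headless_firefox` and `final_url`, and the `settle`, `timeout` and `poll` values, are made up for illustration and are not part of Selenium. It assumes a Selenium 4 style install where `webdriver.Firefox` accepts `options=`.

```python
import time


def make_headless_firefox():
    """Start Firefox without a visible window (needs selenium and geckodriver)."""
    # Imported lazily so final_url() below can be used with any driver-like object.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's own command-line flag
    return webdriver.Firefox(options=options)


def final_url(driver, link, settle=1.0, timeout=15.0, poll=0.25):
    """Load link, then return the URL once it has been unchanged for `settle` seconds.

    Hypothetical helper: the timing values are guesses, tune them for the site.
    """
    driver.get(link)
    deadline = time.monotonic() + timeout
    last = driver.current_url
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = driver.current_url
        if current != last:
            # The browser moved on to another page; restart the quiet period.
            last, stable_since = current, time.monotonic()
        elif time.monotonic() - stable_since >= settle:
            break
    return last


# Example usage (launches a real browser, so it is left commented out):
# driver = make_headless_firefox()
# try:
#     print(final_url(driver, 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'))
# finally:
#     driver.quit()
```

Polling for a stable address is only one way to decide that the redirects have finished. If the final hop triggers a file download instead of a page load, the address may never change and the last page URL is returned when the timeout expires.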

Author: user3691767

Updated on June 04, 2022

Comments

  • user3691767
    user3691767 almost 2 years

    My code is as follows:

    import urllib.request

    url_orig = 'http://www.has-sante.fr/portail/jcms/c_676945/fr/prialt-ct-5245'
    u = urllib.request.urlopen(url_orig)
    print(u.geturl())
    

    Basically, the URL gets redirected twice. The output should be:

    http://www.has-sante.fr/portail/upload/docs/application/pdf/2008-07/ct-5245_prialt_.pdf
    

    But the output that I'm getting is the first redirect:

    http://www.has-sante.fr/portail/plugins/ModuleXitiKLEE/types/FileDocument/doXiti.jsp?id=c_676945
    

    How do I get the required final URL? Any help would be appreciated!
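One plausible explanation, not confirmed anywhere in this thread, is that the second hop is done inside the page (a meta refresh or a script) and not by an HTTP 3xx response. `urllib` follows only HTTP redirects, so `geturl()` would stop at the intermediate page. The sketch below covers the meta-refresh case using only the standard library. `MetaRefreshFinder` and `meta_refresh_target` are illustrative names, and the sample HTML is invented, not taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or self.target is not None:
            return
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/path/to/file.pdf"
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html, base_url):
    """Return the absolute URL a meta refresh points to, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else None


# Invented sample page, only to show the call
sample = '<html><head><meta http-equiv="refresh" content="0; url=/docs/file.pdf"></head></html>'
print(meta_refresh_target(sample, "http://example.com/a/page.jsp"))
```

In practice the `html` argument would be the decoded body returned by `u.read()`, and `base_url` would be `u.geturl()`. A redirect performed by JavaScript would not be found this way, which is where a real browser becomes useful.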

  • user3691767
    user3691767 almost 10 years
    Yes, I had to use this method instead of urllib. Thanks a lot!