Regex URL Path from URL

Solution 1

This expression captures everything from videoplay onward, i.e. the URL path together with the query string and fragment.

/\/(videoplay.+)/

This expression captures everything after the port, which also includes the path.

/:\d+\/(.+)/
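
As a quick check, both patterns can be run against the URL from the question. This is a minimal sketch: the port pattern is written here as /:\d+\/(.+)/ (slash escaped, one or more digits), and the variable names are illustrative only.

```javascript
// URL taken from the question.
const sampleUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

// Capture from "videoplay" to the end of the string.
const fromVideoplay = sampleUrl.match(/\/(videoplay.+)/);

// Capture everything after ":<port>/".
const afterPort = sampleUrl.match(/:\d+\/(.+)/);

console.log(fromVideoplay[1]); // videoplay?docid=-7246927612831078230&hl=en#hello
console.log(afterPort[1]);     // videoplay?docid=-7246927612831078230&hl=en#hello
```

Note that both captures include the query string and fragment, and neither includes the leading slash.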

However, if you are using Node.js, I recommend the native url module.

var url = require('url');
var youtubeUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";
console.log(url.parse(youtubeUrl));

This does all of the regex work for you:

{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello' 
}
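
A comment in this thread points out that url.parse is now in legacy mode and that the Node.js docs recommend the URL class instead. A minimal sketch of the same lookup with the global URL class (the variable name parsed is illustrative) might look like this:

```javascript
// The WHATWG URL class is available as a global in current Node.js and in browsers.
const parsed = new URL("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");

console.log(parsed.pathname); // /videoplay
console.log(parsed.search);   // ?docid=-7246927612831078230&hl=en
console.log(parsed.hash);     // #hello
console.log(parsed.port);     // empty string: 80 is the default port for http, so it is dropped
```

Unlike url.parse, the URL class normalizes default ports, so port comes back empty for this input.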

Solution 2

If you need this for a JavaScript web app, the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

Thank you, John Long, you made my day!

Solution 3

(http[s]?:\/\/)?([^\/\s]+\/)(.*)

The path is captured in group 3.
Demo: http://regex101.com/r/vK4rV7/1
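
To illustrate which part lands in group 3, here is a short sketch that applies the pattern to the URL from the question. The names pathPattern and groups are made up for the example.

```javascript
// Group 1: optional scheme, group 2: host (and port) up to the first "/", group 3: the rest.
const pathPattern = /(http[s]?:\/\/)?([^\/\s]+\/)(.*)/;

const groups = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello".match(pathPattern);

console.log(groups[2]); // video.google.co.uk:80/
console.log(groups[3]); // videoplay?docid=-7246927612831078230&hl=en#hello
```

Group 3 still carries the query string and fragment, so it would need a further split on "?" or "#" if only the path is wanted.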

Solution 4

You can try this:

^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$

([^?]+) above is the capturing group that returns your path.

Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.

If you need an all-matching regex, you can check this StackOverflow link, where all the possibilities for splitting a URI into its constituent parts, including your "path", are discussed and dissected.
If you consider that overkill, and you know that your input URL will always have the path between the first "/" and the following "?", then the above regex should be sufficient.
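
As a rough check, the pattern can be written as a JavaScript regex literal (forward slashes escaped) and run against the URL from the question. The names reSolution4 and match4 are illustrative.

```javascript
// Same pattern as above, with "/" escaped so it can be written as a JavaScript literal.
const reSolution4 = /^(?:[^\/]*(?:\/(?:\/[^\/]*\/?)?)?([^?]+)(?:\??.+)?)$/;

const match4 = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello".match(reSolution4);

console.log(match4[1]); // videoplay
```

The capture stops at the first "?", so the query string and fragment are excluded, and the leading slash is consumed before the group.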

Solution 5

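function getPath(url, defaults){
    // Optional scheme, "//", the host, then capture the path up to "?" or "#".
    var reUrlPath = /(?:\w+:)?\/\/[^\/]+([^?#]+)/;
    // On no match, fall back to an array whose last element is the default.
    var urlParts = url.match(reUrlPath) || [url, defaults];
    // The last element is either the captured path or the default.
    return urlParts.pop();
}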
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );
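
One edge case worth noting: for a URL with no path at all, such as 'http://stackoverflow.com', the pattern above backtracks and captures the last character of the host ("m") instead of falling back to the default. The sketch below is a hypothetical stricter variant, not part of the original answer. It assumes the host should end at "/", "?" or "#" and that a path must start with "/".

```javascript
// Hypothetical variant of getPath: the host stops at "/", "?" or "#",
// and the captured path must begin with "/".
function getPathStrict(url, defaults) {
    const reUrlPath = /^(?:\w+:)?\/\/[^\/?#]+(\/[^?#]*)/;
    const urlParts = url.match(reUrlPath);
    // Return the captured path, or the default when the URL has no path.
    return urlParts ? urlParts[1] : defaults;
}

console.log(getPathStrict('http://stackoverflow.com/q/123/regex-url?foo', 'unknown')); // /q/123/regex-url
console.log(getPathStrict('//stackoverflow.com/', 'unknown'));                         // /
console.log(getPathStrict('http://stackoverflow.com', 'unknown'));                     // unknown
```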

Author: ThomasReggi

Updated on July 31, 2022

Comments

  • ThomasReggi over 1 year

    I am having a little bit of regex trouble.

    I am trying to get the path, videoplay, from this URL:

    http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello
    

    If I use this regex /.+ it matches /video as well.

    I would need some kind of anti / negative match to not include //

    • jwrush over 11 years
      When I have to use regexes on urls fast and dirty, I usually include // at the beginning, before the capture group. Note you can't do http://, because they might be accessing it using a different protocol, or even ://, because they might specify the port number.
    • Raniz almost 9 years
      possible duplicate of Getting parts of a URL (Regex)
  • justderb almost 10 years
    This doesn't match the path of a URL, just the very last part of the path. With "google.com/foo/bar" it matches "bar"
  • nbeuchat almost 6 years
    It wouldn't work for a URL such as www.abc.com?param=xyz, where the query follows the host directly. I slightly modified it like this to make it work (I also use non-capturing groups for the first two groups). (?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*) Demo: regex101.com/r/eNUBb9
  • FiftiN about 5 years
    Try this url: video.google.co.uk:80?docid=-7246927612831078230&hl=en#hell…, this regex returns group1 = o
  • darksinge almost 3 years
    The url node module is in legacy mode. The docs recommend using the URL class instead. See here: nodejs.org/dist/latest-v14.x/docs/api/…
  • suchislife over 2 years
    This is beautiful.
  • Max Barrass over 2 years
    Regex URL Path from URL?
  • Alin about 2 years
    He is asking about regex, not existing functions.