Regex URL Path from URL
Solution 1
This expression captures videoplay and everything after it, i.e. the URL path together with the query string and hash.
/\/(videoplay.+)/
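This expression captures everything after the port, which also contains the path. The slash has to be escaped inside the regex literal, and \d+ lets the port have any number of digits.
/:\d+\/(.+)/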
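As a quick check, here is a minimal sketch that runs both patterns against the URL from the question. The variable names are only for illustration, and the port pattern is written as :\d+\/ so that the slash is escaped and ports of any length match. Both patterns capture the query string and hash along with the path, so the sketch trims them off afterwards.

```javascript
// Sample URL taken from the question.
const sampleUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

// Pattern 1: capture "videoplay" and everything after it.
const afterVideoplay = sampleUrl.match(/\/(videoplay.+)/);

// Pattern 2: capture everything after ":<port>/".
const afterPort = sampleUrl.match(/:\d+\/(.+)/);

// Both captures still include the query string and hash,
// so cut at the first "?" or "#" to keep only the path segment.
const pathOnly = afterPort ? afterPort[1].split(/[?#]/)[0] : null;

console.log(afterVideoplay && afterVideoplay[1]);
console.log(afterPort && afterPort[1]);
console.log(pathOnly); // "videoplay"
```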
However, if you are using Node.js, I recommend the native url module.
var url = require('url')
var youtubeUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
url.parse(youtubeUrl)
This does all of the regex work for you and returns:
{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello'
}
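In recent Node.js versions the documentation marks url.parse() as legacy and points to the WHATWG URL class instead, which is also available in browsers. A minimal sketch of the same lookup with that class, assuming an environment where URL is a global:

```javascript
// WHATWG URL class: a global in modern Node.js and in browsers.
const parsed = new URL("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");

console.log(parsed.pathname);                  // "/videoplay"
console.log(parsed.search);                    // "?docid=-7246927612831078230&hl=en"
console.log(parsed.hash);                      // "#hello"
console.log(parsed.searchParams.get("docid")); // "-7246927612831078230"

// Unlike url.parse(), the URL class drops a port that is the default
// for the scheme, so port 80 on an http URL comes back as "".
console.log(parsed.port);                      // ""
```

Note that new URL() throws on input it cannot parse, so wrap it in try/catch if the string may not be an absolute URL.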
Solution 2
If you need this for a JavaScript web app, the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:
var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";
parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port; // => "3000"
parser.pathname; // => "/pathname/"
parser.search; // => "?search=test"
parser.hash; // => "#hash"
parser.host; // => "example.com:3000"
Thank you John Long, you made my day!
Solution 3
(http[s]?:\/\/)?([^\/\s]+\/)(.*)
The path (plus any query string and hash) is captured in group 3.
Demo: http://regex101.com/r/vK4rV7/1
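For reference, a minimal sketch of how this pattern could be used from JavaScript. The variable names are hypothetical, and the final split is an extra step (not part of the regex) to drop the query string and hash from group 3.

```javascript
// The Solution 3 pattern as a JavaScript regex literal.
const pathRegex = /(http[s]?:\/\/)?([^\/\s]+\/)(.*)/;

const input = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";
const match = input.match(pathRegex);

// Group 3 holds everything after the first "/" that follows the host.
const rest = match ? match[3] : null;

// Extra step: cut at the first "?" or "#" to keep only the path.
const pathFromGroup3 = rest === null ? null : rest.split(/[?#]/)[0];

console.log(rest);
console.log(pathFromGroup3); // "videoplay"

// The scheme is optional, so a bare host also works.
const bare = "google.com/foo/bar".match(pathRegex);
console.log(bare && bare[3]); // "foo/bar"
```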
Solution 4
You can try this:
^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$
([^?]+) above is the capturing group which returns your path.
Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.
If you need an all-matching regex, you can check this StackOverflow link, where they have discussed and dissected all the ways of splitting a URI into its constituent parts, including your "path".
If you consider that overkill and you know that your input URL will always have the path between the first "/" and the following "?", then the above regex should be sufficient.
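A small sketch of the regex above in JavaScript, in case it helps. It builds the pattern with String.raw and the RegExp constructor so the slashes do not need escaping; the helper name extractPath is made up for this example.

```javascript
// Same pattern as above, built from a raw string so "/" needs no escaping.
const pathPattern = new RegExp(String.raw`^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$`);

// Hypothetical helper: returns capture group 1, or null if nothing matches.
function extractPath(url) {
  const m = url.match(pathPattern);
  return m ? m[1] : null;
}

console.log(extractPath("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello")); // "videoplay"
console.log(extractPath("http://example.com/a/b?x=1")); // "a/b"

// Caveat: the group only stops at "?", so without a query string
// a trailing "#fragment" ends up inside the captured path.
console.log(extractPath("http://example.com/a#frag")); // "a#frag"
```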
Solution 5
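function getPath(url, defaults){
    // Optional scheme, then "//host", then capture the path up to "?" or "#".
    var reUrlPath = /(?:\w+:)?\/\/[^\/]+([^?#]+)/;
    // Fall back to the default when the URL does not match.
    var urlParts = url.match(reUrlPath) || [url, defaults];
    return urlParts.pop();
}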
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );
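Since alert() only exists in browsers, here is a sketch of the same function that can be run under Node.js with console.log, showing what a few of the calls return. The results object is just a convenience for this example. It also includes one edge case worth knowing about: a URL with no path at all.

```javascript
function getPath(url, defaults) {
  const reUrlPath = /(?:\w+:)?\/\/[^\/]+([^?#]+)/;
  const urlParts = url.match(reUrlPath) || [url, defaults];
  return urlParts.pop();
}

// Collect results so they can be printed (or checked) in one place.
const results = {
  plain:     getPath("http://stackoverflow.com/q/123/regex-url", "unknown"),      // "/q/123/regex-url"
  noScheme:  getPath("//stackoverflow.com/q/123/regex-url", "unknown"),           // "/q/123/regex-url"
  withQuery: getPath("http://stackoverflow.com/q/123/regex-url/?foo", "unknown"), // "/q/123/regex-url/"
  rootOnly:  getPath("http://stackoverflow.com/", "unknown"),                     // "/"
  noMatch:   getPath("not a url", "unknown"),                                     // "unknown"
  // Edge case: with no "/" after the host, the host pattern backtracks by
  // one character and the capture group takes the host's last letter.
  noPath:    getPath("http://stackoverflow.com", "unknown"),                      // "m"
};

console.log(results);
```

If inputs without a trailing slash are possible, it is probably safer to check for that case before calling the function.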
ThomasReggi
Updated on July 31, 2022
Comments
-
ThomasReggi over 1 year
I am having a little bit of regex trouble.
I am trying to get the path videoplay in this URL: http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello
If I use the regex /.+ it matches /video as well. I would need some kind of anti / negative match to not include //
-
jwrush over 11 years: When I have to use regexes on URLs fast and dirty, I usually include // at the beginning, before the capture group. Note you can't do http://, because they might be accessing it using a different protocol, or even ://, because they might specify the port number.
-
Raniz almost 9 years: Possible duplicate of Getting parts of a URL (Regex)
-
justderb almost 10 years: This doesn't match the path of a URL, just the very last part of the path. With "google.com/foo/bar" it matches "bar"
-
nbeuchat almost 6 years: It wouldn't work for a path such as www.abc.com?param=xyz. I slightly modified it like this to make it work (I also use non-capturing groups for the first two groups): (?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)
Demo: regex101.com/r/eNUBb9
-
FiftiN about 5 years: Try this url: video.google.co.uk:80?docid=-7246927612831078230&hl=en#hell…, this regex returns group1 = o
-
darksinge almost 3 years: The url node module is in legacy mode. The docs recommend using the URL class instead. See here: nodejs.org/dist/latest-v14.x/docs/api/…
-
suchislife over 2 years: This is beautiful.
-
Max Barrass over 2 years: Regex URL Path from URL?
-
Alin about 2 years: He is asking about regex, not existing functions.