Get domain from URL in Oracle SQL
Solution 1
Thanks to the hints in the answers I finally got it working!
The code I am using now looks like this:
REGEXP_REPLACE(website_url, '(http[s]?://)?(www\.)?(.*?)((/|:)(.)*|$)', '\3')
Thanks for the help everybody!
Solution 2
Use this :
WITH tab AS
(SELECT 'https://www.example.co.uk/dir/index.html' AS website_url
FROM dual)
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '\w+(\.\w+)+')
FROM tab;
output:
|REGEXP_SUBSTR(REGEXP_REPLACE(W|
--------------------------------
|example.co.uk |
Solution 3
Not sure whether oracle supports the ?:
to exclude a group or not.
REGEXP_REPLACE(website_url, '^(?:(?:http[s]?://)?www\.)?(.*?)(?:/.*|$)', '\1')
If it doesn't, then this one:
REGEXP_REPLACE(website_url, '^((http[s]?://)?www\.)?(.*?)(/.*|$)', '\3')
Related videos on Youtube
Comments
-
Foaly almost 2 years
I have a database that contains website URL's. From those URL's I'd like to extract the domain name. Here are two (quiet different) examples:
http://www.example.com -> example.com example.co.uk/dir/index.html -> example.co.uk
In order to do this I am using a regular expression and the functions REGEXP_SUBSTR and REGEXP_REPLACE that Oracle provides. I am using replace to replace the preceding
http[s]
and thewww.
with an empty string (deleting it). Then I use substring to get the string between the beginning and the first/
or if there is no/
the whole string. My code looks like this:REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '(.+?)(/|$)')
Everything works as expected, except the fact that my regex fails to exclude the
/
:example.com/dir/index.html -> example.com/
I would like to get rid of the
/
. How do I do that? -
Foaly over 10 yearsThis works very nice! Thank you very much. But sadly it doesn't work for URL's that include a
-
for example the URlwww.top.i-am-a-example.com
givestop.i
I tried but I can't fix it. Do you know how? -
San over 10 yearsAdding permissible range could be one solution to this.
REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '[a-z,A-Z,0-9,-]+(\.\w+)+')
-
Foaly over 10 yearsYes adding a range seems to be the only option. Using the your code I still get
top.i
. I am not an expert on regex, so I don't know why... Looks correct to me -
Foaly over 10 yearsAs far as I can see it Oracle does not support
?:
the second works as expected, but somehow it does not work for urls like:www.example.com/dir/index.html
it returns:example.comdir/index.html