Get domain from URL in Oracle SQL

sql regex oracle

10,011

Solution 1

Thanks to the hints in the answers I finally got it working!

The code I am using now looks like this:

REGEXP_REPLACE(website_url, '(http[s]?://)?(www\.)?(.*?)((/|:)(.)*|$)', '\3')

Thanks for the help everybody!

Solution 2

Use this :

WITH tab AS 
 (SELECT 'https://www.example.co.uk/dir/index.html' AS website_url 
    FROM dual)
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '\w+(\.\w+)+') 
  FROM tab;

output:

|REGEXP_SUBSTR(REGEXP_REPLACE(W|
--------------------------------
|example.co.uk                 |

Solution 3

Not sure whether oracle supports the ?: to exclude a group or not.

REGEXP_REPLACE(website_url, '^(?:(?:http[s]?://)?www\.)?(.*?)(?:/.*|$)', '\1')

If it doesn't, then this one:

REGEXP_REPLACE(website_url, '^((http[s]?://)?www\.)?(.*?)(/.*|$)', '\3')

10,011

Foaly

Sometimes I do stuff with computers.

Updated on June 04, 2022

Comments

Foaly almost 2 years
I have a database that contains website URL's. From those URL's I'd like to extract the domain name. Here are two (quiet different) examples:
```
http://www.example.com       -> example.com
example.co.uk/dir/index.html -> example.co.uk
```
In order to do this I am using a regular expression and the functions REGEXP_SUBSTR and REGEXP_REPLACE that Oracle provides. I am using replace to replace the preceding http[s] and the www. with an empty string (deleting it). Then I use substring to get the string between the beginning and the first / or if there is no / the whole string. My code looks like this:
```
REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '(.+?)(/|$)')
```
Everything works as expected, except the fact that my regex fails to exclude the /:
```
example.com/dir/index.html -> example.com/
```
I would like to get rid of the /. How do I do that?
Foaly over 10 years

This works very nice! Thank you very much. But sadly it doesn't work for URL's that include a - for example the URl www.top.i-am-a-example.com gives top.i I tried but I can't fix it. Do you know how?
San over 10 years

Adding permissible range could be one solution to this. REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '[a-z,A-Z,0-9,-]+(\.\w+)+')
Foaly over 10 years

Yes adding a range seems to be the only option. Using the your code I still get top.i. I am not an expert on regex, so I don't know why... Looks correct to me
Foaly over 10 years

As far as I can see it Oracle does not support ?: the second works as expected, but somehow it does not work for urls like: www.example.com/dir/index.html it returns: example.comdir/index.html