JavaScript/Regex for finding just the root domain name without subdomains
Solution 1
You can't do this reliably with a regular expression alone, because the pattern has no way of knowing how many blocks (labels) are in the suffix.
For example, google.com has a suffix of com. To get from subdomain.google.com to google.com you'd have to take the last two blocks: one for the suffix and one for google.
If you apply this logic to subdomain.google.co.uk, though, you end up with co.uk, because the suffix there is two blocks long.
You will actually need to look up the suffix from a list like http://publicsuffix.org/
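A minimal sketch of that lookup, assuming a tiny hand-written set stands in for the real Public Suffix List. The names `KNOWN_SUFFIXES` and `rootDomain` are hypothetical, and the real list also contains wildcard and exception rules that this sketch ignores.

```javascript
// Hypothetical, hand-picked subset; the real Public Suffix List is far larger.
const KNOWN_SUFFIXES = new Set(['com', 'org', 'net', 'uk', 'co.uk', 'fr']);

// Return the longest known suffix plus the one label before it, or null.
function rootDomain(input) {
  // Drop any path, then split the host name into labels.
  const labels = input.split('/')[0].toLowerCase().split('.');
  // Start at i = 1 so at least one label remains in front of the suffix;
  // walking left to right tries the longest candidate suffix first.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join('.');
    if (KNOWN_SUFFIXES.has(suffix)) {
      return labels.slice(i - 1).join('.');
    }
  }
  return null; // no known suffix found
}

console.log(rootDomain('subdomain.google.com'));   // google.com
console.log(rootDomain('subdomain.google.co.uk')); // google.co.uk
```

Checking the longest candidate first is what keeps co.uk from being mistaken for a registrable domain under uk.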
Solution 2
Don't use regex; use the .split() method and work from there.
var s = domain.split('.');
If your use case is fairly narrow you could then check the TLDs as needed, and then return the last 2 or 3 segments as appropriate:
return s.slice(-2).join('.');
It'll make your eyes bleed less than any regex solution.
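Put together, the split approach could look something like the sketch below. `THREE_PART_ENDINGS` and `lastSegments` are made-up names, and the list of two-label endings is only an assumption to adjust for your own data.

```javascript
// Hypothetical list of two-label endings that need three segments kept.
const THREE_PART_ENDINGS = ['co.uk', 'com.au'];

function lastSegments(domain) {
  // Ignore anything after the host name, then split on dots.
  const s = domain.split('/')[0].split('.');
  const ending = s.slice(-2).join('.');
  // Keep three segments for known two-label endings, otherwise two.
  const keep = THREE_PART_ENDINGS.includes(ending) ? 3 : 2;
  return s.slice(-keep).join('.');
}

console.log(lastSegments('www.google.com'));          // google.com
console.log(lastSegments('sub.domain.google.co.uk')); // google.co.uk
```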
Solution 3
I've not done a lot of testing on this, but if I understand what you're asking for, this should be a decent starting point...
([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))\b
EDIT:
To clarify, it's looking for:
one or more alpha-numeric characters or dashes, followed by a literal dot
and then one of three things...
- three or more alpha characters (e.g. com/net/mil/coop, etc.)
- two alpha characters, followed by a literal dot, followed by two more alphas (e.g. co.uk)
- two alpha characters (e.g. us/uk/to, etc.)
and at the end of that, a word boundary (\b): a position where a word character (in regex, typically alphanumerics and underscore) meets the end of the string or a non-word character such as a space, dot or slash.
As I say, I didn't do much testing, but it seemed a reasonable jumping off point. You'd likely need to try it and tune it some, and even then, it's unlikely that you'll get 100% for all test cases. There are considerations like Unicode domain names and all sorts of technically-valid-but-you'll-likely-not-encounter-in-the-wild things that'll trip up a simple regex like this, but this'll probably get you 90%+ of the way there.
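As for how it might be implemented, here is one possible way to apply the pattern in JavaScript. One caveat drives the design: searched unanchored, the leftmost match for www.google.com is www.google, because the boundary before the next dot already satisfies \b. This sketch therefore strips the path first and swaps the trailing \b for `$`. The names `ROOT_DOMAIN` and `guessRootDomain` are made up for illustration.

```javascript
// Same alternatives as the pattern above, but anchored to the end of the host.
const ROOT_DOMAIN = /([A-Za-z0-9-]+\.(?:[A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))$/;

function guessRootDomain(input) {
  const host = input.split('/')[0]; // drop any path such as /no/thanks
  const match = host.match(ROOT_DOMAIN);
  return match ? match[1] : null;
}

console.log(guessRootDomain('www.google.com'));                  // google.com
console.log(guessRootDomain('sub.domain.google.com/no/thanks')); // google.com
console.log(guessRootDomain('sub.domain.google.co.uk'));         // google.co.uk
```

Endings the three alternatives don't describe still come back wrong: sub.google.com.au yields com.au, for instance.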
Solution 4
If you have a limited subset of data, I suggest keeping the regex simple, e.g.
(([a-z\-]+)(?:\.com|\.fr|\.co\.uk))
This will match:
www.google.com --> google.com
www.google.co.uk --> google.co.uk
www.foo-bar.com --> foo-bar.com
In my case, I know that all relevant URLs will be matched using this regex.
Collect a sample dataset and test it against your regex. While prototyping, you can do that using a tool such as https://regex101.com/r/aG9uT0/1. In development, automate it using a test script.
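One way to automate that check is a small script run with Node. The sample pairs are the matches listed above; `PATTERN`, `SAMPLES` and `findFailures` are hypothetical names, and the pattern here escapes the dot in co\.uk so it only matches a literal dot.

```javascript
// The limited-subset pattern, with the dot in "co.uk" escaped.
const PATTERN = /[a-z-]+(?:\.com|\.fr|\.co\.uk)/;

// Sample inputs paired with the root domain each one should produce.
const SAMPLES = [
  ['www.google.com', 'google.com'],
  ['www.google.co.uk', 'google.co.uk'],
  ['www.foo-bar.com', 'foo-bar.com'],
];

// Return a description of every sample that did not match as expected.
function findFailures(pattern, samples) {
  const failures = [];
  for (const [input, expected] of samples) {
    const match = input.match(pattern);
    const actual = match ? match[0] : null;
    if (actual !== expected) {
      failures.push(`${input}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}

const failures = findFailures(PATTERN, SAMPLES);
console.log(failures.length === 0 ? 'all samples pass' : failures.join('\n'));
```

Adding a new pair to `SAMPLES` whenever a URL slips through keeps the regex and the data it is meant to cover in step.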
jamesmhaley
Updated on April 28, 2022
Comments
-
jamesmhaley almost 2 years
I had a search and found lots of similar regex examples, but not quite what I need.
I want to be able to pass in the following URLs and return the results:
www.google.com returns google.com
sub.domains.are.cool.google.com returns google.com
doesntmatterhowlongasubdomainis.idont.wantit.google.com returns google.com
sub.domain.google.com/no/thanks returns google.com
Hope that makes sense :) Thanks in advance! -James
-
Pekka over 13 years
What is the result going to be for sub.domain.google.co.uk?
-
Gumbo over 13 years
Those are not URLs but just domain names (except the last that is just a string that can be interpreted as domain name plus a URL path).
-
janmoesen over 13 years
Be sure to check out the Public Suffix List at publicsuffix.org.
-
jamesmhaley over 13 years
Could you explain what it does please, my understanding of regex is minimal. And how it would be implemented.
-
hallvors over 13 years
90% is generous. Basically, there IS no simple way to do this. The domain name system is way too convoluted and allows a lot of variation.
-
theraccoonbear over 13 years
Given that the examples provided are "normalish" looking domains, I think you can probably hit a substantial chunk, but sure, maybe not 90%. As I said though (and really to the point) it's unlikely you'll get 100% for all of your test cases.