How to normalize a URL?

Solution 1

Use Node's built-in URL API, alongside a few manual checks.

  1. Manually check that the URL has a valid protocol.
  2. Instantiate the URL.
  3. Check that the URL does not contain additional information, such as a username or password.

Example code:

const { URL } = require('url')
let myTestUrl = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash';

try {
  if (!myTestUrl.startsWith('https://') && !myTestUrl.startsWith('http://')) {
    // The following line is based on the assumption that the URL will resolve using https.
    // Ideally, after all checks pass, the URL should be pinged to verify the correct protocol.
    // Better yet, it should need to be provided by the user - there are nice UX techniques to address this.
    myTestUrl = `https://${myTestUrl}`
  }

  const normalizedUrl = new URL(myTestUrl);

  if (normalizedUrl.username !== '' || normalizedUrl.password !== '') {
    throw new Error('Username and password not allowed.')
  }

  // Do your thing
} catch (e) {
  console.error('Invalid url provided', e)
}

To keep the example short, it only handles the http and https protocols.
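The three steps above can also be wrapped in a reusable function. This is only a sketch under a few assumptions: the name normalizeUserUrl is hypothetical, input without a scheme is assumed to be https, and the "hostname must contain a dot" rule is an extra heuristic that is not part of the original answer (it also rejects hosts such as localhost).

```javascript
const { URL } = require('url');

// Hypothetical helper: returns a normalized href, or null when the input
// cannot be turned into an acceptable http(s) URL.
function normalizeUserUrl(input) {
  let candidate = String(input).trim();

  // Assumption: input without an http(s) scheme is meant to be https.
  if (!/^https?:\/\//i.test(candidate)) {
    candidate = `https://${candidate}`;
  }

  let url;
  try {
    url = new URL(candidate);
  } catch (e) {
    return null; // Not parseable as a URL at all.
  }

  // Reject credentials. Email-like input parses with a username,
  // so it is rejected here as well.
  if (url.username !== '' || url.password !== '') {
    return null;
  }

  // Assumption (heuristic): a public web address has a dot in its hostname.
  if (!url.hostname.includes('.')) {
    return null;
  }

  return url.href;
}

console.log(normalizeUserUrl('example.com/somepage')); // https://example.com/somepage
console.log(normalizeUserUrl('someone@example.com'));  // null
```

Returning null instead of throwing keeps the calling code simple when the value comes from a form field; throw an error instead if that fits your validation flow better.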

Straight from the docs, a nice visualisation of the API:

┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
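To see how the diagram maps onto the object returned by new URL(), the following sketch parses the same URL and reads each property. The variable names are illustrative. Note that the auth, path and query fields in the top half of the diagram belong to the legacy url.parse() API and are not properties of a URL object.

```javascript
const { URL } = require('url');

const parsed = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash');

// Collect the WHATWG URL properties shown in the lower half of the diagram.
const parts = {
  protocol: parsed.protocol, // 'https:'
  username: parsed.username, // 'user'
  password: parsed.password, // 'pass'
  host: parsed.host,         // 'sub.host.com:8080'
  hostname: parsed.hostname, // 'sub.host.com'
  port: parsed.port,         // '8080'
  pathname: parsed.pathname, // '/p/a/t/h'
  search: parsed.search,     // '?query=string'
  hash: parsed.hash,         // '#hash'
  origin: parsed.origin,     // 'https://sub.host.com:8080'
};

console.log(parts);
```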

Solution 2

You want the normalize-url package:

const normalizeUrl = require('normalize-url');

normalizeUrl('example.com/');
//=> 'http://example.com'

It applies a series of normalizations to the URL. In the example above, it adds a default protocol and removes the trailing slash.
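If you would rather not add a dependency, a few normalizations in the same spirit can be approximated with the built-in URL class. This is a rough sketch, not the package's implementation: roughNormalize is a hypothetical name, the set of rules is my own selection, and it expects input that already has a protocol.

```javascript
const { URL } = require('url');

// Hypothetical helper; expects an absolute URL and throws on invalid input.
function roughNormalize(input) {
  // Parsing already lowercases the scheme and host and drops default ports.
  const url = new URL(input);

  url.hash = '';           // Drop the fragment.
  url.searchParams.sort(); // Put query parameters in a stable order.

  // Strip trailing slashes from the path, keeping the root "/".
  url.pathname = url.pathname.replace(/\/+$/, '') || '/';

  // For a bare origin, return it without the trailing slash.
  if (url.pathname === '/' && url.search === '') {
    return url.origin;
  }
  return url.href;
}

console.log(roughNormalize('HTTPS://Example.COM:443/a/b/?b=2&a=1#top'));
// https://example.com/a/b?a=1&b=2
console.log(roughNormalize('http://example.com/'));
// http://example.com
```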

Author: Victor

Updated on September 15, 2022

Comments

  • Victor, over 1 year ago

    I am dealing with a situation where I need users to enter various URLs (for example: for their profiles). However, users do not always insert URLs in the https://example.com format. They might insert something like:

    • example.com
    • example.com/
    • example.com/somepage
    • but an email address, or anything else that is not a web address, should not be accepted

    How can I normalize the URLs to a format that can potentially lead to a web address? I see this behavior in web browsers. We almost always enter crappy things in a web browser's bar and they can distinguish whether that's a search or something that can be turned into a URL.

    I tried looking in many places but seems like I can't find any approach to this.

    I would prefer a solution written for Node if it's possible. Thank you very much!

  • Victor, almost 6 years ago
    Looks like I was actually overthinking this when it was actually almost nothing more than a RegEx check. Thank you!