Regex matching URLs that contain a string in the relative path, not in the domain


Solution 1

Try this regex:

/\b(?:https?:\/\/)?[^\/:]+\/.*?job/gmi

Online Demo: http://regex101.com/r/rV3oP8
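To check this pattern against the question's list outside the online demo, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way as the JavaScript-style literal above. The helper name has_job_in_path and the extra jobs.com URL are my own additions for illustration.

```python
import re

# Solution 1's pattern as a Python raw string. The i and m flags map to
# re.IGNORECASE and re.MULTILINE; the g flag has no direct equivalent,
# so search() is used to look for the first match in each URL.
SOLUTION_1 = re.compile(r"\b(?:https?://)?[^/:]+/.*?job", re.IGNORECASE | re.MULTILINE)

# URLs from the question, plus the jobs.com case mentioned by the asker.
URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]


def has_job_in_path(url: str) -> bool:
    """Return True if the pattern finds 'job' after the first '/' of the path."""
    return SOLUTION_1.search(url) is not None


for url in URLS:
    print(f"{url:<40} {has_job_in_path(url)}")
```

Under those assumptions, only HTTPs://job.com/test and HTTPs://jobs.com/test should print False, since job appears only in their domains.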

Solution 2

Here is one that I came up with:

^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*

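It matches all of your examples except the ones where job appears only in the domain, such as job.com or jobs.com. Because the pattern spells job in lowercase, the case-insensitive option must be on for it to match joBs.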

I tested this in Sublime Text, which is handy because the regex result is highlighted as you type.
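For anyone who wants to try this outside an editor, here is a rough Python sketch of the same pattern. It compiles the pattern twice, once case-sensitive and once with re.IGNORECASE, to show how much depends on that option. The variable names and the extra jobs.com URL are illustrative assumptions, not part of the original answer.

```python
import re

# Solution 2's pattern, copied unchanged.
PATTERN = r"^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*"

case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

# URLs from the question, plus the jobs.com case mentioned by the asker.
URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

for url in URLS:
    strict = case_sensitive.search(url) is not None
    relaxed = case_insensitive.search(url) is not None
    print(f"{url:<40} case-sensitive={strict} ignore-case={relaxed}")
```

The two compiled patterns should differ only on Www.glassdoor.com/foo/bar/joBs, which matches only when case is ignored.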

Author by

mitchelllc


Updated on August 21, 2022

Comments

  • mitchelllc, over 1 year ago

    This was one of my interview questions. I didn't come up with a good enough solution and was rejected.

    The question was

    What is a single regex that matches all URLs containing job (case-insensitive) in the relative
    path (not the domain) in the following list:
    
        - http://www.glassdoor.com/job/ABC
        - https://glassdoor.com/job/
        - HTTPs://job.com/test
        - Www.glassdoor.com/foo/bar/joBs
        - http://192.168.1.1/ABC/job
        - http://bankers.jobs/ABC/job
    

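    My solution used a lookbehind and a lookahead: /(?<!\.)job(?!\.)/i. This works for the list above. However, it fails for HTTPs://jobs.com/test: the job in jobs.com is followed by s rather than a dot, so the lookahead passes and the URL is matched even though job appears only in the domain.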

    I am wondering what the correct answer to this question is. Thanks in advance for any suggestions!
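The behaviour of the lookaround attempt from the question can be reproduced with a short Python sketch. It assumes Python's re module, which supports the fixed-width lookbehind used here. The names LOOKAROUND and EXPECTED are illustrative, and the expected values encode my reading of the question: job must appear in the path, not only in the domain.

```python
import re

# The asker's attempt: 'job' not preceded by a dot and not followed by a dot.
LOOKAROUND = re.compile(r"(?<!\.)job(?!\.)", re.IGNORECASE)

# URL -> whether 'job' appears in the relative path (the intended result).
EXPECTED = {
    "http://www.glassdoor.com/job/ABC": True,
    "https://glassdoor.com/job/": True,
    "HTTPs://job.com/test": False,
    "Www.glassdoor.com/foo/bar/joBs": True,
    "http://192.168.1.1/ABC/job": True,
    "http://bankers.jobs/ABC/job": True,
    "HTTPs://jobs.com/test": False,  # the case that breaks the lookarounds
}

# Collect every URL where the regex result disagrees with the intended result.
mismatches = [
    url
    for url, wanted in EXPECTED.items()
    if (LOOKAROUND.search(url) is not None) != wanted
]

print(mismatches)
```

The only mismatch should be HTTPs://jobs.com/test. The lookarounds inspect only the characters next to job, so they cannot tell whether the match sits in the domain or in the path, which is why both answers anchor on the first / after the host instead.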