Python regex - extracting directories from path


Solution 1

Although it is not strictly necessary, re is a handy choice for this problem.

import re

# The dot before "flac" is escaped so that it only matches a literal ".".
pattern = re.compile(r"/(?P<artist>[a-zA-Z0-9 ]+?)/(?P<release>[a-zA-Z0-9 ]+?)/(?P<tracknumber>\d+?) - (?P<title>[a-zA-Z0-9 ]+?)\.flac")
s = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
m = pattern.search(s)
print(m.group('artist'))
print(m.group('release'))
print(m.group('tracknumber'))  # must match the group name exactly (no space)
print(m.group('title'))

I use expressions such as [a-zA-Z0-9 ] to explicitly specify the characters I expect in the string. It is just my preference to write a whitelist-style regex, which makes the code more secure because anything unexpected fails to match. There are many other ways to compose equivalent patterns. You will find all you need at http://docs.python.org/library/re.html; you don't need a book for that.
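A minimal sketch of how the whitelist-style pattern behaves in practice, assuming a hypothetical helper named parse_path (it is not part of the answer above). Because the character classes only admit letters, digits and spaces, a path containing anything else does not match, so the caller should check for None before reading the groups.

```python
import re

# Same whitelist-style pattern, with the dot before "flac" escaped.
PATTERN = re.compile(
    r"/(?P<artist>[a-zA-Z0-9 ]+?)/(?P<release>[a-zA-Z0-9 ]+?)"
    r"/(?P<tracknumber>\d+?) - (?P<title>[a-zA-Z0-9 ]+?)\.flac"
)

def parse_path(path):
    """Hypothetical helper: return the named groups as a dict, or None."""
    m = PATTERN.search(path)
    return m.groupdict() if m else None

# Matches: every component uses only whitelisted characters.
print(parse_path("/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"))

# Does not match: the hyphen in "AC-DC" is outside the whitelist.
print(parse_path("/AC-DC/Back In Black/01 - Hells Bells.flac"))
```

If real paths can contain hyphens, accents or punctuation, the character classes would have to be widened, or replaced by a negated class such as [^/].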

Solution 2

import re

# Case 1: /artist/release/NN - title.ext
pattern1 = re.compile(r'/([^/]*)/([^/]*)/([0-9]*) - (.*)\.[^.]*')
artist,release,Tracknumber,Title = pattern1.match(file1).groups()

# Case 2: /[catno] release/NN - artist - title.ext
pattern2 = re.compile(r'/\[([^]]*)\] ([^/]*)/([0-9]*) - (.*) - (.*)\.[^.]*')
catno,release,Tracknumber,artist,Title = pattern2.match(file2).groups()

(where file1 and file2 are the paths you gave above).
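For anyone who wants to run this directly, here is a self-contained sketch with file1 and file2 filled in from the two paths in the question. The lower-case variable names and the int() conversion (the question asks for track number 4 rather than "04") are my own additions, not part of the answer above.

```python
import re

# The two example paths, copied from the question.
file1 = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
file2 = "/[XLR 483] The Fat Of The Land/04 - The Prodigy - The  Funky Stuff.flac"

pattern1 = re.compile(r'/([^/]*)/([^/]*)/([0-9]*) - (.*)\.[^.]*')
pattern2 = re.compile(r'/\[([^]]*)\] ([^/]*)/([0-9]*) - (.*) - (.*)\.[^.]*')

# Case 1: four groups, unpacked in the order they appear in the pattern.
artist, release, tracknumber, title = pattern1.match(file1).groups()
print(artist, "|", release, "|", int(tracknumber), "|", title)

# Case 2: five groups; the catalogue number sits inside square brackets.
catno, release2, tracknumber2, artist2, title2 = pattern2.match(file2).groups()
print(catno, "|", release2, "|", int(tracknumber2), "|", artist2, "|", title2)
```

Note that match() returns None when a path does not fit the pattern, in which case .groups() raises AttributeError, so real code should check the result first.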

First thing: you capture whatever a part of the regex matched by wrapping that part in parentheses. So everything between parentheses in the patterns above will be spit back out as an item in the match.

Second: you match any single character except a forward slash with regex code like [^/]. So to match a run of such characters between forward slashes, you do [^/]*.

Putting those together, to capture the artist in your first string, you do /([^/]*)/. Then you do that again to get the release.
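A small sketch of just this step, using the first path from the question. Only the two directory components are captured here; the variable names are mine.

```python
import re

path = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"

# Two capture groups, each matching a run of non-slash characters.
m = re.match(r'/([^/]*)/([^/]*)/', path)
artist, release = m.groups()

print(artist)            # first directory
print(release)           # second directory
print(path[m.end():])    # whatever is left after the matched prefix
```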

Third: to match any digit, you use [0-9]. So, to match any string of digits, you use [0-9]*.
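As a hedged illustration of the digit step: the captured text is still a string, so it needs int() if you want the number 4 as in the question. The sketch also shows that [0-9]* is happy to match nothing at all, which is why [0-9]+ may be the safer choice when a track number is required. The variable names are assumptions of mine.

```python
import re

filename = "04 - Funky Stuff.flac"

# [0-9]+ requires at least one digit before the " - " separator.
m = re.match(r'([0-9]+) - ', filename)
track = int(m.group(1))   # the string "04" becomes the integer 4
print(track)

# [0-9]* also succeeds on a name with no leading digits,
# capturing an empty string instead of failing.
empty = re.match(r'([0-9]*)', "Funky Stuff.flac").group(1)
print(repr(empty))
```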

Apply those principles repeatedly, and you should be able to understand the above.
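One way to see the principles stacked up is to write the first pattern with re.VERBOSE, which lets you put a comment next to each piece. This is only a sketch of the same idea, not the pattern from the answer verbatim: in verbose mode bare spaces are ignored, so the separator is written as [ ]-[ ], and I have used [0-9]+ and a trailing $ as my own assumptions.

```python
import re

pattern = re.compile(r"""
    /([^/]*)      # artist: everything up to the next slash
    /([^/]*)      # release: the same idea again
    /([0-9]+)     # track number: a run of digits
    [ ]-[ ]       # the space-hyphen-space separator
    (.*)          # title: everything up to the last dot
    \.[^.]*$      # the dot and the file extension
""", re.VERBOSE)

path = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
groups = pattern.match(path).groups()
print(groups)
```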

Author by

ohrstrom

Updated on June 28, 2022

Comments

  • ohrstrom
    ohrstrom almost 2 years

    I have a question about regex/Python. Sorry if this topic has been discussed millions of times - usually I find the answers on so/google etc. but I'm stuck in the millions of answers with this one.. (To be honest - I own a regex book, but somehow I'm too stupid to really understand it...)

    For a music-management-system I need to extract information out of paths, providing different sets of options. Here are two examples:

    If the path is: (Case 1)

    "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
    
    it should extract:
    • artist: "The Prodigy"
    • release: "The Fat Of The Land"
    • Tracknumber: 4
    • Title: "Funky Stuff"

    And for eg: (Case 2)

    "/[XLR 483] The Fat Of The Land/04 - The Prodigy - The  Funky Stuff.flac"
    
    should extract:
    • catno: "XLR 483"
    • release: "The Fat Of The Land"
    • Tracknumber: 4
    • artist: "The Prodigy"
    • Title: "Funky Stuff"

    There is no need for a regex that covers both cases, these are just two examples. I'll then provide them as options (or starting-point to add own ones).

    Any help would be greatly appreciated!

    @ S.Lott: I don't have a regex for this, I started with splitting the string:

    parts = rel_path.split('/')       
    track = parts[-1]
    release = parts[-2]
    artist = parts[-3]
    

    but this looks like an extremely inflexible and inelegant solution to me.

    edit:

    So far I have something like:

    pattern = re.compile('^/(?P<artist>[a-zA-Z0-9 ]+)/(?P<release>[a-zA-Z0-9 ]+)/(?P<track>[a-zA-Z0-9 -_]+).[a-zA-Z]*.*')
    
    
    rel_path = '/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac'
    
    match = pattern.search(rel_path)
    
    artist = match.group('artist')
    release = match.group('release')
    track = match.group('track')
    
  • ohrstrom
    ohrstrom about 12 years
    Ok. Thx - splitting works, but makes it difficult to edit in an admin-interface, as the logic sits completely in the code. Would have been my first approach as well :) , but was looking for a more modular solution.
  • ohrstrom
    ohrstrom about 12 years
    Thanks, yes this helps. Did not know that you directly can assign groups() like this!
  • ohrstrom
    ohrstrom about 12 years
    Yes - I've been so far before.. But how to have it modular enough to assign it to variables - so that reversion of directories is possible..
  • ohrstrom
    ohrstrom about 12 years
    ? How does this exactly work? Don't understand this but looks like magic!
  • Mike
    Mike about 12 years
    Yep. groups() just returns a tuple, and this is a general way of assigning individual names to different parts of a tuple.
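A short sketch of what that means, using a made-up pattern for the filename part only (the pattern and variable names are illustrative assumptions): groups() hands back an ordinary tuple, and Python's tuple unpacking assigns each element to a name on the left-hand side.

```python
import re

m = re.match(r'([0-9]+) - (.*)\.([^.]*)', "04 - Funky Stuff.flac")

# groups() is a plain tuple with one item per pair of parentheses.
parts = m.groups()
print(parts)

# Tuple unpacking: the number of names must equal the number of items.
number, title, extension = parts
print(number, title, extension)
```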