Split a filename in Python on underscore

Solution 1

The following should work for you

s="Planning_Group_20180108.ind"
'_'.join(s.split('_')[:-1])

This splits the string at every _ and gives you a list. The [:-1] slice drops the last element, and '_'.join() joins the remaining elements back together with underscores.
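To make the steps visible, here is a small sketch that prints each intermediate value. The variable names (parts, head, result, no_underscore) are only for illustration. It also shows one caveat, which is my own observation rather than part of the answer above: if the name has no underscore at all, this approach returns an empty string.

```python
s = "Planning_Group_20180108.ind"

parts = s.split("_")      # split at every underscore
head = parts[:-1]         # drop the last piece (the date and extension)
result = "_".join(head)   # glue the remaining pieces back together

print(parts)   # ['Planning', 'Group', '20180108.ind']
print(head)    # ['Planning', 'Group']
print(result)  # Planning_Group

# Caveat: with no underscore, split() returns a single element,
# the slice is empty, and join() returns an empty string.
no_underscore = "_".join("Soldto".split("_")[:-1])
print(repr(no_underscore))  # ''
```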

Solution 2

print("Planning_Group_20180108.ind".rsplit("_", 1)[0])
print("Soldto_20180108".rsplit("_", 1)[0])

rsplit lets you split at most X times, starting from the end, wherever "_" is found. In your case, it splits the string into a list of two strings, ["Planning_Group", "20180108.ind"], and you just need to take the first element with [0] (http://python-reference.readthedocs.io/en/latest/docs/str/rsplit.html)
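As a rough sketch, the same idea can be wrapped in a small function. The name strip_last_field is hypothetical, chosen only for this example. A name without any underscore comes back unchanged, because rsplit then returns a one-element list.

```python
def strip_last_field(name):
    """Return everything before the last underscore (hypothetical helper)."""
    return name.rsplit("_", 1)[0]

# The intermediate list produced by rsplit with maxsplit=1:
pieces = "Planning_Group_20180108.ind".rsplit("_", 1)
print(pieces)  # ['Planning_Group', '20180108.ind']

print(strip_last_field("Planning_Group_20180108.ind"))  # Planning_Group
print(strip_last_field("Soldto_20180108"))              # Soldto
print(strip_last_field("Soldto"))                       # Soldto
```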

Solution 3

First I would extract the filename itself by splitting it from the extension. The easy way is:

path = "Planning_Group_20180108.ind"
filename, ext = path.split(".")

This assumes that path is only a filename plus a single extension. To stay safe and platform independent, I'd use the os module instead:

import os
fullpath = "this/could/be/a/full/path/Planning_Group_20180108.ind"
path, filename = os.path.split(fullpath)

And then extract "root" and extension:

root, ext = os.path.splitext(filename)

That should leave me with Planning_Group_20180108 as root. To discard "_20180108", we need to split the string on the "_" delimiter, starting from the right end, and do it only once. I would use the .rsplit() method of str, which lets me specify the delimiter and the maximum number of splits.

what_i_want, the_rest = root.rsplit("_", 1)

what_i_want should contain the left side of Planning_Group_20180108, cut at the first "_" counting from the right, so it should be Planning_Group.

A more compact, though less readable, way of writing the same thing would be:

what_i_want = os.path.splitext(os.path.split("/my/path/to/Planning_Group_20180108.ind")[1])[0].rsplit("_", 1)[0]

PS. You may skip the step of extracting root and extension if you're sure that the extension will not contain an underscore. If you're unsure, that step is necessary. You also need to think about names with multiple extensions, like /path/to/file/which_has_a.lot.of.periods.and_extentions. In that case, would you like to get which_has_a.lot.of.periods.and, or which_has? Think about it while planning your app. If you need the latter, you may want to extract root with filename.split(".", 1)[0] instead of using os.path.splitext().
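To see how these choices differ on that multi-period name, here is a small sketch. The variable names are made up for the example. Note that for this particular name, os.path.splitext() and split(".", 1) end up with the same final result; they only diverge when an underscore appears in one of the middle "extensions".

```python
import os

fullpath = "/path/to/file/which_has_a.lot.of.periods.and_extentions"
filename = os.path.split(fullpath)[1]

# 1. Skip the root/extension step entirely.
no_root_step = filename.rsplit("_", 1)[0]

# 2. Strip only the last extension with os.path.splitext().
root_splitext = os.path.splitext(filename)[0]
via_splitext = root_splitext.rsplit("_", 1)[0]

# 3. Treat everything after the first period as the extension.
root_first_dot = filename.split(".", 1)[0]
via_first_dot = root_first_dot.rsplit("_", 1)[0]

print(no_root_step)   # which_has_a.lot.of.periods.and
print(via_splitext)   # which_has
print(via_first_dot)  # which_has
```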

reference:

os.path.split(path),

os.path.splitext(path)

str.rsplit(sep=None, maxsplit=-1)

Solution 4

You can use re:

import re
s = ["Planning_Group_20180108.ind", 'Soldto_20180108']
new_s = list(map(lambda x: re.findall(r'[a-zA-Z_]+(?=_\d)', x)[0], s))

Output:

['Planning_Group', 'Soldto']
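The pattern matches a run of letters and underscores that is followed by "_" and a digit; the lookahead (?=_\d) checks for that suffix without including it in the match. One thing to be aware of: findall(...)[0] raises IndexError when a name has no such suffix. A possible variant with a fallback might look like the sketch below, where PREFIX and prefix_or_original are hypothetical names and returning the name unchanged is my assumption about the desired behaviour.

```python
import re

# Letters/underscores, followed (but not consumed) by "_" and a digit.
PREFIX = re.compile(r"[a-zA-Z_]+(?=_\d)")

def prefix_or_original(name):
    """Return the matched prefix, or the name itself if nothing matches."""
    match = PREFIX.search(name)
    return match.group(0) if match else name

names = ["Planning_Group_20180108.ind", "Soldto_20180108", "Soldto"]
cleaned = [prefix_or_original(n) for n in names]
print(cleaned)  # ['Planning_Group', 'Soldto', 'Soldto']
```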

Solution 5

Using regex here is pretty pythonic.

import re
newname = re.sub(r'_[0-9]+', '', 'Planning_Group_20180108.ind')

Results in:

'Planning_Group.ind'

And the same regex produces 'Soldto' from 'Soldto_20180108'.
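Note that this pattern removes every underscore-plus-digits group, wherever it appears in the name. If only the trailing group should go, one option is to anchor the pattern, as in the sketch below. TRAILING_STAMP, drop_trailing_stamp, and the name Group_2_20180108.ind are all made up for illustration; this is a variant, not the regex from the answer above.

```python
import re

# Match "_<digits>" only at the very end of the name, or right before
# the final extension (a period followed by no further periods).
TRAILING_STAMP = re.compile(r"_[0-9]+(?=\.[^.]*$|$)")

def drop_trailing_stamp(name):
    """Remove the trailing underscore-plus-digits group, if there is one."""
    return TRAILING_STAMP.sub("", name)

print(drop_trailing_stamp("Planning_Group_20180108.ind"))  # Planning_Group.ind
print(drop_trailing_stamp("Soldto_20180108"))              # Soldto
print(drop_trailing_stamp("Group_2_20180108.ind"))         # Group_2.ind

# For comparison, the unanchored pattern also strips the "_2":
print(re.sub(r"_[0-9]+", "", "Group_2_20180108.ind"))      # Group.ind
```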

Author by

user7422128

Updated on June 05, 2022

Comments

  • user7422128
    user7422128 almost 2 years

    I have a filename such as "Planning_Group_20180108.ind". I only want Planning_Group out of it. The filename can also be like Soldto_20180108; in that case the output should be Soldto only.

    A solution without regex is preferable, as it is easier to read for a person who hasn't used regex yet.

    • jan-seins
      jan-seins over 6 years
      Have a look at the split function: "Planning_Group_20180108.ind".split('_')
    • user7422128
      user7422128 over 6 years
      I have tried split but was not able to get to the complete solution. I don't know why everybody is downvoting it.
    • Tim Pietzcker
      Tim Pietzcker over 6 years
      Your question is unclear. You don't seem to want to split on any underscore, only the last one, is that correct?
    • user7422128
      user7422128 over 6 years
      Yes, I want to split only on the last underscore.
    • Tim Pietzcker
      Tim Pietzcker over 6 years
      Will there always be at least one underscore? If not, what should be the result?
    • Matthias
      Matthias over 6 years
      filename.rsplit('_', 1)[0]