How to remove filename extension from a list of filenames in bash

Solution 1

You could do it all in one command:

find /path/to -type f -execdir bash -c 'printf "%s\n" "${@%.*}"' bash {} +
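
For example, assuming /path/to contains only two hypothetical files, photo.jpg and notes.txt, the command would print something like:

./photo
./notes

(-execdir runs the bash snippet from each file's own directory and passes the matched name as ./name, which is why the printed paths start with ./.)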

Solution 2

I believe you would want to make an array instead of a string:

IFS=$'\n' # split on newline only
set -o noglob # disable the glob part
file_list=($(find . -name '*.*' -type f))

Or, with bash 4.4+, without breaking on file paths that contain newline characters:

readarray -td '' file_list < <(find . -name '*.*' -type f -print0)
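
Either way, you can check what actually ended up in the array with declare -p, which prints every element safely quoted:

declare -p file_list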

Then your parameter expansion should work, though here it would make more sense to use an array variable again:

trimmed_file_list=("${file_list[@]%.*}")
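
Keeping the original and the trimmed names in two parallel, index-aligned arrays also makes per-file operations easy; as a sketch (the mv is deliberately commented out, and -- guards against names starting with a dash):

for i in "${!file_list[@]}"; do
    printf 'Would move %s to %s\n' "${file_list[i]}" "${trimmed_file_list[i]}"
    # mv -i -- "${file_list[i]}" "${trimmed_file_list[i]}"
done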

In your code sample, you are building a single string and then asking parameter expansion to remove everything after the final dot in that whole string, not in each individual filename.

Solution 3

Your code does not use any arrays. It puts the whole list of filenames into a single string, $file_list. That string ends with file3.png, so your parameter substitution removes .png from the end of the string, leaving you with one long string of filenames in which only the last filename has lost its suffix.

Putting multiple separate objects (pathnames) into a single string automatically disqualifies the script from working properly for files whose names contain spaces (or whatever delimiter the string uses). Switching to an array would not help by itself, as you would still be splitting the output of find on whitespace.
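
A minimal sketch of that problematic pattern (a hypothetical reconstruction, not your exact code):

file_list=$(find . -type f -name '*.*')   # all pathnames joined into one string
trimmed_list=${file_list%.*}              # removes ".ext" from the end of the whole string, i.e. from the last name only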


To trim the extension off all filenames of regular files in or below the current directory:

find . -type f -name "*.*" -exec sh -c '
    for pathname do
        printf "Would move %s to %s...\n" "$pathname" "${pathname%.*}"
        # mv -i "$pathname" "${pathname%.*}"
    done' sh {} +

This looks for regular files whose names contain at least one dot. The pathnames of these files are fed in batches to a small shell script that loops over them. For each file, the script strips the last dot in the filename and everything after it and would rename the file accordingly; the actual mv command is commented out for safety, so the script only prints what it would do.

The find command acts as a generator of pathnames for the internal shell script, and pathnames containing spaces, newlines and tabs are guaranteed to be processed correctly. There is no need to store the output of any command in a variable.

If you have an array of pathnames, possibly created by using

shopt -s globstar nullglob
file_list=( **/*.* )  # get pathnames of files with dots in their names

Then you would be able to output the pathnames without suffixes with

printf '%s\n' "${file_list[@]%.*}"

Whether this would help you, I don't know. If you want to use the suffix-less pathnames for something, then parsing the output of the above printf command would be the wrong thing to do (you would be back to not handling strange pathnames). So when and how you remove the filename suffixes depends on what you want to do with the result.
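
If, for instance, you only need the suffix-less name at the point of use, one sketch is to do the expansion per element inside a loop, with no intermediate output and no second array:

for pathname in "${file_list[@]}"; do
    # strip the suffix right where the name is used
    printf 'Stem of %s is %s\n' "$pathname" "${pathname%.*}"
done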

Comments

  • Kusalananda
    Kusalananda over 5 years
    Your title mentions arrays, but there is no array in your code.
  • Shrout1
    Shrout1 over 5 years
    @Kusalananda Thank you! I've come to realize that since asking. Still learning :)
  • Shrout1
    Shrout1 over 5 years
    Thank you! I’ll test it as soon as I hit my desk in the morning.
  • Shrout1
    Shrout1 over 5 years
    @Wildcard Thanks for the pointer! I'm generating the files that are dropped into this directory programmatically so it shouldn't be a problem.
  • Shrout1
    Shrout1 over 5 years
    Thank you! I actually need to create a couple copies of the array, one for a menu that has the filename modified and one that will retain the original index value so that I can call the file from the menu. It is truly amazing how much bash can do though! The ability to embed everything into one command still blows me away! :D
  • Shrout1
    Shrout1 over 5 years
    Thank you! Your mastery of bash is very evident! I'm still scraping at the surface... Thank you for taking the time to help :)