Dynamically append text to filenames in Bash


The solutions you link to are in fact quite good. Some answers may lack explanation, so let's sort that out and maybe add a bit more.

This line of yours

for file in *.txt

indicates that the extension is known beforehand (note: POSIX-compliant environments are case-sensitive, so *.txt won't match FOO.TXT). In such a case

basename -s .txt "$file"

should return the name without the extension (basename also removes the directory path: /directory/path/filename → filename; in your case it doesn't matter because $file doesn't contain such a path). To use the tool in your code, you need command substitution, which in general looks like this: $(some_command). Command substitution takes the output of some_command (minus any trailing newlines), treats it as a string and places it where $(…) is. Your particular redirection will be

… > "./$(basename -s .txt "$file")_sorted.txt"
#      ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the output of basename will replace this

Nested quotes are OK here because Bash is smart enough to know the quotes within $(…) are paired together.
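To see the command-substitution version in isolation, here is a minimal sketch. The helper name sorted_name_basename is hypothetical, and it assumes a basename that supports -s (GNU coreutils, which Cygwin ships, does; other implementations may not).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the "_sorted" output name for a .txt file
# by wrapping the external basename utility in command substitution.
sorted_name_basename() {
    local file=$1
    # $(...) is replaced by whatever basename prints: the name minus ".txt"
    printf '%s\n' "./$(basename -s .txt "$file")_sorted.txt"
}

sorted_name_basename 'originalfile.txt'   # ./originalfile_sorted.txt
```

Each call starts a separate basename process, which is the cost discussed next.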

This can be improved. Note that basename is a separate executable, not a shell builtin (in Bash, run type basename and compare it to type cd). Spawning any extra process is costly: it takes resources and time, and spawning one on every iteration of a loop usually performs poorly. Therefore you should use whatever the shell offers to avoid extra processes. In this case the solution is:

… > "./${file%.txt}_sorted.txt"
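Plugged into the loop from the question, the change looks like the sketch below. It works in a throwaway directory created with mktemp -d so it can be tried safely; the sample file originalfile.txt and its contents are made up for illustration.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'b\na\nb\n' > originalfile.txt   # hypothetical sample input

for file in *.txt; do
    printf 'Processing %s\n' "$file"
    # ${file%.txt} drops the trailing ".txt" without spawning basename
    LC_ALL=C sort -u "$file" > "./${file%.txt}_sorted.txt"
done
```

The glob is expanded once, before the loop starts, so the new originalfile_sorted.txt is not processed in the same run. Running the loop a second time, however, would pick it up and produce originalfile_sorted_sorted.txt.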

The syntax is explained below for a more general case.


In case you don't know the extension:

… > "./${file%.*}_sorted.${file##*.}"
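As a sketch, here is that expression wrapped in a hypothetical helper sorted_name. Only the last dot is treated as the start of the extension, so a double extension like .tar.gz gets _sorted inserted before gz.

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert "_sorted" before the last extension,
# whatever that extension is.
sorted_name() {
    local file=$1
    # ${file%.*}  -> everything before the last dot
    # ${file##*.} -> everything after the last dot
    printf '%s\n' "./${file%.*}_sorted.${file##*.}"
}

sorted_name 'notes.txt'        # ./notes_sorted.txt
sorted_name 'archive.tar.gz'   # ./archive.tar_sorted.gz
```

A name with no dot at all, such as README, comes out as ./README_sorted.README, which is the weakness discussed further down.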

The syntax explained:

  • ${file#*.} → $file, but the shortest string matching *. is removed from the front;
  • ${file##*.} → $file, but the longest string matching *. is removed from the front; use it to get just the extension;
  • ${file%.*} → $file, but the shortest string matching .* is removed from the end; use it to get everything but the extension;
  • ${file%%.*} → $file, but the longest string matching .* is removed from the end.
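The same operators can also stand in for the directory stripping that basename does: ${path##*/} removes the longest prefix ending in a slash. A small sketch with a made-up path; unlike basename, this expansion does not special-case a trailing slash (for /a/b/ it yields an empty string).

```bash
#!/usr/bin/env bash
path=/directory/path/filename.txt   # hypothetical example path
name=${path##*/}   # longest match of "*/" removed from the front -> filename.txt
stem=${name%.*}    # shortest match of ".*" removed from the end  -> filename
printf '%s\n' "$name" "$stem"
```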

Pattern matching is glob-like, not regex. This means * is a wildcard for zero or more characters and ? is a wildcard for exactly one character (we don't need ? in your case, though). When you run ls *.txt or for file in *.txt, you're using the same pattern matching mechanism. A pattern without wildcards is allowed; we have already used ${file%.txt}, where .txt is the pattern.
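A quick way to convince yourself that these are glob patterns and not regexes: in the sketch below (file name made up), . is a literal dot, ? matches exactly one character, and a pattern that matches nothing leaves the value untouched.

```bash
#!/usr/bin/env bash
file=report.txt
# ".???" means "a dot followed by exactly three characters"
echo "${file%.???}"   # report
# A pattern with no wildcards is matched literally
echo "${file%.txt}"   # report
echo "${file%.md}"    # report.txt (no match, nothing removed)
```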

Example:

$ file=name.name2.name3.ext
$ echo "${file#*.}"
name2.name3.ext
$ echo "${file##*.}"
ext
$ echo "${file%.*}"
name.name2.name3
$ echo "${file%%.*}"
name

But beware:

$ file=extensionless
$ echo "${file#*.}"
extensionless
$ echo "${file##*.}"
extensionless
$ echo "${file%.*}"
extensionless
$ echo "${file%%.*}"
extensionless

For this reason the following contraption might seem useful (but as written it isn't; explanation below):

${file#${file%.*}}

It works by identifying everything but the extension (${file%.*}) and then removing that from the front of the whole string. The results look like this:

$ file=name.name2.name3.ext
$ echo "${file#${file%.*}}"
.ext
$ file=extensionless
$ echo "${file#${file%.*}}"

$   # empty output above

Note that the . is included this time. You might get unexpected results if $file contains a literal * or ?, but Windows (where extensions matter) doesn't allow these characters in filenames anyway, so you may not care. However, square brackets ([…]), if present, form a bracket expression in the pattern and may break the solution!

Your "improved" redirection would be:

… > "./${file%.*}_sorted${file#${file%.*}}"

It should support filenames with or without an extension, but unfortunately not names containing square brackets. Quite a shame. To fix it, you need to double-quote the inner expansion.
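The bracket problem reported in the comments can be reproduced with a made-up name. Unquoted, the inner result originalfile[i] is used as a glob pattern that matches the text originalfilei, which is not a prefix of the name, so nothing is removed.

```bash
#!/usr/bin/env bash
file='originalfile[i].txt'   # hypothetical name containing square brackets
# The unquoted inner expansion acts as a pattern, so no prefix is stripped
ext=${file#${file%.*}}
echo "$ext"                        # originalfile[i].txt, not .txt
echo "./${file%.*}_sorted${ext}"   # ./originalfile[i]_sortedoriginalfile[i].txt
```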

Really improved redirection:

… > "./${file%.*}_sorted${file#"${file%.*}"}"

Double quoting makes ${file%.*} not act as a pattern! Bash is smart enough to tell inner and outer quotes apart because the inner ones are embedded in the outer ${…} syntax. I think this is the right way.
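Wrapped in a hypothetical helper (the name sorted_name_any is made up), the really improved redirection can be checked against all three kinds of names discussed above: a normal extension, square brackets, and no extension at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper built on the "really improved" expansion.
sorted_name_any() {
    local file=$1
    # ${file%.*} is everything but the extension; quoting it inside ${file#...}
    # makes it match literally, so what remains is ".ext" or an empty string.
    printf '%s\n' "./${file%.*}_sorted${file#"${file%.*}"}"
}

sorted_name_any 'notes.txt'             # ./notes_sorted.txt
sorted_name_any 'originalfile[i].txt'   # ./originalfile[i]_sorted.txt
sorted_name_any 'extensionless'         # ./extensionless_sorted
```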

Here is another (imperfect) solution; let's analyze it for educational reasons:

${file/./_sorted.}

It replaces the first . with "_sorted.". It works fine only if $file contains exactly one dot; with no dot at all the name is returned unchanged. There is a similar syntax, ${file//./_sorted.}, that replaces all dots. As far as I know, there's no variant that replaces only the last dot.
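The difference between the single-slash and double-slash forms shows up as soon as a name has more than one dot (the file names below are made up):

```bash
#!/usr/bin/env bash
file=name.txt
echo "${file/./_sorted.}"    # name_sorted.txt
file=name.v2.txt
echo "${file/./_sorted.}"    # name_sorted.v2.txt (first dot only)
echo "${file//./_sorted.}"   # name_sorted.v2_sorted.txt (every dot)
```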


Author: Hashim Aziz

Updated on September 18, 2022

Comments

  • Hashim Aziz, over 1 year ago

    I have the following for loop to individually sort all text files inside of a folder (i.e. producing a sorted output file for each).

    for file in *.txt; 
    do
       printf 'Processing %s\n' "$file"
       LC_ALL=C sort -u "$file" > "./${file}_sorted"  
    done
    

    This is almost perfect, except that it currently outputs files in the format of:

    originalfile.txt_sorted
    

    ...whereas I would like it to output files in the format of:

    originalfile_sorted.txt 
    

    This is because the ${file} variable contains the filename including the extension. I'm running Cygwin on top of Windows. I'm not sure how this would behave in a true Linux environment, but in Windows, this shifting of the extension renders the file inaccessible to Windows Explorer.

    How can I separate the filename from the extension so that I can add the _sorted suffix in between the two, allowing me to easily differentiate the original and sorted versions of the files while still keeping Windows' file extensions intact?

    I've been looking at what might be possible solutions, but to me these seem more equipped to dealing with more complicated problems. More importantly, with my current bash knowledge, they go way over my head, so I'm holding out hope that there's a simpler solution which applies to my humble for loop, or else that someone can explain how to apply those solutions to my situation.

  • Hashim Aziz, over 5 years ago
    Once again, a brilliant answer, thank you. I'm definitely a long way from understanding all of it, but for now I'm gonna leave that to one side and just read up more on command substitution when I do have the time. One question I do have: you mentioned that … > "./${file%.txt}_sorted.txt" "avoids extra processes" - is this because we're using basename in the $file variable outside of the for loop here: basename -s .txt "$file"... or have I misunderstood?
  • Kamil Maciorowski, over 5 years ago
    @Hashim … > "./${file%.txt}_sorted.txt" is the only change you need to do to your script (ellipsis just indicates everything you have before >, it's not an actual character you should place in your script; replace > and the rest of the line with > "./${file%.txt}_sorted.txt"). It avoids extra processes because now we don't use basename at all; the whole magic is done by the shell itself thanks to ${file%.txt} syntax. Side note: sole basename -s .txt "$file" just prints something; if you think it alters the variable, you're wrong.
  • Hashim Aziz, over 5 years ago
    Ah, so command substitution is being used instead of basename rather than alongside it. I see. Thanks again for your help.
  • Kamil Maciorowski, over 5 years ago
    @Hashim Not quite. This fragment > "./$(basename -s .txt "$file")_sorted.txt" uses command substitution, the command is basename …. You either use this or > "./${file%.txt}_sorted.txt" which doesn't use command substitution. So it's (command substitution + basename) xor just fancy variable expansion ${file%.txt} without command substitution.
  • Kamil Maciorowski, over 5 years ago
    @Hashim Or maybe I didn't understand your "instead of basename".
  • Hashim Aziz, over 5 years ago
    Ah I see. Looks like I'll need to be looking up variable expansion too in that case, haha. In any case, I applied the command substitution/basename method for my for loop, and I've also noticed that there's a slight quirk in how it operates...
  • Hashim Aziz, over 5 years ago
    If the original filename of the file (not including the extension in that) contains square brackets with even a single (ordinary) character in them, like [i], the filename of the output turns into originalfile_sortedoriginalfile.txt - in other words, it appends the original filename to the new filename again when it shouldn't. Square brackets containing at least 1 character are the only cause of this; parentheses, braces and single or empty square brackets don't cause this problem.
  • Hashim Aziz, over 5 years ago
    Thanks for the extra efforts after I brought the square bracket problem to your attention. To clarify now: file1="./${file%.*}_sorted.${file##*.}" supports only extensions with one period, so the final solution as a whole shouldn't be used on files that have more than one period in them? I don't have a need to do so, I just want to make sure this is clear as it doesn't seem to be so in the current answer and I want to ensure I don't misuse this loop in the future.
  • Kamil Maciorowski, over 5 years ago
    @Hashim More than one dot is not a problem with this. Zero dots is.
  • Hashim Aziz, over 5 years ago
    Ah, I understand now; there are so many snippets of code in the answer at this point that it's gotten confusing. This is the final piece of code I'm left with: pastebin.com/6XvWdcKB. It works perfectly for my current data, but if you don't mind I'd appreciate you looking over it just to catch anything I might be missing. In particular, I thought it might need an if statement to contain everything and break in the event that [ -f "$file" ] failed.
  • Kamil Maciorowski, over 5 years ago
    @Hashim I have improved my answer, added relatively simple yet robust solution. Find it where the bold text is (or use the revision history).
  • Hashim Aziz, over 3 years ago
    To clarify, the logic in the final block of code is simply doing the same as what the "really improved redirection" is doing, so I only use one or the other?
  • Kamil Maciorowski, over 3 years ago
    @Prometheus One or the other, true. I added the "really improved solution" and the last part of the answer became somewhat misleading, inferior and unnecessary; I missed this. Let me just remove it.
  • Hashim Aziz, over 3 years ago
    One more question, what purpose does the ./ at the beginning solve? I understand in the shell it usually refers to the current directory but I don't understand the use of it in parameter expansion, although I may have missed where it was explained.
  • Kamil Maciorowski, over 3 years ago
    @Prometheus Sometimes it's good to add ./ because of the issue with filenames that start with a dash, which a command may mistake for options. The best way is to use for file in ./*.txt instead of for file in *.txt, so the variable itself contains ./ and you don't need to add it later. Still, a leading dash (if any) shouldn't break a redirection, so ./ is not really necessary in this case. You used … > "./${file}_sorted" in the question and I wanted my code to be similar.
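A sketch of the loop as suggested in the comment above, with ./*.txt in the glob so $file already carries the ./ prefix. It runs in a scratch directory from mktemp -d, and -dash.txt is a made-up name that a plain *.txt glob would pass to sort as if it were a bundle of options.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'z\ny\n' > ./-dash.txt   # hypothetical file whose name starts with a dash

for file in ./*.txt; do
    # $file already starts with ./ so sort cannot mistake it for an option
    printf 'Processing %s\n' "$file"
    LC_ALL=C sort -u "$file" > "${file%.txt}_sorted.txt"
done
```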