Remove any trailing blank lines or lines with whitespaces from end of file

Solution 1

This task is much easier if you process the file's lines in reverse order.

tac infile | awk 'flag {print} {if(NF) flag=1}' | tac | sponge infile

As Malte Skoruppa and zwets pointed out in the comments, sponge belongs to the moreutils package, which Ubuntu doesn't install by default. An alternative that avoids sponge is to read the input file through a command substitution inside a herestring, so that the file's contents are captured before the final tac writes to it. Bear in mind that the stages of a pipeline start concurrently, so this relies on the substitution being evaluated before the output redirection truncates infile; sponge remains the more robust choice:

<<<"$(< infile)" tac | awk 'flag {print} {if(NF) flag=1}' | tac > infile

Breakdown of the sponge version:

  • tac infile: the opposite of cat infile (!): prints the file to stdout with its lines in reverse order;
  • awk [...]: drops the leading lines of the reversed input (see the awk breakdown below);
  • tac: reverses the lines once more, restoring the original order;
  • sponge infile: soaks up all of its input and writes to infile only once the left side of the pipe has finished, so that infile isn't truncated before the first tac has read it.
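If installing moreutils is not an option, a temporary file can play the role of sponge. The following is only a minimal sketch, assuming GNU tac and mktemp are available; the function name strip_tail is hypothetical and not part of the original answer:

```shell
# Hypothetical helper: same pipeline as above, buffered through a temp file
# instead of sponge, so the input is fully read before it is overwritten.
strip_tail() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write the result to the temp file first; only then touch the original.
    if tac "$file" | awk 'flag {print} {if (NF) flag=1}' | tac > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite in place, keeping inode and permissions
    fi
    rm -f "$tmp"
}
```

It would be called as strip_tail infile.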

awk command breakdown:

  • flag {print}: prints the record if flag is set. flag stays unset until a record with NF greater than 0 (that is, with at least one field) has been processed, so every record up to and including that one is skipped;
  • {if(NF) flag=1}: sets flag to 1 when a record with NF greater than 0 is processed. Because this rule runs after the print rule, the first such record sets the flag but is not itself printed.
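To watch the two rules in isolation, the awk program can be fed a few lines that are already in reverse order, as tac would deliver them. This is just an illustrative sketch; the function name drop_through_first_nonblank is made up for the example:

```shell
# Hypothetical wrapper around the awk program from the answer.
drop_through_first_nonblank() {
    awk '
        flag { print }          # rule 1: print only once the flag has been set
        { if (NF) flag = 1 }    # rule 2: set the flag after a non-blank record
    '
}

# Reversed input: two blank lines, then the last, middle and first lines.
printf '   \n\nlast\nmiddle\nfirst\n' | drop_through_first_nonblank
```

The two blank lines and "last" are swallowed; only "middle" and "first" are printed.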

Test on a test file (mind that line 4 and line 7 contain 5 spaces, while line 5 and line 8 are empty):

user@debian ~ % cat infile
line1
line2
line3


line6


user@debian ~ % tac infile | awk 'flag {print} {if(NF) flag=1}' | tac
line1
line2
line3


user@debian ~ % 

Lines 7 and 8 were removed because they were at the end of the file and contained only spaces (line 7) or nothing at all (line 8). Line 6 was removed because, reading the file bottom-up, it was the first line with at least one field (that is, neither empty nor made up only of spaces).
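If only the trailing blank lines should go and the last non-blank line should stay, swapping the order of the two rules should be enough. This is a variation on the answer rather than part of it, sketched under the assumption that GNU tac is available; strip_trailing_blank is a hypothetical name:

```shell
# Hypothetical variant: remove trailing blank lines but keep the last
# non-blank line. Prints the result to stdout instead of editing the file.
strip_trailing_blank() {
    # NF {flag=1}: set the flag on the first non-blank record (reading bottom-up)
    # flag:        print every record from then on, including that record
    tac "$1" | awk 'NF {flag=1} flag' | tac
}
```

On the test file above, strip_trailing_blank infile would keep line6 and drop only lines 7 and 8.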

Solution 2

Your script should work if fixed like so:

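while
 last_line=$(tail -n 1 "./file.txt")
 # Stop once the file is empty; otherwise an all-blank file would loop forever
 [[ -s "./file.txt" ]] &&
 { [[ "$last_line" =~ ^$ ]] || [[ "$last_line" =~ ^[[:space:]]+$ ]]; }
do
 sed -i '$d' "./file.txt"
done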

Your script had two main problems: (1) you never updated $last_line, so the loop's guard would always evaluate the same thing; (2) your [[ "$last_line" =~ $ ]] test matched any line, since any line has an end. (This is the reason why your script emptied your file completely.) You probably want to match against ^$ instead, which matches only empty lines. Additionally, I simplified the sed command to delete the last line in the loop's body (simply $d does the job).
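The difference between the two patterns is easy to check interactively. A small bash sketch (it needs bash for [[ =~ ]]; the variable and function names are only illustrative):

```shell
# Patterns are kept in variables to avoid quoting surprises inside [[ ]].
any_end='$'                # matches at the end of any string, so always succeeds
empty='^$'                 # matches only the empty string
blank='^[[:space:]]+$'     # matches strings made up only of whitespace

# Hypothetical helper mirroring the fixed loop condition.
is_trailing_junk() {
    [[ $1 =~ $empty ]] || [[ $1 =~ $blank ]]
}

[[ "some text" =~ $any_end ]] && echo 'bare $ matches a non-empty line'
is_trailing_junk "some text" || echo 'fixed test rejects a non-empty line'
is_trailing_junk "   "       && echo 'fixed test accepts a whitespace-only line'
is_trailing_junk ""          && echo 'fixed test accepts an empty line'
```

All four messages should be printed, which shows why the original guard could never become false.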

However, this script is unnecessarily complicated. sed is there for just that kind of thing! This one-liner will do the same thing as the above script:

sed -i ':a;/^[ \n]*$/{$d;N;ba}' ./file.txt

Roughly,

  1. Match current line against ^[ \n]*$. (i.e., it can only contain spaces and newlines)
  2. If it doesn't match, just print it. Read in next line and continue with step 1.
  3. If it does match,
    • If we are at the end of the file, delete it.
    • If we are not at the end of the file, append the next line to the current line, inserting a newline character between the two, and go back to step 1 with this new, longer line.
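The steps above can be tried without touching a file by dropping -i and piping sample input through the same program. This is a sketch assuming GNU sed (the \n inside the bracket expression is a GNU extension); the [[:space:]] variant, which also covers tabs, is an assumption added for illustration and is not from the original answer:

```shell
# Same program as the one-liner, reading stdin instead of editing in place.
trim_sed() {
    sed ':a;/^[ \n]*$/{$d;N;ba}'
}

# Hypothetical variation: [[:space:]] also matches tabs and the joined newlines.
trim_sed_space() {
    sed ':a;/^[[:space:]]*$/{$d;N;ba}'
}

# Interior blank line is kept; the two trailing blank lines are dropped.
printf 'a\n\nb\n   \n\n' | trim_sed
```

The output consists of "a", an empty line and "b".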

There are lots of awesome sed tutorials on the Internet. For example, I can recommend this one. Happy learning! :-)

Update: And of course, if you additionally want to remove the last (non-blank) line of the file after having truncated the trailing blank lines, you can just use another sed -i '$d' ./file.txt after either your script or the above one-liner. I intentionally did not want to include that in the sed one-liner since I thought that removing trailing blank lines is quite a reusable piece of code that may be interesting for other people; but removing the last non-blank line is really specific to your use case, and trivial anyway once you removed the trailing blank lines.

Author: jackson

Updated on September 18, 2022

Comments

  • jackson
    jackson over 1 year

    I want to delete all of the blank lines and lines with spaces (if any exist (ONLY from the bottom of the file)) and then to remove one more line (also ONLY from the bottom of the file).

    I have this code:

    while [[ "$last_line" =~ $ ]] || [[ "$last_line" =~ ^[[:space:]]+$ ]]
    do
        sed -i -e '${/$/d}' "./file.txt"
    done
        sed -i -e '${/$/d}' "./file.txt"
    

    For some reason the loop doesn't stop and it deletes everything in the file. What is the matter?

    • fedorqui
      fedorqui over 8 years
      if $last_line doesn't get updated, it will fall into the while forever. Also, the standard way to remove empty lines is grep -v '^\s*$' file, don't know what else you want to do here but it looks a bit too complicated to involve loops.
    • kos
      kos over 8 years
      I'm not getting that $ before the curly braces. What is that for?
    • Julian Stirling
      Julian Stirling over 8 years
      I am a little confused as to what you want. You want to completely remove any line that has a space anywhere on it, and also remove the last line from the file?
    • A.B.
      A.B. over 8 years
      Please add an example, the question is very hard to understand.
  • kos
    kos over 8 years
    Hm I think you missed these two parts: "ONLY from the bottom of the file" and "and then to remove one more line (also ONLY from the bottom of the file)". I expressed my aversion towards cat file > file1 already so I won't bother you anymore with this one :)
  • Malte Skoruppa
    Malte Skoruppa over 8 years
    Oh, this is neat, +1 for using tac. :-) I intentionally did not remove the last non-blank line in my answer since I thought that removing the trailing blank lines is by far the more interesting part of the task and much more reusable (and removing one more line after having truncated the trailing blank lines is easy.) Minor quibble: sponge is not POSIX, nor even part of the GNU Coreutils that come preinstalled with Ubuntu. Having to install an extra package just to remove some blank lines may be undesirable for some people.
  • zwets
    zwets over 8 years
    +1 for the cleverness of the double tac and another for pointing out sponge. I confess to doing the occasional diff -u a b | patch -p0 with fingers crossed. Just putting sponge in that pipeline resolves the race condition perfectly and elegantly. I agree with @MalteSkoruppa that the fact that sponge is not in coreutils is a hurdle, but then it made me apt-get install moreutils right away and admire some of the other little gems in that package. Their ingenuity is what makes working with Unix and GNU such a joy.
  • A.B.
    A.B. over 8 years
    @zwets "all empty lines", that's true, but it deletes the latest empty line also … in my test.
  • A.B.
    A.B. over 8 years
    Add a line line 7 with a space and the awk script fails. OP said "delete all of the blank lines and lines with spaces" … " and then to remove one more line", but it's really hard to understand
  • kos
    kos over 8 years
    @A.B. If you look at the pattern OP used to match the lines to remove, he used $ (which is wrong since it matches every line, I think he meant to use ^$) and ^[[:space:]]+$, so he wants to delete empty lines and lines containing only spaces (^[[:space:]]+$), not every line containing at least one space (or it would have been [[:space:]])
  • kos
    kos over 8 years
    Also he used [[:space:]], which I agree is ambiguous since it matches any type of whitespace, so possibly another error by OP, since it doesn't match what he said he wants to do
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy over 8 years
    Edited my answer. Please review
  • kos
    kos over 8 years
    @MalteSkoruppa Thanks, that's true though; I added a version which uses a command substitution in a herestring to read the file in the first tac command, so that one can avoid the use of sponge
  • kos
    kos over 8 years
    @zwets Thanks, same as above: I added a version which uses a command substitution in a herestring to read the file in the first tac command, so that one can avoid the use of sponge, though I agree with installing moreutils, not only for sponge but also for the other neat stuff that comes with it :)
  • kos
    kos over 8 years
    Gotta go now, I'm gonna check it later along with A.B.'s and Malte Skoruppa's answer though :)
  • kos
    kos over 8 years
    I think this does something else: "I want to get delete all of the blank lines and lines with spaces (if any exist (ONLY from the bottom of the file)) and then to remove one more line (also ONLY from the bottom of the file)." (mind the italic: first remove all the lines at the bottom of the file which are: 1. Empty or 2. Containing only spaces; afterwards (then), delete the last line regardless (which obviously would be a non empty and a non containing only spaces line)); this in my test deletes only the last line if it's empty or it contains only spaces.
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy over 8 years
    @kos welp, I'm gonna leave it at that and wait till OP at least gives example of what they want
  • kos
    kos over 8 years
    Yep, that wouldn't harm though.