How to remove all white spaces just between brackets [] using Unix tools?

Solution 1

If the [, ] are balanced and not nested, you could use GNU awk as in:

gawk -v RS='[][]' '
   NR % 2 == 0 {gsub(/\s/,"")}
   {printf "%s", $0 RT}'

That is, use [ and ] as the record separators instead of the newline character, and remove white space from every other record only (the even-numbered records are the ones inside brackets).

With sed, with the additional requirement that there be no newline character inside [...]:

sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
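
As a quick check, the sed command above can be wrapped in a shell function and fed the sample line from the question. The function name strip_flat is made up for this sketch. The second call shows that the loop handles several [...] groups on one line and more than one space per group.

```sh
# strip_flat is a hypothetical wrapper name, not part of the original answer.
# It assumes non-nested brackets and no newline inside [...].
strip_flat() {
  sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' | strip_flat
# -> testing on Linux [Removewhitespace] testing on Linux

printf '%s\n' 'a [b c] d [e  f g] h' | strip_flat
# -> a [bc] d [efg] h
```

Each pass of the loop removes the last white space character in every [...] group; the t1 branch repeats until no substitution happens.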

If they are balanced but may be nested as in blah [blih [1] bluh] asd, then you could use perl's recursion regexp operators like:

perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'

Another approach, which would scale to very large files, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, as in:

perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
  {"$1".($l>0?"":$2)}gse'

Actually, you can also process the input one character at a time like:

perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'

That approach can be implemented with POSIX tools:

od -A n -vt u1 |
  tr -cs 0-9 '[\n*]' |
  awk 'BEGIN{b[9]=b[10]=b[11]=b[12]=b[13]=b[32]=""} # ASCII TAB NL VT FF CR SPC
       !NF{next}; l>0 && $0 in b {next}
       $0 == "91" {l++}; $0 == "93" {l--}
       {printf "%c", $0}'
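
To try the POSIX pipeline on nested input, it can be wrapped in a function as in the sketch below. The name strip_nested_posix is made up for this example, and the table b here lists the ASCII byte values for TAB, NL, VT, FF, CR and SPC. Because the data is processed one byte at a time, a newline inside the brackets is removed as well.

```sh
# strip_nested_posix is a hypothetical wrapper name for the od | tr | awk pipeline.
# b holds the byte values to drop while inside brackets:
# 9 TAB, 10 NL, 11 VT, 12 FF, 13 CR, 32 SPC.
strip_nested_posix() {
  od -A n -vt u1 |
    tr -cs 0-9 '[\n*]' |
    awk 'BEGIN{b[9]=""; b[10]=""; b[11]=""; b[12]=""; b[13]=""; b[32]=""}
         !NF{next}; l>0 && $0 in b {next}
         $0 == "91" {l++}; $0 == "93" {l--}
         {printf "%c", $0}'
}

printf 'a [b [c d]\ne] f g\n' | strip_nested_posix
# -> a [b[cd]e] f g
```

The counter l is the current bracket depth: 91 is the byte value of [ and 93 that of ], so white space is only dropped while l is greater than zero.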

With sed, for the nested case (again assuming no newline inside the [...]):

sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
    -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
    -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
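
A small test run of that sed script, as a sketch. The wrapper name strip_nested_sed is hypothetical. The second input contains literal _o and _c sequences, to show that the _u escaping keeps them apart from the temporary _o and _c markers that stand in for the inner brackets.

```sh
# strip_nested_sed is a hypothetical wrapper name for the nested-bracket sed script.
strip_nested_sed() {
  sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
      -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
      -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
}

printf '%s\n' 'blah [blih [1 2] bluh] asd' | strip_nested_sed
# -> blah [blih[12]bluh] asd

printf '%s\n' 'x_o [a_c [b c] d e] f' | strip_nested_sed
# -> x_o [a_c[bc]de] f
```

The first loop rewrites inner [ and ] as _o and _c so that only the outermost pair remains, the second loop is the non-nested white space removal, and the last expression restores the brackets and the underscores.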

In the above, white space means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF...) spacing character in the ASCII charset. Depending on your locale, others might get included.

Solution 2

Perl 5.14 solution (which is shorter and IMO easier to read, especially if you format it over multiple lines in a file instead of as a one-liner):

perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'

That works because in 5.14, the regular expression engine is re-entrant (the r flag used in the replacement was also added in 5.14). Here it is, expanded out and commented:

s{
    (\[ .*? \])         # search for [ ... ] block, capture (as $1)
}{
    $1 =~ y/ //dr       # delete spaces. you could add in other whitespace here, too
                        # d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex

Author: ekassis

Updated on September 18, 2022

Comments

  • ekassis
    ekassis over 1 year

    Replace text between brackets

    Input

    testing on Linux [Remove white space] testing on Linux
    

    Output

    testing on Linux [Removewhitespace] testing on Linux
    

    So, how can we just remove all the white space between the brackets and achieve output as given?

  • derobert
    derobert over 11 years
    @StephaneChazelas well, it doesn't assume anything about balanced brackets. It does assume they are not nested, though. It works just fine on empty brackets (it leaves them alone, which is fine, as they contain no whitespace). It does assume one line (which would be easy to change). And indeed it's only space, but that is trivial to change, as noted in the answer...
  • derobert
    derobert over 11 years
    @StephaneChazelas Indeed, I suppose it does. Will fix. FYI, I don't mind if you edit my answers directly to fix things like that.
  • ekassis
    ekassis over 11 years
    thank you i found a sed solution sed -e 's/([[^]]*)( )/\1/'
  • Stéphane Chazelas
    Stéphane Chazelas over 11 years
    @ekassis, not sure about your sed solution as it seems to have been mangled, but I don't expect it to remove more than one space inside [..] or handle more than one [...] per line. I've added a sed solution to my answer.
  • ekassis
    ekassis over 11 years
    You are right, the sed -e 's/([[^]]*)( )/\1/' replaces only the last space between the brackets. The sed you provided is the right one. THANK YOU
  • errant.info
    errant.info over 9 years
    I found this query when I was interested in removing spaces between double quotes. The first sed solution will only work when you're using a different character for opening and closing a pair (this issue may affect some of the other solutions provided too).