How to remove all white spaces just between brackets [] using Unix tools?

text-processing awk sed perl

5,834

Solution 1

If the [, ] are balanced and not nested, you could use GNU awk as in:

gawk -v RS='[][]' '
   NR % 2 == 0 {gsub(/\s/,"")}
   {printf "%s", $0 RT}'

That is, it uses [ and ] as the record separators instead of the newline character, and removes whitespace only from every other record (the even-numbered ones, which hold the text inside the brackets).

With sed, with the additional requirement that there be no newline character inside [...]:

sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
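
To see the loop at work, here is a minimal sketch that wraps the sed command above in a shell function (the name `strip_flat` is made up for this example) and feeds it the sample line from the question. Each pass of the `s` command removes one whitespace character per `[...]` group (the last one, since `[^]]*` is greedy), and `t1` jumps back to label `1` until a pass changes nothing.

```shell
# strip_flat is a hypothetical helper name, not part of any tool.
# Assumes brackets are balanced, not nested, and never span lines.
strip_flat() {
  sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' | strip_flat
```

On the question's input this should print the requested `testing on Linux [Removewhitespace] testing on Linux`.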

If they are balanced but may be nested, as in blah [blih [1] bluh] asd, then you could use perl's recursive regexp operators, as in:

perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'

Another approach, which would scale to very large files because it reads the input in 8192-byte blocks rather than slurping it whole, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, as in:

perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
  {"$1".($l>0?"":$2)}gse'

Actually, you can also process the input one character at a time like:

perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'

That approach can be implemented with POSIX tools:

od -A n -vt u1 |
  tr -cs 0-9 '[\n*]' |
  awk 'BEGIN{b[32]=""; b[10]=""; b[12]=""} # 32=SPC, 10=NL, 12=FF; add more (9=TAB, 11=VT, 13=CR) as needed
       !NF{next}; l>0 && $0 in b {next}
       $0 == "91" {l++}; $0 == "93" {l--}
       {printf "%c", $0}'
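
The same depth-counting idea can also be sketched in a single POSIX awk program that walks each line with substr() instead of going through od and tr. This is an illustration rather than a replacement: the function name `strip_nested` is made up, only space and tab are treated as blanks within a line, a newline inside brackets is dropped as well, and, unlike the commands above, a stray `]` is not allowed to push the depth below zero.

```shell
# strip_nested is a hypothetical helper name used for illustration only.
strip_nested() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "[") { depth++ }                      # entering a bracket level
      else if (c == "]" && depth > 0) { depth-- }    # leaving one (never below 0)
      # inside brackets: skip space and tab (assumption: extend as needed)
      if (depth > 0 && (c == " " || c == "\t")) continue
      out = out c
    }
    # depth persists across lines, so a newline inside brackets is dropped too
    printf "%s%s", out, (depth > 0 ? "" : "\n")
  }'
}

printf '%s\n' 'blah [blih [1 2] bl uh] a sd' | strip_nested
```

Building the output line in a variable keeps the program to one printf per input line, at the cost of holding a whole line in memory, which the byte-at-a-time versions above avoid.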

With sed (assuming no newline inside the [...]):

sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
    -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
    -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
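
A quick way to check the escaping trick above is to run it on a line that has both nested brackets and literal underscores. The sketch below wraps the same sed command in a function (the name `strip_nested_sed` is hypothetical). Inner brackets are temporarily rewritten to `_o` and `_c` so that only the outermost pair is left for the second loop, and any literal `_` is first escaped to `_u` so it cannot be confused with those markers.

```shell
# strip_nested_sed is a hypothetical helper name used for illustration only.
# Assumes balanced brackets and no newline inside [...].
strip_nested_sed() {
  sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
      -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
      -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
}

printf '%s\n' 'x_o [a_c [b c] d e] f_u g' | strip_nested_sed
```

The underscores outside and inside the brackets should come back unchanged, with only the whitespace inside the outer `[...]` removed.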

In the commands above, white space means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF...) spacing character in the ASCII charset. Depending on your locale, others might get included.

Solution 2

Perl 5.14 solution (which is shorter and IMO easier to read, especially if you format it over multiple lines in a file instead of as a one-liner):

perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'

That works because in 5.14, the regular expression engine is re-entrant. Here it is, expanded out and commented:

s{
    (\[ .*? \])         # search for [ ... ] block, capture (as $1)
}{
    $1 =~ y/ //dr       # delete spaces. you could add in other whitespace here, too
                        # d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex

ekassis

Updated on September 18, 2022

Comments

ekassis over 1 year
Replace text between brackets

Input
```
testing on Linux [Remove white space] testing on Linux
```
Output
```
testing on Linux [Removewhitespace] testing on Linux
```
So, how can we just remove all the white space between the brackets and achieve output as given?
- ekoeppen over 11 years
  
  Remove comma between quotes only might help.
- manatwork over 11 years
  
  Replace text between brackets might also help.
derobert over 11 years

@StephaneChazelas well, it doesn't assume anything about balanced brackets. It does assume they are not nested, though. It works just fine on empty brackets (it leaves them alone, which is fine, as they contain no whitespace). It does assume one line (which would be easy to change). And indeed it's only space, but that is trivial to change, as noted in the answer...
derobert over 11 years

@StephaneChazelas Indeed, I suppose it does. Will fix. FYI, I don't mind if you edit my answers directly to fix things like that.
ekassis over 11 years

Thank you, I found a sed solution: sed -e 's/([[^]]*)( )/\1/'
Stéphane Chazelas over 11 years

@ekassis, not sure about your sed solution as it seems to have been mangled, but I don't expect it to remove more than one space inside [..] or handle more than one [...] per line. I've added a sed solution to my answer.
ekassis over 11 years

You are right: sed -e 's/([[^]]*)( )/\1/' replaces the last space between the brackets. The sed you provided is the right one. THANK YOU
errant.info over 9 years

I found this query when I was interested in removing spaces between double quotes. The first sed solution will only work when you're using a different char for opening and closing a pair (this issue may affect some of the other solutions provided also).