How to remove all white spaces just between brackets [] using Unix tools?
Solution 1
If the [ and ] are balanced and not nested, you could use GNU awk as in:
gawk -v RS='[][]' '
NR % 2 == 0 {gsub(/\s/,"")}
{printf "%s", $0 RT}'
That is, use [ and ] as the record separators instead of the newline character, and remove whitespace only in every other record (the even-numbered ones, which hold the text inside the brackets).
With sed, with the additional requirement that there be no newline character inside [...]:
sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
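As a quick check, the sed command above can be run on the sample line from the question. This is only a usage sketch: the function name `strip_bracket_ws` is a label made up here, and it assumes non-nested brackets with no newline inside them.

```shell
# Hypothetical wrapper name; the sed expression is the one shown above.
# Each pass deletes one whitespace character per [...] group, and
# "t1" loops back to label 1 until a pass makes no substitution.
strip_bracket_ws() {
  sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' |
  strip_bracket_ws
# prints: testing on Linux [Removewhitespace] testing on Linux
```

Whitespace outside the brackets is left alone because the pattern has to start at a [ and cannot cross a ].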
If they are balanced but may be nested, as in blah [blih [1] bluh] asd, then you could use perl's recursive regexp operators like:
perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'
Another approach, which would scale to very large files, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, as in:
perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
{"$1".($l>0?"":$2)}gse'
Actually, you can also process the input one character at a time like:
perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'
That approach can be implemented with POSIX tools:
od -A n -vt u1 |
tr -cs 0-9 '[\n*]' |
awk 'BEGIN{b[32]=""; b[10]=""; b[12]=""} # add more for every blank
!NF{next}; l>0 && $0 in b {next}
$0 == "91" {l++}; $0 == "93" {l--}
{printf "%c", $0}'
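The POSIX pipeline above can be wrapped and tried on the nested example. In this sketch the function name `strip_ws_posix` is made up, TAB, VT and CR have been added to the blank list (the original lists only SPC, NL and FF and invites adding more), and `LC_ALL=C` is an assumption added so that awk emits each value as a single byte.

```shell
# Hypothetical wrapper: od dumps each byte as a decimal number, tr puts
# one number per line, and awk tracks the bracket depth in the variable l.
strip_ws_posix() {
  od -A n -vt u1 |
    tr -cs 0-9 '[\n*]' |
    LC_ALL=C awk '
      BEGIN { b[9] = b[10] = b[11] = b[12] = b[13] = b[32] = "" }  # TAB NL VT FF CR SPC
      !NF { next }                   # skip the empty lines left by tr
      l > 0 && ($0 in b) { next }    # drop blanks while inside brackets
      $0 == "91" { l++ }             # "[" opens a level
      $0 == "93" { l-- }             # "]" closes a level
      { printf "%c", $0 }            # emit the byte
    '
}

printf '%s\n' 'blah [blih [1] bluh] asd' | strip_ws_posix
# prints: blah [blih[1]bluh] asd
```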
With sed (assuming no newline inside the [...]):
sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
-e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
-e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
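To see what the first two expressions of the sed command above do, it can help to run the stages separately. This is only a sketch: the function names `flatten_nested` and `strip_and_restore` are made up here, and splitting the script across two sed processes is purely for illustration. Like the original, it assumes no newline inside [...].

```shell
# Stage 1: escape "_" as "_u", then rewrite every inner bracket pair
# as _o ... _c until only the outermost brackets remain.
flatten_nested() {
  sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1'
}

# Stage 2: delete whitespace inside the now-flat brackets, then undo
# the escaping in reverse order.
strip_and_restore() {
  sed -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
      -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
}

printf '%s\n' 'blah [blih [1] bluh] a_b' | flatten_nested
# prints: blah [blih _o1_c bluh] a_ub

printf '%s\n' 'blah [blih [1] bluh] a_b' | flatten_nested | strip_and_restore
# prints: blah [blih[1]bluh] a_b
```

Escaping every existing _ as _u first means that any _o or _c seen later can only have come from a rewritten bracket, so the final restore step is unambiguous.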
In the above, white space means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF...) spacing character in the ASCII charset. Depending on your locale, others might get included.
Solution 2
Perl 5.14 solution (which is shorter and IMO easier to read, especially if you format it over multiple lines in a file instead of as a one-liner):
perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'
That works because in 5.14, the regular expression engine is re-entrant. Here it is, expanded out and commented:
s{
(\[ .*? \]) # search for [ ... ] block, capture (as $1)
}{
$1 =~ y/ //dr # delete spaces. you could add in other whitespace here, too
# d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex
ekassis
Updated on September 18, 2022
Comments
-
ekassis over 1 year
Input
testing on Linux [Remove white space] testing on Linux
Output
testing on Linux [Removewhitespace] testing on Linux
So, how can we just remove all the white space between the brackets and achieve output as given?
-
ekoeppen over 11 years
"Remove comma between quotes only" might help.
-
manatwork over 11 years
"Replace text between brackets" might also help.
-
derobert over 11 years
@StephaneChazelas well, it doesn't assume anything about balanced. It does assume not nested, though. It works just fine on empty brackets (it leaves them alone, which is fine, as they contain no whitespace). It does assume one line (which would be easy to change). And indeed it's only space, but trivial to change, as noted in the answer...
-
derobert over 11 years
@StephaneChazelas Indeed, I suppose it does. Will fix. FYI, I don't mind if you edit my answers directly to fix things like that.
-
ekassis over 11 years
Thank you, I found a sed solution: sed -e 's/([[^]]*)( )/\1/'
-
Stéphane Chazelas over 11 years
@ekassis, not sure about your sed solution as it seems to have been mangled, but I don't expect it to remove more than one space inside [..] or handle more than one [...] per line. I've added a sed solution to my answer.
-
ekassis over 11 years
You are right, the sed -e 's/([[^]]*)( )/\1/' only replaces the last space between the brackets. The sed you provided is the right one. Thank you.
-
errant.info over 9 years
I found this query when I was interested in removing spaces between double quotes. The first sed solution will only work when you're using a different char for opening and closing a pair (this issue may affect some of the other solutions provided also).