Bash script: split word on each letter
Solution 1
I would use grep:
$ grep -o . <<<"StackOver"
S
t
a
c
k
O
v
e
r
or sed:
$ sed 's/./&\n/g' <<<"StackOver"
S
t
a
c
k
O
v
e
r
And if the empty line at the end is an issue:
$ sed 's/\B/&\n/g' <<<"StackOver"
All of that assumes GNU tools (as on GNU/Linux).
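If GNU tools are not available, a minimal sketch in pure bash (an alternative, not part of the answer above; the function name split_letters is hypothetical) loops over the string with substring expansion. In a UTF-8 locale, bash counts ${#word} and ${word:i:1} in characters; in the C locale it counts bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each character of $1 on its own line,
# using only bash builtins (no grep, no sed).
split_letters() {
  local word=$1 i
  for (( i = 0; i < ${#word}; i++ )); do
    # ${word:i:1} expands to the single character at offset i
    printf '%s\n' "${word:i:1}"
  done
}

split_letters "StackOver"
```

Unlike the sed version, this prints no empty line at the end, and spaces get a line of their own.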
Solution 2
You may want to break on grapheme clusters instead of characters if the intent is to print text vertically. For instance, with an e with a combining acute accent:
- With grapheme clusters (e with its acute accent would be one grapheme cluster):
  $ perl -CLAS -le 'for (@ARGV) {print for /\X/g}' $'Ste\u301phane'
  S
  t
  é
  p
  h
  a
  n
  e
  (or grep -Po '\X' with GNU grep built with PCRE support)
- With characters (here with GNU grep):
  $ printf '%s\n' $'Ste\u301phane' | grep -o .
  S
  t
  e
  ́
  p
  h
  a
  n
  e
- fold is meant to break on characters, but GNU fold doesn't support multi-byte characters, so it breaks on bytes instead:
  $ printf '%s\n' $'Ste\u301phane' | fold -w 1
  S
  t
  e
  �
  �
  p
  h
  a
  n
  e
On StackOver, which only consists of ASCII characters (so one byte per character and one character per grapheme cluster), all three give the same result.
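To see where the three outputs above differ, here is a small sketch (not from the answer) that dumps the UTF-8 bytes of the decomposed form, e followed by U+0301 COMBINING ACUTE ACCENT, next to the precomposed é (U+00E9). The octal escapes are just those UTF-8 encodings written out, so the example does not depend on $'\uXXXX' support in the shell:

```bash
#!/usr/bin/env bash
# Decomposed form: "e" (0x65) followed by U+0301, which UTF-8 encodes as 0xCC 0x81
printf 'e\314\201' | od -An -tx1    # 65 cc 81
# Precomposed form: U+00E9, which UTF-8 encodes as 0xC3 0xA9
printf '\303\251' | od -An -tx1     # c3 a9
# Byte counts: 3 for the decomposed form, 2 for the precomposed one,
# although both render as the same single grapheme cluster
printf 'e\314\201' | wc -c
printf '\303\251' | wc -c
```

The two bytes cc 81 are what GNU fold prints as the two � lines; grep -o . in a UTF-8 locale keeps them together as one character, and perl's \X additionally groups that character with the preceding e.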
Solution 3
If you have perl6 on your box:
$ perl6 -e 'for @*ARGS -> $w { .say for $w.comb }' 'cường'
c
ư
ờ
n
g
This works regardless of your locale.
Solution 4
With many awk versions:
$ awk -F '' -v OFS='\n' '{$1=$1};1' <<<'StackOver'
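Not every awk accepts an empty field separator (POSIX leaves a null FS unspecified). A sketch that relies only on the POSIX length() and substr() functions loops over each record instead; it is an alternative, not part of the answer, and awks that are not multibyte-aware will count bytes rather than characters here:

```bash
#!/usr/bin/env bash
# Print each character of every input line on its own line,
# using only POSIX awk features (length() and substr()).
printf '%s\n' 'StackOver' |
  awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }'
```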
Solution 5
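You can use the fold(1) command. It is more efficient than grep and sed (note that -b makes fold count bytes rather than columns, so multi-byte characters get split, as in Solution 2):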
$ time grep -o . <bigfile >/dev/null
real 0m3.868s
user 0m3.784s
sys 0m0.056s
$ time fold -b1 <bigfile >/dev/null
real 0m0.555s
user 0m0.528s
sys 0m0.016s
$
One significant difference is that fold will reproduce empty lines in the output:
$ grep -o . <(printf "A\nB\n\nC\n\n\nD\n")
A
B
C
D
$ fold -b1 <(printf "A\nB\n\nC\n\n\nD\n")
A
B

C


D
$
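If you want fold but with grep's behaviour of skipping empty lines, one option (a sketch; whether it keeps the speed advantage on your data is untested here) is to filter fold's output. On ASCII input, fold -b1 only produces an empty line for an empty input line, so dropping them should make both pipelines identical:

```bash
#!/usr/bin/env bash
# Same sample input as above: note the empty lines between B, C and D
sample() { printf 'A\nB\n\nC\n\n\nD\n'; }

# grep -o . never prints empty lines
sample | grep -o .
# fold -b1 reproduces them, so filter them out afterwards
sample | fold -b1 | grep -v '^$'

# On ASCII input both pipelines now produce identical output
diff <(sample | grep -o .) <(sample | fold -b1 | grep -v '^$') && echo identical
```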
Sijaan Hallak
Updated on September 18, 2022
Comments
-
Sijaan Hallak over 1 year
How can I split a word's letters, with each letter on a separate line?
For example, given "StackOver", I would like to see
S
t
a
c
k
O
v
e
r
I'm new to bash, so I have no clue where to start.
-
Sijaan Hallak over 8 years
grep -o . <<< ¿¿¿ .. -o searches for the PATTERN provided, right? And what does it do here in your command?
-
jimmij over 8 years
@SijaanHallak grep searches for a pattern, and in this example it searches for every character and prints each match on a separate line. See also the sed solution.
-
Sijaan Hallak over 8 years
Thanks! So this "." dot means every character. Can you please give me a link where I can read about things such as this dot? Or what are these things called?
-
jimmij over 8 years
I'm surprised grep -Po doesn't do what one would expect (like grep -P does).
-
Stéphane Chazelas over 8 years
Note that both -o and \n are GNU extensions. <<< is a zsh extension (also available in recent versions of ksh93 and the GNU shell, bash).
-
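For reference, a sketch of the sed answer that avoids all three extensions (assuming a POSIX sh; this variant is not from the thread): POSIX sed inserts a newline when the replacement contains a backslash followed by a literal newline, and printf stands in for the here string. Like the GNU original, it still prints an empty line at the end.

```sh
#!/bin/sh
# Portable variant: no grep -o, no "\n" in the replacement, no <<<.
# The backslash at the end of the first line escapes the literal newline,
# which POSIX sed then inserts after every character.
printf '%s\n' 'StackOver' | sed 's/./&\
/g'
```
-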
Stéphane Chazelas over 8 years
@jimmij, what do you mean? grep -Po . finds characters (and a combining acute accent following a newline character is invalid), and grep -Po '\X' finds grapheme clusters for me. You may need a recent version of grep and/or PCRE for it to work properly (or try grep -Po '(*UTF8)\X').
-
jimmij over 8 years
@SijaanHallak The best manual is already on your computer: just run man grep and look for the "REGULAR EXPRESSIONS" section (if that is what you are interested in).
-
Avinash Raj over 8 years
The second answer would produce a new line after the last...
-
cuonglm over 8 years
NP, should we add a note about the locale?
-
Sijaan Hallak over 8 years
@jimmij I can't find any help on what <<< really does! Any help?
-
Sijaan Hallak over 8 years
This won't help, as it prints a new line at the end.
-
jimmij over 8 years
@SijaanHallak This is a so-called here string, roughly equivalent to echo foo | ..., just less typing. See tldp.org/LDP/abs/html/x17837.html
-
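A quick way to check that equivalence (a sketch, not from the thread; printf is used instead of echo because its output is predictable): the here string feeds the word followed by a newline on standard input, which is exactly what printf '%s\n' writes into the pipe.

```bash
#!/usr/bin/env bash
word='StackOver'
# Here string: bash supplies "$word" plus a trailing newline on stdin
grep -o . <<<"$word"
# Pipe: the same bytes, produced by printf
printf '%s\n' "$word" | grep -o .
# Both forms deliver identical input bytes
cmp <(cat <<<"$word") <(printf '%s\n' "$word") && echo 'same input'
```
-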
kay over 8 years
Does not work for combining characters like in Stéphane Chazelas's answer, but with proper normalization this should not matter.
-
mikeserv over 8 years
@Kay It works for combining characters if you want it to; that's what sed scripts are for. I'm not likely to write one right about now, I'm pretty sleepy. It's really useful, though, when reading a terminal.
-
mikeserv over 8 years
@cuonglm If you like. It should just work for the locale, given a sane libc, though.
-
Sijaan Hallak over 8 years
@jimmij The second solution here seems to have a problem: it prints a new line at the end! I changed it to this:
sed -e 's/./\n&/g' <<< "$1"
But this prints a new line at the beginning. Any suggestion how to overcome this?
-
jimmij over 8 years
@SijaanHallak Change . to \B (it doesn't match on a word boundary).
-
Sijaan Hallak over 8 years
@jimmij \B will not work on "Stack Over": the "O" is printed next to the letter "k" on the same line, and only then comes the \n.
-
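One way around both problems raised here (the empty line at the end with ., the wrong split around spaces with \B), sketched with GNU sed and not taken from the thread, is to keep . and then delete the final newline that the first substitution added:

```bash
#!/usr/bin/env bash
# GNU sed ("\n" in the replacement is a GNU extension).
# 1st command: put a newline after every character, spaces included;
# 2nd command: "\n$" matches only the newline added after the last
# character, so removing it avoids the empty line at the end.
sed 's/./&\n/g; s/\n$//' <<<"Stack Over"
```

The space between "k" and "O" ends up on a line of its own, so each of the ten characters is printed on a separate line.
-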
jpmc26 over 8 years
@SijaanHallak These might be helpful: joelonsoftware.com/articles/Unicode.html, eev.ee/blog/2015/09/12/dark-corners-of-unicode
-
Stéphane Chazelas over 8 years
Note that dd will break multibyte characters, so the output will no longer be text, and the behaviour of sed will then be unspecified as per POSIX.
-
mikeserv over 8 years
@StéphaneChazelas Do you have a link to reference that statement? A NUL can't occur in a multibyte character, a dot can only match a whole character which is not NUL, and it has worked with every sed I've tried. How could it not work?
-
mikeserv over 8 years
Oh wait, you mean because the input isn't a text file. Possibly, but sed is spec'd to handle conditions which exceed or break the text file specs, too, such as 4k pattern spaces and scripts, which is well beyond the line max. It's also spec'd to evaluate chars bytewise with l, even when a single char is multiple bytes. I think the text file restriction for sed is probably based on the NUL prohibition: many seds replace the delimiter in their scripts with NULs, and I've never managed to seek past a NUL in pattern space with heirloom sed except with D and G.
-
mikeserv over 8 years
@SijaanHallak You can drop the second sed like this: sed -et -e's/./\n&/g;//D'
-
Yunus almost 8 years
Since each byte has a width of 1, the result will be the same!
-
VocalFan almost 8 years
So how is this not a duplicate of the earlier answer?
-
Yunus almost 8 years
Because it shows the same command with a different argument, and that is nice to know.
-
eruve about 5 years
Great! But on my version of nAWK ("One True AWK") that doesn't work. However, this does the trick: awk -v FS='' -v OFS='\n' '{$1=$1};1' (I'm wondering if that's more portable, since -F '' might yield the ERE //).
-
done almost 3 years
This removes white space from the original string.
-
done almost 3 years
An eval could be a big risk; a double eval is even more risky, especially with arbitrary input from $s. Just saying!
-
done almost 3 years
Are you claiming that $'e\u301' is equivalent/equal to é?
-
Stéphane Chazelas almost 3 years
@Isaac, no, I'm not claiming any such thing, though there are some definitions of "equivalent" for which that would be true.
-
done almost 3 years
Your description seems to imply that because Perl is able to join characters and accents together (much like a text editor joins them to select a specific glyph), other software should be able to as well. But no, not all programs are text editors, nor do all utilities understand the complex set of rules (especially for Hangul) used to join some individual Unicode code points (see unicode.org/reports/tr29 and search for Devanagari kshi). So no, neither grep, sed nor fold understands any of this (yet).