Bash script: split word on each letter


Solution 1

I would use grep:

```
$ grep -o . <<<"StackOver"
S
t
a
c
k
O
v
e
r
```

or sed:

```
$ sed 's/./&\n/g' <<<"StackOver"
S
t
a
c
k
O
v
e
r

```

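And if the trailing empty line is an issue (note that \B only matches between two word characters, so input containing spaces or punctuation won't be split everywhere):

```
sed 's/\B/&\n/g' <<<"StackOver"
```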
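Another way to avoid the trailing empty line, which also copes with spaces, is to keep `s/./&\n/g` and then delete the newline it appended after the last character. This is a minimal sketch assuming GNU sed (for `\n` in the replacement); the function name `split_letters_sed` is hypothetical:

```bash
#!/usr/bin/env bash
# Split each input line into one character per line, without the trailing
# empty line. Assumes GNU sed: \n in the replacement is a GNU extension.
split_letters_sed() {
  # 1st s: append a newline after every character.
  # 2nd s: drop the newline appended after the last character; sed adds
  #        the line terminator itself when it prints the pattern space.
  sed 's/./&\n/g; s/\n$//'
}

split_letters_sed <<<"Stack Over"
```

Here the space in "Stack Over" ends up on its own line instead of gluing "k" and "O" together, as happens with the `\B` version.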

All of this assumes GNU tools: grep's -o option and \n in a sed replacement are GNU extensions.
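If you would rather not depend on GNU grep or sed at all, bash's own substring expansion can do the job. This is a sketch, not part of the original answer; the function name `split_letters` is hypothetical. `${#word}` and `${word:i:1}` count characters according to the current locale, so this splits on characters, not grapheme clusters:

```bash
#!/usr/bin/env bash
# Print each character of each argument on its own line using only bash
# parameter expansion (no external commands).
split_letters() {
  local word i
  for word in "$@"; do
    # ${#word} is the length in characters (locale-dependent);
    # ${word:i:1} extracts the single character at offset i.
    for (( i = 0; i < ${#word}; i++ )); do
      printf '%s\n' "${word:i:1}"
    done
  done
}

split_letters "StackOver"
```

Unlike the first sed command, this prints no trailing empty line.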

Solution 2

You may want to break on grapheme clusters instead of characters if the intent is to print text vertically. For instance, with an e followed by a combining acute accent:

With grapheme clusters (e with its acute accent would be one grapheme cluster):
```
$ perl -CLAS -le 'for (@ARGV) {print for /\X/g}' $'Ste\u301phane'
S
t
é
p
h
a
n
e
```
(or grep -Po '\X' with GNU grep built with PCRE support)

With characters (here with GNU grep):

```
$ printf '%s\n' $'Ste\u301phane' | grep -o .
S
t
e

p
h
a
n
e
```

fold is meant to break on characters, but GNU fold doesn't support multi-byte characters, so it breaks on bytes instead:
```
$ printf '%s\n' $'Ste\u301phane' | fold -w 1
S
t
e
�
�
p
h
a
n
e
```

On StackOver, which consists only of ASCII characters (one byte per character, one character per grapheme cluster), all three give the same result.
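A quick way to check that claim on a given word is to capture all three outputs and compare them. This sketch reuses the commands above; the variable names are only illustrative, and it assumes perl and GNU grep are installed:

```bash
#!/usr/bin/env bash
# Compare grapheme-cluster, character and byte splitting on an ASCII word.
word="StackOver"
by_grapheme=$(perl -CLAS -le 'for (@ARGV) {print for /\X/g}' "$word")
by_char=$(printf '%s\n' "$word" | grep -o .)
by_byte=$(printf '%s\n' "$word" | fold -w 1)

if [ "$by_grapheme" = "$by_char" ] && [ "$by_char" = "$by_byte" ]; then
  echo "all three agree"
else
  echo "outputs differ"
fi
```

Swapping in `$'Ste\u301phane'` for `word` should make the comparison fail, since that string is not pure ASCII.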

Solution 3

If you have Perl 6 installed:

```
$ perl6 -e 'for @*ARGS -> $w { .say for $w.comb }' 'cường'
c
ư
ờ
n
g
```

This works regardless of your locale.

Solution 4

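With many awk implementations (POSIX leaves the behaviour of an empty field separator unspecified, so this is not portable):

```
awk -F '' -v OFS='\n' '{$1=$1};1' <<<'StackOver'
```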
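For an awk that does not split on an empty separator (one comment reports that the One True AWK needs `-v FS=''` rather than `-F ''`), a sketch that sticks to the POSIX `length()` and `substr()` functions avoids the question entirely. The function name `split_letters_awk` is hypothetical, and whether `length()`/`substr()` count bytes or characters for non-ASCII input depends on the awk implementation:

```bash
#!/usr/bin/env bash
# Print each character of every input line on its own line, using only
# POSIX awk features (length() and substr()), no empty-FS extension.
split_letters_awk() {
  awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }'
}

split_letters_awk <<<'StackOver'
```

Empty input lines produce no output, which matches the behaviour of `grep -o .`.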

Solution 5

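You can use the fold(1) command. In the timing below it is roughly seven times faster than grep. Note that -b makes fold count bytes, so the output only matches grep's for single-byte (e.g. ASCII) text:

```
$ time grep -o . <bigfile >/dev/null

real    0m3.868s
user    0m3.784s
sys     0m0.056s
$ time fold -b1 <bigfile >/dev/null

real    0m0.555s
user    0m0.528s
sys     0m0.016s
```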

One significant difference is that fold will reproduce empty lines in the output:

```
$ grep -o . <(printf "A\nB\n\nC\n\n\nD\n")
A
B
C
D
$ fold -b1 <(printf "A\nB\n\nC\n\n\nD\n")
A
B

C


D
```
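If you want fold's speed but grep's handling of blank lines, one option is to delete the empty lines fold leaves behind. This is a sketch only suitable for single-byte text, since `-b` counts bytes; the function name `split_bytes` is hypothetical, and `fold -b -w 1` is the POSIX spelling of the answer's `fold -b1`:

```bash
#!/usr/bin/env bash
# fold -b -w 1 puts one byte per line and keeps empty input lines;
# sed '/^$/d' then deletes those empty lines, like grep -o . would.
split_bytes() {
  fold -b -w 1 | sed '/^$/d'
}

printf 'A\nB\n\nC\n\n\nD\n' | split_bytes
```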



Sijaan Hallak

Updated on September 18, 2022

Comments

Sijaan Hallak over 1 year
How can I split a word's letters, with each letter in a separate line?

For example, given "StackOver" I would like to see
```
S
t
a
c
k
O
v
e
r
```
I'm new to bash so I have no clue where to start.
Sijaan Hallak over 8 years

grep -o . <<< ¿¿¿ ... -o searches for the PATTERN provided, right? And what does it do in your command?
jimmij over 8 years

@SijaanHallak grep searches for a pattern, and in this example it searches for every character (.) and prints each one on a separate line. See also the sed solution.
Sijaan Hallak over 8 years

Thanks! So this "." dot means any character. Can you please give me a link where I can read about things such as this dot? Or what are these things called?
jimmij over 8 years

I'm surprised grep -Po doesn't do what one would expect (like grep -P does).
Stéphane Chazelas over 8 years

Note that both -o and \n are a GNU extension. <<< is a zsh extension (also available in recent versions of ksh93 and the GNU shell (bash)).
Stéphane Chazelas over 8 years

@jimmij, what do you mean? grep -Po . finds characters (and a combining acute accent following a newline character is invalid), and grep -Po '\X' finds grapheme clusters for me. You may need a recent version of grep and/or PCRE for it to work properly (or try grep -Po '(*UTF8)\X').
jimmij over 8 years

@SijaanHallak The best manual is already on your computer: run man grep and look for the section "REGULAR EXPRESSIONS" (if that is what you are interested in).
Avinash Raj over 8 years

The second command would produce a newline after the last character...
cuonglm over 8 years

NP, should we add a note about the locale?
Sijaan Hallak over 8 years

@jimmij I can't find any documentation on what <<< really does! Any help?
Sijaan Hallak over 8 years

This won't help, as it prints a newline at the end.
jimmij over 8 years

@SijaanHallak This is a so-called here string, roughly equivalent to echo foo | ..., just less typing. See tldp.org/LDP/abs/html/x17837.html
kay over 8 years

Does not work for combining characters (see Stéphane Chazelas's answer), but with proper normalization this should not matter.
mikeserv over 8 years

@Kay, it works for combining characters if you want it to; that's what sed scripts are for. I'm not likely to write one right now, I'm pretty sleepy. It's really useful, though, when reading a terminal.
mikeserv over 8 years

@cuonglm, if you like. It should just work for the locale, given a sane libc, though.
Sijaan Hallak over 8 years

@jimmij The second solution here seems to have a problem: it prints a newline at the end! I changed it to sed -e 's/./\n&/g' <<< "$1", but this prints a newline at the beginning. Any suggestion on how to overcome this?
jimmij over 8 years

@SijaanHallak change . to \B (doesn't match on word boundary).
Sijaan Hallak over 8 years

@jimmij \B will not work on "Stack Over": the "O" is printed next to the letter "k" on the same line, and only then comes the \n.
jpmc26 over 8 years

@SijaanHallak These might be helpful: joelonsoftware.com/articles/Unicode.html, eev.ee/blog/2015/09/12/dark-corners-of-unicode
Stéphane Chazelas over 8 years

Note that dd will break multibyte characters, so the output will no longer be text, and the behaviour of sed is then unspecified per POSIX.
mikeserv over 8 years

@StéphaneChazelas, do you have a link to reference that statement? A NUL can't occur in a multibyte character, a dot can only match a whole character which is not NUL, and it has worked with every sed I've tried. How could it not work?
mikeserv over 8 years

Oh wait, you mean because the input isn't a text file. Possibly, but sed is specified to handle conditions that exceed or break the text-file rules too, such as 4k pattern spaces, which is well beyond LINE_MAX. It's also specified to evaluate characters bytewise with l, even when a single character is multiple bytes. I think the text-file restriction for sed is probably based on the NUL prohibition: many seds replace the delimiter in their scripts with NULs, and I've never managed to seek past a NUL in the pattern space with heirloom sed except with D and G.
mikeserv over 8 years

@SijaanHallak, you can drop the second sed like this: sed -et -e's/./\n&/g;//D'
Yunus almost 8 years

Since each byte has a width of 1, the result will be the same!
VocalFan almost 8 years

So how is this not a duplicate of the earlier answer?
Yunus almost 8 years

Because it shows the same command with a different argument, and that is nice to know.
eruve about 5 years

Great! But on my version of nAWK ("One True AWK") that doesn't work. However this does the trick: awk -v FS='' -v OFS='\n' '{$1=$1};1' (wondering if that's more portable since -F '' might yield the ERE: //)
done almost 3 years

This removes white space from the original string.
done almost 3 years

An eval could be a big risk; a double eval is even riskier, especially with arbitrary input from $s. Just saying!
done almost 3 years

Are you claiming that $'e\u301' is equivalent/equal to é ?
Stéphane Chazelas almost 3 years

@Isaac, no, I'm not claiming any such thing though there are some definitions of "equivalent" for which that would be true.
done almost 3 years

Your description seems to imply that because Perl is able to join characters and accents together (much like a text editor joins them to select a specific glyph), other software should be able to as well. But no: not all programs are text editors, nor do all utilities understand the complex set of rules (especially for Hangul) for joining individual Unicode code points (see unicode.org/reports/tr29 and search for Devanagari kshi). So no, neither grep, sed, nor fold understands any of this (yet).