extract every nth character from a string
Solution 1
Two lines
Here is a pure-bash
solution that produces a bash
array:
s="100000011100"
array=($(
for ((i=0; i<${#s}-6; i++))
do
echo "${s:$i:1}${s:$((i+6)):1}"
done
))
echo "${array[@]}"
This produces the same output as shown in the question:
10 01 01 01 00 00
The key element here is the use of bash's substring expansion. Bash allows the extraction substrings from a variable, say parameter
, via ${parameter:offset:length}
. In our case, the offset is determined by the loop variable i
and the length is always 1
.
General Solution For Any Number of Lines
Suppose, for example, that our original string has 18 characters and we want to extract the i-th, the i+6-th, and the i+12-th characters for i from 0 to 5. Then:
s="100000011100234567"
array=($(
for ((i=0; i<6; i++))
do
new=${s:$i:1}
for ((j=i+6; j<${#s}; j=j+6))
do
new="$new${s:$j:1}"
done
echo "$new"
done
))
echo "${array[@]}"
This produces the output:
102 013 014 015 006 007
This same code extends to an arbitrary number of 6-character lines. For example, if s
has three lines (18 characters):
s="100000011100234567abcdef"
Then, the output becomes:
102a 013b 014c 015d 006e 007f
Solution 2
Using perl
:
$ echo 100000011100 | perl -nle '
for ($i = 0; $i < length()/2; $i++) {
print substr($_,$i,1), substr($_,$i+6,1);
}
'
10
01
01
01
00
00
It works for two lines. If you want to work with arbitrary of lines, you should process lines directly, instead of building big string. With this input:
1 0 0 0 0 0
0 1 1 1 0 0
0 0 0 0 0 0
Try:
$ perl -anle '
for ($i = 0; $i <= $#F; $i++) {
push @{$h{$i}}, $F[$i];
}
END {
print @{$h{$_}} for keys %h;
}
' file
000
010
000
100
010
010
Solution 3
As a shell solution, getopts
is probably easiest. The thing about getopts
is that it is POSIX-specified to do exactly what you're asking - process a byte-stream in a shell-loop. I know that sounds weird, because, if you're like me before I learned this, you're probably thinking, well, gee, I thought it was supposed to handle command-line switches. Which is true, but so is the first thing. Consider:
-thisisonelongstringconsistingofseparatecommandlineswitches
Yes, getopts
has to handle that. It has to split that char by char in a loop and return to you each character in either the shell variable $OPTARG
or in another that you specify by name all depending on how specific you get when you call it. What's more, it has to return errors in shell variables and save its progress when it does in the shell variable $OPTIND
so it can resume right where it left off if you can somehow address it. And it has to do the whole job without invoking a single subshell.
So let's say we have:
arg=$(seq -s '' 1000); set --
while getopts :0123456789 v -"${arg}"
do [ "$((i=$i+1<6?$i+1:0))" -gt 0 ] ||
set "$@" "$v"
done
Hmmm.... I wonder if it worked?
echo "$((${#arg}/6))" "$#"
482 482
That's nice...
eval '
printf %.1s\\n "${arg#'"$(printf %0$((124*6-1))d | tr 0 \?)"'}" "${124}"'
4
4
So, as you can see, the getopts
command completely set the array for every sixth byte in the string. And it doesn't have to be numbers like this - nor must it even be shell safe characters - and you needn't even specify the target chars as I did above with 01234565789
either. I've tested this repeatedly in a lot of shells and they all just work. There are some quirks - bash
will throw away the first character if it is a whitespace character - dash
accepts the :
colon as a specified parameter even though it is just about the only POSIX specifically forbids. But none of that matters because getopts
still deposits the current opt char's value in $OPTARG
even when it returns you an error (represented by a ? assigned to your specified opt var) and otherwise explictly unsets $OPTARG
unless you've declared an option should have an argument. And the whitespace thing is kind of a good thing - it only discards a leading space, which is excellent, because, when working with unknown values, you can do:
getopts : o -" $unknown_value"
...to kick off the loop without any danger of the first character actually being in your accepted args string - which would result in getopts
sticking the whole thing in $OPTARG
at once - as an argument.
Here's another example:
OPTIND=1
while getopts : o -" $(dd if=/dev/urandom bs=16 count=1 2>/dev/null)"
do printf '\\%04o' "'$OPTARG"; done
\0040\0150\0071\0365\0320\0070\0161\0064\0274\0115\0012\0215\0222\0271\0146\0057\0166
I set $OPTIND=1
in the first line because I just used getopts
and, until you reset it, it expects its next call to continue where it left off - it wants "${arg2}"
in other words. But I don't feel like giving and I'm doing a different thing now, so I let it know by resetting $OPTIND
at which point it's good to go.
In this one I used zsh
- which doesn't quibble about a leading space - and so the first character is octal 40 - the space character. I don't usually use getopts
in that way, though - I usually use it to avoid doing a write()
for each byte and instead assign its output - which comes in a variable - to another shell variable - as I did above with set
after a fashion. Then, when I'm ready I can take the whole string and when I do usually strip the first byte.
Solution 4
sed
is the first thing that pops into my mind.
$ echo 1234567890abcdefghijklmnopqrstuvwxyz | sed 's/.\{5\}\(.\)/\1/g'
6bhntz
Match 5 characters, capture the 6th, and replace them all with that captured character.
This will however have an issue if the length of the string isn't an exact multiple of 6:
$ echo 1234567890abcdefghijklmnopqrstuvwxy | sed 's/.\{5\}\(.\)/\1/g'
6bhntuvwxy
But we can fix this by altering the sed
a little:
$ echo 1234567890abcdefghijklmnopqrstuvwxy | sed 's/.\{1,5\}\(.\{0,1\}\)/\1/g'
6bhnt
Due to the greedy nature of regex, the variable length matches will match as much as they can, and if there's nothing left for the capture, then it doesn't capture, and the characters are just deleted.
Related videos on Youtube
![Laura](https://i.stack.imgur.com/C3mvf.jpg?s=256&g=1)
Laura
Updated on September 18, 2022Comments
-
Laura almost 2 years
I want to classify a data set (which has four classes) using the SVM method. I've done it using the coding below (using a 1 against all). It isn't terribly accurate but I'm thankful for anything at this stage.
http://www.mathworks.co.uk/matlabcentral/fileexchange/39352-multi-class-svm
I was wondering if there is a way to plot the support vectors and training points. I've managed this for a 2 class SVM classification but can't find a way of doing it with >2 classes.
Any help/advice re. how to achieve a semi-pretty graph would be very much appreciated!
-
Ramesh over 9 yearsThanks. But if I have 3 lines, I want to get the substrings as
101
,001
and so on. But I am getting only 2 characters with this approach. -
Ramesh over 9 yearsThanks. So I will still need a loop to store the sed's output to an array variable?
-
phemmer over 9 yearsNot necessarily.
read -a foo < <(echo 1234567890ab | sed 's/.\{1,5\}\(.\{0,1\}\)/\1 /g')
-
Ramesh over 9 yearsthanks. I have the same problem as I have mentioned in Gnouc's answer. If I have 3 lines, it seems to break.
-
Ramesh over 9 yearsyour solution seems to work even if I have more than 2 lines. But I still get the last line as the array value. Instead I want all the values to be stored in my array. This is the command am trying.
read -a array_elements < <(column -s '\t' inputfile | tr -d '[:space:]' | sed 's/.\{5\}\(.\)/\1/g')
-
Ramesh over 9 yearscool. This one works. You should probably add this one as the answer to the other question.
-
John1024 over 9 years@Ramesh See update.
-
Ramesh over 9 yearscool. This one also works as expected. Thanks again for providing a bash solution.
-
Ramesh over 9 yearsI have used your solution to address the original problem of the OP in the question I linked. Thanks again.
-
John1024 over 9 years@Ramesh So, this is combined with
wc -w
. Nice. +1 for your answer to the other question. -
mikeserv over 5 yearsyou gotta give a little to get the little. for shame, me.