Splitting a line in bash based on delimiter with Sed / Regex
Solution 1
You can do this in just about every text processing tool (many without using regular expressions at all).
ed
If in-place editing is really important, the canonical tool is not sed (the stream editor) but ed (the file editor).
ed "$file" << EOF
,s/^[^:]*://g
w
EOF
sed
(Pretty much the same commands as ed, formatted a little differently)
sed 's/^[^:]*://' < "$file" > "$file".new
mv "$file".new "$file"
BASH
This one doesn't cause any new processes to be spawned. (For whatever that's worth.)
while IFS=: read -r _ time; do
  printf '%s\n' "$time"
done < "$file" > "$file".new
mv "$file".new "$file"
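A related pure-shell sketch, not part of the original answer: the parameter expansion ${line#*:} removes the shortest prefix matching *:, which is everything up to and including the first colon. The function name strip_first_field is hypothetical.

```shell
# strip_first_field (hypothetical name): read lines on stdin and print each
# one with everything up to and including the first ':' removed.
# Lines without a ':' are printed unchanged.
strip_first_field() {
  local line
  # -r keeps backslashes literal; IFS= preserves leading/trailing whitespace.
  # The || [ -n "$line" ] clause handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line#*:}"
  done
}

printf 'time:3:35PM\n' | strip_first_field
```

Like the read loop above, this spawns no external process for the edit itself.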
awk
awk -F: 'BEGIN{ OFS=":" } { print $2,$3 }' < "$file" > "$file".new
mv "$file".new "$file"
cut
cut -d: -f2- < "$file" > "$file".new
mv "$file".new "$file"
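The write-to-a-new-file-then-mv pattern used in the sed, BASH, awk and cut variants can be wrapped once and applied to many files. A minimal sketch, assuming mktemp is available; the function name edit_in_place is hypothetical, and cut can be swapped for any of the other filters.

```shell
# edit_in_place (hypothetical helper): for each file argument, drop everything
# up to and including the first ':' on every line, then replace the file.
# Note: the replaced file takes on the temporary file's permissions.
edit_in_place() {
  local file tmp
  for file in "$@"; do
    # Create the temporary file next to the original so that mv is a rename
    # on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if cut -d: -f2- < "$file" > "$tmp"; then
      mv "$tmp" "$file"
    else
      rm -f "$tmp"
      return 1
    fi
  done
}

# Usage, assuming the target files match *.txt in the current directory:
# edit_in_place *.txt
```

Only replacing the original after the filter succeeds means a failed run leaves the file as it was.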
Solution 2
Since you don't need a regular expression to match a single, known character, consider using cut instead of sed.
This simple expression sets : as the delimiter (-d) and emits fields (-f) from 2 onwards (2-):
cut -d: -f2-
Example:
% echo 'time:3:35PM' | cut -d: -f2-
3:35PM
Solution 3
kojiro's answer has plenty of great alternatives, but you asked how to do this with a regex. Here are some pure regex solutions:
grep -oP '[^:]*:\K.*' file.txt
\K makes the regex engine forget everything matched before it, so only the text after \K is reported.
If you know the exact prefix, you can use a lookbehind instead (the prefix must have a fixed length):
grep -oP '(?<=^time:).*' file.txt
Note that most regex implementations do not support these features. You can use them in grep with the -P flag and in perl itself. I wonder if any other utility supports these.
Solution 4
To remove everything up to and including the first : on each line, you could do:
sed -i.bak 's/^[^:]*://' file.txt
or, on multiple .txt files:
sed -i.bak 's/^[^:]*://' *.txt
The -i option specifies that files are to be edited in place: sed writes its output to a temporary file rather than to standard output, then replaces the original with it. The .bak suffix makes sed keep a backup copy of each original file.
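To see what -i.bak leaves behind, here is a small sketch run in a scratch directory; the directory and the file name sample.txt are made up for illustration. As far as I know, the attached-suffix form -i.bak is accepted by both GNU and BSD sed.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
printf 'time:3:35PM\n' > "$dir/sample.txt"

# Edit in place, keeping the original as sample.txt.bak.
sed -i.bak 's/^[^:]*://' "$dir/sample.txt"

cat "$dir/sample.txt"       # edited content
cat "$dir/sample.txt.bak"   # untouched original
```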
horatio1701d
Updated on June 30, 2022
Comments
-
horatio1701d almost 2 years
Regex rookie and hoping to change that. I have the following seemingly very simple problem, but I cannot figure out the correct regex to parse it properly. Basically, I have a file with lines that look like this:
time:3:35PM
I am just trying to cut out all characters up to and including ONLY the FIRST ':' delimiter and keep the rest intact with sed, so that I can process many files with the same format. What I am trying to get is this:
3:35PM
The below is the closest I got, but it matches up to the last occurrence of the delimiter instead of the first:
sed 's/.*://'
I have also tried with python but have challenges with applying a python function to iterate through all lines in many files as opposed to just one file.
Any help would be greatly appreciated.
-
horatio1701d over 10 years
Thank you. I tried this but it's my understanding that you can't use cut for in place editing for multiple files.
-
johnsyweb over 10 years
-
horatio1701d over 10 years
Nice! Did not even know about 'ed'. Thank you. Can I ask how you might apply the ed option recursively with multiple directories containing multiple files?
-
horatio1701d over 10 years
Thank you. I'm definitely focused on in-place editing. So if I'm understanding correctly, I need to go with 'ed'.
-
kojiro over 10 years
@prometheus2305 If your shell can recursively glob, you can just loop over that glob expression, like
for f in **/*; do ed "$f" <<< "$edscript"; done
Otherwise you'd need to use find with a script. Something like
find . -type f -name "$filepattern" -exec sh -c 'ed "$1" <<< "$edscript"' _ {} \;
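A hedged sketch of the find approach from this comment. Two things here are assumptions rather than kojiro's method: sed -i.bak is swapped in for ed, because a here-string (<<<) is a bash feature that a plain sh may not support and $edscript would not be visible inside sh -c unless exported; and the function name strip_recursively and the *.txt pattern are hypothetical.

```shell
# strip_recursively (hypothetical name): under directory $1, edit every file
# whose name matches pattern $2, removing everything up to and including the
# first ':' on each line. Originals are kept with a .bak suffix.
strip_recursively() {
  find "$1" -type f -name "$2" -exec sed -i.bak 's/^[^:]*://' {} +
}

# Usage, assuming logs/ holds the files to edit:
# strip_recursively logs '*.txt'
```

Quoting the pattern at the call site keeps the shell from expanding it before find sees it.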
-
johnsyweb over 10 years
@prometheus2305: ed would certainly do the job (as illustrated in kojiro's answer). Use whichever tool you're most comfortable using.
-
johnsyweb over 10 years
This is pretty comprehensive. I would do
mv "${file}" "${file}.bak"
first and then use
< "${file}.bak" > "${file}"
for the edit, leaving a backup file in case of accidents.