Splitting a line in bash based on delimiter with Sed / Regex

regex bash sed

12,599

Solution 1

You can do this in just about every text processing tool (many without using regular expressions at all).

ed

If the in-place editing is really important, the canonical correct way is not sed (the stream editor) but ed (the file editor).

ed -s "$file" << 'EOF'
,s/^[^:]*://
w
EOF

sed

(Pretty much the same commands as ed, formatted a little differently)

sed 's/^[^:]*://' < "$file" > "$file".new
mv "$file".new "$file"

BASH

This one doesn't cause any new processes to be spawned. (For whatever that's worth.)

while IFS=: read -r _ time; do
    printf '%s\n' "$time"
done < "$file" > "$file".new
mv "$file".new "$file"
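
The same loop can also be written with parameter expansion instead of field splitting. The following is a minimal sketch, assuming a throwaway input file created with mktemp (the sample lines and the variable names are illustrative, not from the question). `${line#*:}` removes the shortest leading match of `*:`, that is, everything through the first colon.

```shell
# Hypothetical sample input; in practice "$file" would be your own file.
file=$(mktemp)
printf '%s\n' 'time:3:35PM' 'date:2022-06-30' > "$file"

# Strip everything up to and including the first ':' on each line.
while IFS= read -r line; do
    printf '%s\n' "${line#*:}"
done < "$file" > "$file".new
mv "$file".new "$file"

result=$(cat "$file")
printf '%s\n' "$result"
```

One difference worth knowing: a line with no colon passes through unchanged here, whereas the `read _ time` version prints an empty line for it.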

awk

awk -F: 'BEGIN{ OFS=":" } { print $2,$3 }' < "$file" > "$file".new
mv "$file".new "$file"
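
The awk command above prints exactly fields 2 and 3, which fits the sample line. If lines may contain a different number of colons, one alternative (a sketch, not part of the original answer) is to delete the prefix with sub() and print whatever remains. The input lines below are made up for illustration.

```shell
# Hypothetical input with varying numbers of fields.
out=$(printf '%s\n' 'time:3:35PM' 'a:b:c:d' 'key:value' |
    awk '{ sub(/^[^:]*:/, "") } 1')   # delete through the first ':', then print the line
printf '%s\n' "$out"
```

With the fixed `print $2,$3` form, the second line would come out as `b:c` and the third as `value:` (with a trailing colon), so the sub() form is the safer choice when the field count is not known in advance.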

cut

cut -d: -f2- < "$file" > "$file".new
mv "$file".new "$file"

Solution 2

Since you don't need a regular expression to match a single, known character, consider using cut instead of sed.

This simple command sets : as the delimiter (-d) and emits fields (-f) from 2 onwards (the trailing -):

cut -d: -f2-

Example:

% echo 'time:3:35PM' | cut -d: -f2-
3:35PM
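
cut has no in-place mode, so applying it to many files takes a loop with a temporary file. A minimal sketch, assuming a scratch directory of made-up .txt files stands in for the real ones:

```shell
# Scratch directory with two hypothetical sample files.
dir=$(mktemp -d)
printf '%s\n' 'time:3:35PM' > "$dir/a.txt"
printf '%s\n' 'time:11:05AM' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    # Write to a temporary file first; replace the original only if cut succeeded.
    cut -d: -f2- < "$f" > "$f.new" && mv "$f.new" "$f"
done

cat "$dir/a.txt" "$dir/b.txt"
```

Chaining with `&&` means a failed cut leaves the original file untouched.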

Solution 3

kojiro's answer has plenty of great alternatives, but you asked how to do this with a regex. Here are some pure regex solutions:

grep -oP '[^:]*:\K.*' file.txt

\K makes the regex engine discard everything matched before the \K, so only the text after it is reported. If you know the exact prefix, you can use a lookbehind instead:

grep -oP '(?<=^time:).*' file.txt

Note that most regex implementations do not support these features. You can use them in grep with the -P flag (where grep is built with PCRE support) and in perl itself. I wonder if any other utility supports these.
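
Where PCRE is not available, a similar effect can be had with a basic regular expression and a capture group, which sed supports. The sketch below (the sample input is illustrative) keeps the text after the first colon and, for contrast, shows what the greedy `.*` from the question does instead.

```shell
input='time:3:35PM'

# Capture everything after the first ':'; -n plus the p flag prints only lines that matched.
first=$(printf '%s\n' "$input" | sed -n 's/^[^:]*:\(.*\)$/\1/p')

# For contrast: a greedy .* runs to the LAST ':' on the line.
last=$(printf '%s\n' "$input" | sed 's/.*://')

printf 'first=%s last=%s\n' "$first" "$last"
```

The negated class `[^:]*` cannot cross a colon, which is what pins the match to the first delimiter.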

Solution 4

To remove everything up to and including the first : on each line, you could do:

sed -i.bak 's/^[^:]*://' file.txt

Or, on multiple .txt files:

sed -i.bak 's/^[^:]*://' *.txt

The -i option specifies that files are to be edited in place: sed writes its output to a temporary file rather than to standard output, then replaces the original with it. The .bak suffix makes sed keep a copy of each original file under that extension.
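
To see what the command leaves behind, here is a small sketch run against a scratch directory (the file names and contents are made up for illustration). The `-i.bak` form, with the suffix attached to the option, is accepted by both GNU and BSD sed, though `-i` itself is not part of POSIX.

```shell
# Scratch directory with two hypothetical sample files.
dir=$(mktemp -d)
printf '%s\n' 'time:3:35PM' > "$dir/one.txt"
printf '%s\n' 'time:9:00AM' > "$dir/two.txt"

# Edit every .txt file in place, keeping each original as <name>.bak.
sed -i.bak 's/^[^:]*://' "$dir"/*.txt

edited=$(cat "$dir/one.txt")       # contents after the edit
backup=$(cat "$dir/one.txt.bak")   # untouched original
printf 'edited=%s backup=%s\n' "$edited" "$backup"
```

Once the results look right, the .bak files can be deleted.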


Author by

horatio1701d

Updated on June 30, 2022

Comments

horatio1701d almost 2 years
Regex rookie and hoping to change that. I have the following seemingly very simple problem, but I cannot figure out the correct regex to parse it properly. Basically I have a file with lines that look like this:
```
time:3:35PM
```
I am just trying to cut out all characters up to and including only the first ':' delimiter and keep the rest intact with sed, so that I can process many files with the same format. What I am trying to get is this:
```
3:35PM
```
The below is the closest I got, but it cuts at the last occurrence of the delimiter instead of the first:
```
sed 's/.*://'
```
I have also tried with Python, but I had trouble applying a Python function across all lines in many files as opposed to just one file.

Any help would be greatly appreciated.
horatio1701d over 10 years

Thank you. I tried this but it's my understanding that you can't use cut for in place editing for multiple files.
johnsyweb over 10 years

@prometheus2305: True enough, cut doesn't do in-place editing, but neither do all sed implementations.
horatio1701d over 10 years

Nice! Did not even know about 'ed'. Thank you. Can I ask how you might apply the ed option recursively with multiple directories containing multiple files?
horatio1701d over 10 years

Thank you. I'm definitely focused on in-place editing. So if I'm understanding correctly, I need to go with 'ed'.
kojiro over 10 years

@prometheus2305 If your shell can recursively glob, you can just loop over that glob expression, like for f in **/*; do ed "$f" <<< "$edscript"; done. Otherwise you'd need to use find with a script. Something like find . -type f -name "$filepattern" -exec bash -c 'ed "$1" <<< "$edscript"' _ {} \; (bash rather than sh, because <<< is not a POSIX sh feature, and edscript has to be exported for the child shell to see it).
johnsyweb over 10 years

@prometheus2305: ed would certainly do the job (as illustrated in kojiro's answer). Use whichever tool you're most comfortable using.
johnsyweb over 10 years

This is pretty comprehensive. I would do mv "${file}" "${file}.bak" first and then use < "${file}.bak" > "${file}" for the edit, leaving a backup file in case of accidents.