Splitting a line in bash based on delimiter with Sed / Regex

12,599

Solution 1

You can do this in just about every text processing tool (many without using regular expressions at all).

ed

If the in-place editing is really important, the canonical correct way is not sed (the stream editor) but ed (the file editor).

ed -s "$file" << 'EOF'
,s/^[^:]*://
w
q
EOF

sed

(Pretty much the same commands as ed, formatted a little differently)

sed 's/^[^:]*://' < "$file" > "$file".new
mv "$file".new "$file"

BASH

The loop itself doesn't cause any new processes to be spawned (for whatever that's worth); only the final mv does.

while IFS=: read -r _ time; do
    printf '%s\n' "$time"
done < "$file" > "$file".new
mv "$file".new "$file"
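
As a variation on the loop above (not part of the original answer), the shell's prefix-removal expansion can do the trimming without relying on IFS splitting. This is a minimal sketch; the sample file and its contents are made up for illustration.

```shell
# Hypothetical sample file; in real use "$file" would name your own input.
file=$(mktemp)
printf '%s\n' 'time:3:35PM' 'date:2022-06-30' > "$file"

# ${line#*:} removes the shortest prefix matching "*:", i.e. everything
# up to and including the first colon. Lines without a colon pass through unchanged.
while IFS= read -r line; do
    printf '%s\n' "${line#*:}"
done < "$file" > "$file.new"
mv "$file.new" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

One difference worth knowing: with IFS=: read, a line containing no colon produces an empty output line, whereas the expansion leaves such a line untouched.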

awk

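# Prints exactly fields 2 and 3, so this assumes three colon-separated fields per line (as in time:3:35PM).
awk -F: 'BEGIN{ OFS=":" } { print $2,$3 }' < "$file" > "$file".new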
mv "$file".new "$file"
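
If the number of fields after the first colon is not fixed, one possible alternative is to strip the prefix with awk's sub() instead of printing numbered fields. This is only a sketch with made-up input lines, not the answer's own method.

```shell
# sub() edits the current line ($0) in place; the trailing 1 is awk's
# idiom for "print the line". Only the first match per line is replaced.
out=$(printf '%s\n' 'time:3:35PM' 'a:b:c:d:e' | awk '{ sub(/^[^:]*:/, "") } 1')
printf '%s\n' "$out"
```

Here the second sample line keeps all four of its remaining fields, which print $2,$3 would have truncated.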

cut

cut -d: -f2- < "$file" > "$file".new
mv "$file".new "$file"

Solution 2

Since you don't need a regular expression to match a single, known character, consider using cut instead of sed.

This simple expression sets : as the delimiter (-d) and emits fields (-f) from 2 onwards (2-):

cut -d: -f2-

Example:

% echo 'time:3:35PM' | cut -d: -f2-
3:35PM
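
cut has no in-place mode, but the same temporary-file pattern used for sed and awk in Solution 1 can be wrapped in a loop to process many files. A rough sketch, assuming a scratch directory with two made-up .txt files:

```shell
# Hypothetical scratch directory with two sample files.
dir=$(mktemp -d)
printf '%s\n' 'time:3:35PM' > "$dir/a.txt"
printf '%s\n' 'time:11:05AM' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    # Write to a temporary file first; replace the original only if cut succeeded.
    cut -d: -f2- < "$f" > "$f.new" && mv "$f.new" "$f"
done

a=$(cat "$dir/a.txt")
b=$(cat "$dir/b.txt")
printf '%s\n%s\n' "$a" "$b"
rm -rf "$dir"
```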

Solution 3

kojiro's answer has plenty of great alternatives, but you asked how to do this with a regex. Here are some pure regex solutions:

grep -oP '[^:]*:\K.*' file.txt

\K makes the regex discard everything matched before it, so only the text after \K is reported. If you know the exact prefix, you can use a lookbehind instead:

grep -oP '(?<=^time:).*' file.txt

Note that most regex implementations do not support these features. You can use them in GNU grep with the -P flag and in perl itself. I wonder if any other utility supports them.

Solution 4

To remove everything up to and including the first : you could do:

sed -i.bak 's/^[^:]*://' file.txt

Or, on multiple .txt files:

sed -i.bak 's/^[^:]*://' *.txt

The -i option specifies that files are to be edited in place: sed writes its output to a temporary file rather than to standard output and then replaces the original file with it. The .bak suffix makes sed keep a backup of each original, e.g. file.txt.bak.
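
To see why the anchored [^:]* pattern works where the .*: attempt from the question does not, the two expressions can be compared on the sample line. A small sketch using only the input given in the question:

```shell
# Sample line in the format from the question.
line='time:3:35PM'

# Greedy: .* grabs as much as possible, so the match ends at the LAST colon.
greedy=$(printf '%s\n' "$line" | sed 's/.*://')

# Anchored: [^:]* cannot cross a colon, so the match ends at the FIRST colon.
first=$(printf '%s\n' "$line" | sed 's/^[^:]*://')

printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
```

The greedy version leaves 35PM, while the anchored version leaves 3:35PM.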

Author by

horatio1701d

Updated on June 30, 2022

Comments

  • horatio1701d
    horatio1701d almost 2 years

    Regex rookie and hoping to change that. I have the following seemingly very simple problem, but I cannot figure out the correct regex to parse it properly. Basically I have a file with lines that look like this:

    time:3:35PM
    

    I am just trying to cut out all characters up to and including only the first ':' delimiter and keep the rest intact with sed, so that I can process many files with the same format. What I am trying to get is this:

    3:35PM
    

    The below is the closest I got, but it uses the last occurrence of the delimiter instead of the first:

    sed 's/.*://'
    

    I have also tried with python but have challenges with applying a python function to iterate through all lines in many files as opposed to just one file.

    Any help would be greatly appreciated.

  • horatio1701d
    horatio1701d over 10 years
    Thank you. I tried this but it's my understanding that you can't use cut for in place editing for multiple files.
  • johnsyweb
    johnsyweb over 10 years
    @prometheus2305: True enough, cut doesn't do in-place editing but neither do all sed implementations.
  • horatio1701d
    horatio1701d over 10 years
    Nice! Did not even know about 'ed'. Thank you. Can I ask how you might apply the ed option recursively with multiple directories containing multiple files?
  • horatio1701d
    horatio1701d over 10 years
    thank you. I'm definitely focused on in-place editing. So if I'm understanding correctly, I need to go with 'ed'
  • kojiro
    kojiro over 10 years
    @prometheus2305 If your shell can recursively glob (in bash, after shopt -s globstar), you can just loop over that glob expression, like for f in **/*; do ed "$f" <<< "$edscript"; done. Otherwise you'd need to use find with a script. Something like find . -type f -name "$filepattern" -exec bash -c 'ed "$1" <<< "$edscript"' _ {} \; (bash rather than sh because <<< is not POSIX, and edscript must be exported so the child shell can see it).
  • johnsyweb
    johnsyweb over 10 years
    @prometheus2305: ed would certainly do the job (as illustrated in kojiro's answer). Use whichever tool you're most comfortable using.
  • johnsyweb
    johnsyweb over 10 years
    This is pretty comprehensive. I would do mv "${file}" "${file}.bak" first and then use < "${file}.bak" > "${file}" for the edit, leaving a backup file in case of accidents.
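
The recursion question and the backup-first suggestion from the comments above can be combined. The following is a sketch only: it swaps in sed for ed, and the directory layout and file names are invented for the demonstration.

```shell
# Hypothetical directory tree with sample files at two depths.
root=$(mktemp -d)
mkdir -p "$root/sub/deeper"
printf '%s\n' 'time:3:35PM' > "$root/top.txt"
printf '%s\n' 'time:9:00AM' > "$root/sub/deeper/nested.txt"

# For each regular *.txt file: keep the original as FILE.bak,
# then write the edited copy back to FILE.
find "$root" -type f -name '*.txt' -exec sh -c '
    for f do
        mv "$f" "$f.bak" && sed "s/^[^:]*://" < "$f.bak" > "$f"
    done
' sh {} +

top=$(cat "$root/top.txt")
nested=$(cat "$root/sub/deeper/nested.txt")
bak=$(cat "$root/top.txt.bak")
printf '%s\n%s\n' "$top" "$nested"
rm -rf "$root"
```

Passing the file names as arguments to sh -c, rather than splicing {} into the script text, keeps names with spaces or quotes safe.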