Parse lines with specific pattern out of file


Solution 1

Using grep :

$ grep "^\[[0-9]\+\]:" file.txt 
[29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)

To save the output in a file (output.txt):

grep "^\[[0-9]\+\]:" file.txt > output.txt

Using python:

#!/usr/bin/env python2
import re
with open('/path/to/file.txt') as f:
  print '\n'.join([line.rstrip() for line in f if'^\[\d+\]:', line)])

Solution 2

The perl way:

perl -ne 'print "$1\n" if /^(\[[0-9]*\]:.*)/' testdata > out

The awk way:

awk 'match($0, /^\[[0-9]*\]:/)' testdata > out

Output for both commands

[29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)

Solution 3

This task is perfectly suited for grep, because you're just checking which lines contain a match for a pattern and printing the lines that do.

heemayl's way is excellent. Here's another that's similar but uses Perl regular expression syntax (which GNU grep supports, with -P), for a shorter and slightly simpler pattern:

grep -P '\[\d+\]:' infile

That just prints the output, but you can redirect it to outfile:

grep -P '\[\d+\]:' infile > outfile

In Perl regular expressions, \d matches any single digit, same as [0-9] or [[:digit:]].

In case you're interested, here's a sed way:

sed -nr '/^\[[0-9]+\]:/p' infile
sed -nr '/^\[[0-9]+\]:/p' infile > outfile

That checks each line to see if it matches ^\[[0-9]+\]:. If it does, the sed command p is used to print the line. The -n flag prevents any lines from being printed except as provided for explicitly by the sed script.


Related videos on Youtube

Author by


Updated on September 18, 2022


  • user3069326
    user3069326 almost 2 years

    I have a file that looks roughly like this :

    [29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)
    sites: 5 4 2 1 3 4 543 5  67 657  78 67 8  5645 6 

    Now I would like to extract only those lines which start with [numeric]: from the file. It is not always only the first two, it could also be the first 7 or 8 or whatever. How would I read in this file and output a file only containing the lines with [numeric]:?

    • Tim
      Tim about 9 years
      Please don't damage posts.
    • user3069326
      user3069326 about 9 years
      this shoudl eb deleted
    • Tim
      Tim about 9 years
      No. It. Shouldn't. What is wrong with it? Do not attempt to remove valid posts, it's vandalism. It's also spelt "This should be deleted".
  • Eliah Kagan
    Eliah Kagan about 9 years
    In addition to [non-numeric] (as you say), this will also print lines containing only [, or starting with [ with no matching ] to close it or no : afterwards. (That might or might not be considered desirable.)
  • boardrider
    boardrider about 9 years
    @Eliah, as user3069326 knows the structure of the input file, he's in a good position to ascertain if my suggestion is valid.