Parse lines with specific pattern out of file
Solution 1
Using grep:
$ grep "^\[[0-9]\+\]:" file.txt
[25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001
[29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)
To save the output in a file (output.txt):
grep "^\[[0-9]\+\]:" file.txt > output.txt
Using python:
#!/usr/bin/env python3
import re

# Print every line that starts with "[<digits>]:"
with open('/path/to/file.txt') as f:
    print('\n'.join(line.rstrip() for line in f if re.search(r'^\[\d+\]:', line)))
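If the result should go straight to a file rather than to the terminal, the same regex check can be wrapped in a small function. This is only a minimal Python 3 sketch: the function name `extract_bracketed` and the file names in the usage line are placeholders, not part of the original answer.

```python
import re

# Lines that start with "[<digits>]:", e.g. "[25]:0.00843832,..."
PATTERN = re.compile(r'^\[\d+\]:')

def extract_bracketed(src_path, dst_path):
    """Copy lines starting with [number]: from src_path to dst_path.

    Returns the number of lines written. (Hypothetical helper for illustration.)
    """
    count = 0
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:  # reads the file one line at a time
            if PATTERN.match(line):
                dst.write(line)  # line still carries its newline
                count += 1
    return count
```

Calling `extract_bracketed('file.txt', 'output.txt')` would then play the same role as the grep redirect above.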
Solution 2
The perl way:
perl -ne 'print "$1\n" if /^(\[[0-9]+\]:.*)/' testdata > out
The awk way:
awk 'match($0, /^\[[0-9]+\]:/)' testdata > out
Output for both commands
[25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001
[29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)
Solution 3
This task is perfectly suited for grep, because you're just checking which lines contain a match for a pattern and printing the lines that do.
heemayl's way is excellent. Here's another that's similar but uses Perl regular expression syntax (which GNU grep supports, with -P), for a shorter and slightly simpler pattern:
grep -P '^\[\d+\]:' infile
That just prints the output, but you can redirect it to outfile:
grep -P '^\[\d+\]:' infile > outfile
In Perl regular expressions, \d matches any single digit, same as [0-9] or [[:digit:]].
In case you're interested, here's a sed way:
sed -nr '/^\[[0-9]+\]:/p' infile
sed -nr '/^\[[0-9]+\]:/p' infile > outfile
That checks each line to see if it matches ^\[[0-9]+\]:. If it does, the sed command p is used to print the line. The -n flag prevents any lines from being printed except as provided for explicitly by the sed script.
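The leading ^ in that pattern is what restricts the match to the start of the line. As a rough illustration in Python (the second and third sample lines are made up for the demonstration and do not come from the asker's file), compare the anchored and unanchored forms:

```python
import re

lines = [
    "[25]:0.00843832,469:0.0109533)",  # starts with [number]:
    "sites: see [29]:0.5",             # hypothetical: [number]: in mid-line
    "[abc]:not numeric",               # hypothetical: non-numeric bracket
]

# Anchored: the line must begin with [digits]:
anchored = [l for l in lines if re.search(r'^\[[0-9]+\]:', l)]

# Unanchored: [digits]: may appear anywhere in the line
unanchored = [l for l in lines if re.search(r'\[[0-9]+\]:', l)]

print(anchored)    # only the first sample line
print(unanchored)  # the first and second sample lines
```

Neither form accepts the non-numeric bracket, since both require at least one digit between the brackets.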
user3069326
Updated on September 18, 2022
Comments
-
user3069326 almost 2 years
I have a file that looks roughly like this:
[25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001
[29]:((962:0.000580339,930:0.000580339):0.00543993 ((758:0.000598847,726:0.000598847)
position: sites: 5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
01010010101010101010101010101011111100011
1111010010010101010101010111101000100000
00000000000000011001100101010010101011111
Now I would like to extract only those lines which start with [numeric]: from the file. It is not always only the first two, it could also be the first 7 or 8 or whatever. How would I read in this file and output a file only containing the lines with [numeric]:?
-
Tim about 9 years
Please don't damage posts.
-
user3069326 about 9 years
this shoudl eb deleted
-
Tim about 9 years
No. It. Shouldn't. What is wrong with it? Do not attempt to remove valid posts, it's vandalism. It's also spelt "This should be deleted".
-
Eliah Kagan about 9 years
In addition to [non-numeric] (as you say), this will also print lines containing only [, or starting with [ with no matching ] to close it or no : afterwards. (That might or might not be considered desirable.)
-
boardrider about 9 years
@Eliah, as user3069326 knows the structure of the input file, he's in a good position to ascertain if my suggestion is valid.