How to search a text file for strings between two tokens in Ubuntu terminal and save the output?

linux terminal regex grep

25,409

Solution 1

To get the output you show, you could run

grep -Po 'abc \K.*(?= cde)' file.txt > outfile.txt

The -P flag activates Perl Compatible Regular Expressions, which support lookarounds and \K, which means "discard anything matched up to this point". The -o flag causes grep to print only the matched portion of the line so, combined with the positive lookahead (?= cde) and the \K, it prints only the characters between abc and cde. The > outfile.txt saves the result in the file outfile.txt.
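A quick way to check the command is to run it against the two sample lines from the question. This is only a sketch: it assumes GNU grep with -P support (as on Ubuntu), and the scratch directory and the sample.txt name are made up for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
tmpdir=$(mktemp -d)

# The two sample lines from the question.
printf '%s\n' \
  'blah blah abc fkdljgn cde blah' \
  'blah blah blah blah blah abc skdjfn cde blah' > "$tmpdir/sample.txt"

# -P: Perl-compatible regex, -o: print only the matched part.
# \K drops "abc " from the match; (?= cde) requires " cde" to follow without consuming it.
grep -Po 'abc \K.*(?= cde)' "$tmpdir/sample.txt" > "$tmpdir/outfile.txt"

# Prints fkdljgn and skdjfn, one per line.
cat "$tmpdir/outfile.txt"
```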

Some other approaches:

sed
```
sed -r 's/.*abc (.+) cde.*/\1/' file.txt > outfile.txt
```
Here, the parentheses capture the pattern and you can then refer to it as \1. The 's/source/replacement/' is the substitution operator and it replaces source with replacement. In this case, it will simply delete everything except whatever is between abc and cde.
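One thing to keep in mind is that sed prints lines the pattern does not match unchanged. If the file may contain such lines, a variant with -n and the p flag prints only the lines where the substitution happened. A minimal sketch, assuming GNU sed as in the command above; the sample file is made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  'blah blah abc fkdljgn cde blah' \
  'a line without either token' \
  'blah abc skdjfn cde blah' > "$tmpdir/sample.txt"

# -n suppresses the default printing of every line; the trailing p prints
# a line only when the substitution succeeded on it.
sed -nr 's/.*abc (.+) cde.*/\1/p' "$tmpdir/sample.txt" > "$tmpdir/outfile.txt"

# Prints fkdljgn and skdjfn; the line without the tokens is skipped.
cat "$tmpdir/outfile.txt"
```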
perl
```
perl -pe 's/.*abc (.+) cde.*/$1/' file.txt > outfile.txt
```
Same as above, really: the -p means "read the input file line by line, apply the script given as -e and print each line".
awk
```
awk -F'abc|cde' '{print $2}' file.txt > outfile.txt
```
The idea here is to set the field delimiters to either abc or cde. Assuming these strings are unique in each line, the 2nd field will be the one between the two. This, however, includes leading and trailing spaces. To remove them, pass the output through another awk:
```
awk -F'abc|cde' '{print $2}' file.txt | awk '{print $1}' > outfile.txt
```
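The second awk can be avoided by including the surrounding spaces in the field separators. This is a sketch that assumes each token is separated from the wanted text by exactly one space, as in the question's sample; the second sample line is invented to show a difference in behaviour. Unlike piping through awk '{print $1}', which keeps only the first word, this keeps the whole text when it contains several words.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  'blah blah abc fkdljgn cde blah' \
  'x abc two words cde y' > "$tmpdir/sample.txt"

# Treat "abc " and " cde" (spaces included) as the field separators,
# so the second field is already trimmed.
awk -F'abc | cde' '{print $2}' "$tmpdir/sample.txt" > "$tmpdir/outfile.txt"

# Prints "fkdljgn" and "two words", one per line.
cat "$tmpdir/outfile.txt"
```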
GNU awk (gawk). The above works perfectly in gawk as well. I am including this in case you want to do something more complex and need to be able to capture patterns.
```
gawk '{print gensub(/.*abc (.*) cde.*/,"\\1", "g",$0);}' file.txt > outfile.txt
```
This is the same basic idea as the perl and sed ones but using gawk's gensub() function.

Solution 2

You want to use a regular expression for that. I'm not that experienced with UNIX regex, but something like this should work:

grep -Po '(?<=abc ).*(?= cde)' test.txt > output.txt

Edit: The syntax error came from missing quotes, though the old suggestion didn't work anyway. You want to use (?<=xxx) instead. This is called a zero-width look-behind assertion; without the <, it is a look-ahead. -P activates Perl-style regex and -o prints only the matches.
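To see the look-behind and look-ahead working together, here is a small sketch that runs the command on a one-line scratch file and compares it with the \K form. It assumes GNU grep with -P support; the scratch directory and the variable names are only for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'abc mymatch cde' > "$tmpdir/test.txt"

# (?<=abc ) asserts that "abc " precedes the match without including it;
# (?= cde) asserts that " cde" follows without including it.
lookbehind=$(grep -Po '(?<=abc ).*(?= cde)' "$tmpdir/test.txt")

# \K gives the same result by discarding what was matched before it.
keepout=$(grep -Po 'abc \K.*(?= cde)' "$tmpdir/test.txt")

echo "$lookbehind"   # mymatch
echo "$keepout"      # mymatch
```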

I tried this and it works fine with a text file containing abc mymatch cde.

Blue

Updated on September 18, 2022

Comments

Blue over 1 year
How can I search a text file for this pattern in Ubuntu terminal and save the output as a text file?

I'm looking for everything between the string "abc" and the string "cde" in a long list of data.

For example:
```
blah blah abc fkdljgn cde blah
blah blah blah blah blah abc skdjfn cde blah
```
In the example above I would be looking for an output such as this:
```
fkdljgn
skdjfn
```
It is important that I can also save the data output as a text file.

Can I use grep or agrep and if so, what is the format?
Blue almost 10 years

Well, sure, I also believe something along those lines might work, but I'm looking for an exact answer. Entering your suggestion generates a "bash: syntax error near unexpected token `('" error.
timonsku almost 10 years

Right the quotes were missing, see my edit.
Octavia Togami almost 10 years

I'm not sure your original grep pattern matches all of the other patterns you use. '.*(?=cde)' matches chars then 'cde', but '(.+) cde' matches chars, then a space, then 'cde'.
terdon almost 10 years

@KenzieTogami yes indeed, the grep one will have a trailing space. I realized after posting the grep one that the OP probably doesn't want the spaces so I removed them from the alternatives and forgot to do so in the grep. Thanks, I fixed it now.