Can grep show only words that match a search pattern?
Solution 1
Try grep -o:
grep -oh "\w*th\w*" *
Edit: the matching pattern is taken from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
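To try this on the question's example, here is a minimal sketch assuming GNU grep (which accepts \w). The helper name th_words and the scratch directory are made up for illustration; the file contents are the ones from the question.

```shell
# Hypothetical helper: print every "word" containing "th" in the given files,
# one match per line and without filename prefixes (GNU grep assumed).
th_words() {
  grep -oh "\w*th\w*" "$@"
}

# Recreate the three sample files from the question in a scratch directory.
demo_dir=$(mktemp -d)
printf 'the cat sat on the mat\n' > "$demo_dir/some-text-file"
printf 'the quick brown fox\n' > "$demo_dir/some-other-text-file"
printf 'i hope this explains it thoroughly\n' > "$demo_dir/yet-another-text-file"

# Prints the, the, the, this and thoroughly, each on its own line.
th_words "$demo_dir"/*
rm -r "$demo_dir"
```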
Solution 2
Cross-distribution safe answer (including Windows MinGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using an older version of grep (like 2.4.2) that does not include the -o option, use the above. Otherwise, use the simpler, easier-to-maintain version below.
Linux cross-distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs the regular expression matches within the file content (and not the filename), just as you would expect a regular expression to work in vim etc. Which word or regular expression you search for is then up to you, as long as you stick to POSIX and not Perl syntax (see below).
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by [[:<:]] and [[:>:]]).
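The effect of -h only shows up when more than one file is searched. A small sketch, assuming a grep that supports -o; with_names and without_names are hypothetical helper names and the two files are throwaway samples.

```shell
# Hypothetical helpers: the same POSIX pattern with and without -h.
with_names() {     # -o only: with several files, matches get a "filename:" prefix
  grep -o '[[:alpha:]]*th[[:alpha:]]*' "$@"
}
without_names() {  # -oh: just the matches
  grep -oh '[[:alpha:]]*th[[:alpha:]]*' "$@"
}

dir=$(mktemp -d)
printf 'the cat\n' > "$dir/a.txt"
printf 'this dog\n' > "$dir/b.txt"
with_names "$dir/a.txt" "$dir/b.txt"     # .../a.txt:the and .../b.txt:this
without_names "$dir/a.txt" "$dir/b.txt"  # the and this
rm -r "$dir"
```

With a single file (or standard input), plain -o already omits the filename, which is why -h matters mainly for globs like *.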
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it's an extended "Perl" syntax. As such, grep installations that are limited to POSIX character classes use [[:alpha:]] instead of the Perl-style \w. (Strictly, \w corresponds to [_[:alnum:]], which also covers digits and the underscore.) See the Wikipedia page on regular expressions for more.
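A quick sketch of how a letters-only class and a \w-like class can disagree on the same input; the helper names and the sample string foo_bath2 are made up for illustration.

```shell
alpha_th() {     # letters only, as in the POSIX answer
  grep -o '[[:alpha:]]*th[[:alpha:]]*'
}
wordchar_th() {  # letters, digits and underscore: a POSIX spelling of \w
  grep -o '[_[:alnum:]]*th[_[:alnum:]]*'
}

printf 'foo_bath2\n' | alpha_th     # prints: bath
printf 'foo_bath2\n' | wordchar_th  # prints: foo_bath2
```

Which one you want depends on whether identifiers such as foo_bath2 should count as "words".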
Ultimately, the POSIX answer above will be much more reliable for grep, regardless of platform (POSIX being the original syntax).
As for grep versions without the -o option: the first grep outputs the relevant lines, tr splits them into one word per line by turning spaces into newlines, and the final grep keeps only the matching words.
(PS: I know most platforms have by now been patched to support \w, but there are always some that lag behind.)
Credit for the -o workaround goes to @AdamRosenfield's answer.
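Putting both variants of this answer together, here is a sketch that probes whether the local grep accepts -o and otherwise falls back to the grep | tr | grep workaround. The probe and the function name posix_th_words are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical helper: print words containing "th" using POSIX classes only.
posix_th_words() {
  if echo th | grep -o th >/dev/null 2>&1; then
    # grep understands -o: print only the matched parts, without filenames.
    grep -oh '[[:alpha:]]*th[[:alpha:]]*' "$@"
  else
    # Older grep (e.g. 2.4.2): keep matching lines, put each space-separated
    # token on its own line, then keep only the tokens that still match.
    grep -h '[[:alpha:]]*th[[:alpha:]]*' "$@" |
      tr ' ' '\n' |
      grep -h '[[:alpha:]]*th[[:alpha:]]*'
  fi
}

printf 'the cat sat on the mat\n' | posix_th_words  # the, the
```

Note that the fallback prints whole space-separated tokens, so a word like "this," keeps its trailing comma, whereas -o prints only the matched letters.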
Solution 3
It's simpler than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt # (case-sensitive)
egrep -iwo 'th.[a-z]*' filename.txt # (case-insensitive)
Where:
egrep: grep with extended regular expressions (equivalent to grep -E).
-w: match only whole words instead of substrings.
-o: display only the matched pattern instead of the whole line.
-i: ignore case.
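To see what each flag changes, here is a sketch on a made-up line, using grep -E (which is what egrep is equivalent to) and hypothetical helper names. Keep in mind that the . in th.[a-z]* matches any single character, so a bare "th" is never reported.

```shell
any_th()     { grep -Eo   'th.[a-z]*'; }  # substrings allowed
word_th()    { grep -Ewo  'th.[a-z]*'; }  # whole words only (-w)
word_th_ci() { grep -Eiwo 'th.[a-z]*'; }  # whole words, ignoring case (-i)

line='bathe The theta'
printf '%s\n' "$line" | any_th      # the (found inside "bathe"), theta
printf '%s\n' "$line" | word_th     # theta
printf '%s\n' "$line" | word_th_ci  # The, theta
```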
Solution 4
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
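A sketch of what this pipeline returns (the helper name split_then_grep is hypothetical). Because it prints whole space-separated tokens, punctuation attached to a word is kept, and tabs are not treated as separators.

```shell
# Hypothetical helper: one space-separated token per line, then keep the
# tokens containing "th". With no file arguments, cat reads standard input.
split_then_grep() {
  cat "$@" | tr ' ' '\n' | grep th
}

printf 'i hope this explains it thoroughly.\n' | split_then_grep
# prints "this" and "thoroughly." (the trailing period is kept)
```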
Solution 5
Just awk; no need to combine tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
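The same one-liner wrapped in a hypothetical shell function, so it can read either files or standard input. Unlike the grep patterns in the other answers, /^th/ only matches fields that start with "th", and fields keep any attached punctuation.

```shell
# Hypothetical wrapper: for every input line, walk the whitespace-separated
# fields and print those that begin with "th".
awk_th_words() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^th/) print $i }' "$@"
}

printf 'the cat sat on the mat\nbathe the theta\n' | awk_th_words
# prints the, the, the, theta ("bathe" is skipped: it does not start with th)
```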
Neil Baldwin
Updated on July 08, 2022
Comments
-
Neil Baldwin almost 2 years
Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like this (bold is mine):
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
-
hakish almost 9 years: Dan Midwood's solution works perfectly and deserves the credit.
-
Linguist almost 7 years: Is there a way to print those matched words without changing lines, so that the matched strings remain on the same line?
-
ghostdog74 over 14 years: That won't give the correct result. Also, if you're using Perl, there's no need to use grep; do everything in Perl.
-
Admin over 14 years: Thanks for pointing out the error, ghostdog74. I have changed it to print all the words on the line, not just the first.
-
ghostdog74 over 14 years: Like I said, grep is not necessary. perl -n -e 'while (/\b(th\w*)/g) { print "$1\n" }' file
-
Admin over 14 years: I don't think it's important here to avoid using grep.
-
ghostdog74 over 14 years: No need for cat: tr ' ' '\n' < file | grep th. It's slow for big files, though.
-
ghostdog74 over 14 years: Up to you. I am just illustrating a point: if it's not necessary, don't do it. That extra "|" will cost you one more process.
-
Neil Baldwin over 14 years: This didn't work. The output still contained the filename and the entire line from the file that contained the match. Anyway, one of the other solutions offered worked. Thanks for the input though.
-
Adam Rosenfield over 14 years: @ghostdog74: good point, although if you have more than one file, you'll need to use cat. @Neil Baldwin: are you sure you typed it in right? When there's only one input file (stdin in this case), grep doesn't print the filename.
-
Neil Baldwin over 14 years: @Adam - yes, sorry Adam, it does work with one file but not with multiple files.
-
Adam Rosenfield over 14 years: @Neil Baldwin: just list all of your files as parameters to cat; it works fine with multiple files.
-
Neil Baldwin over 14 years: @Adam - so where you've got 'file' in the example, I would just put 'file1 file2 file3' etc.?
-
tripleee almost 10 years: That will still print the entire line containing the match. It constrains the actual match so that "the" no longer matches e.g. "these" or "bathe".
-
ksinkar over 9 years: @user181548, the grep -o option works only for GNU grep. So if you are not using GNU grep, it might not work for you.
-
Brilliand almost 9 years: What about -o only working in GNU grep (as ksinkar mentioned in a comment on the accepted answer)?
-
PicoCreator almost 9 years: @Brilliand Hmm, I'm having trouble finding a Linux implementation that does not support -o; I can look for a workaround if I know which platform to check against.
-
Bruce Peterson almost 9 years: @pico The -o option is not present in the Windows grep that installs with the Git package (MinGW?): "c:\Program Files (x86)\Git\bin\grep" --version prints grep (GNU grep) 2.4.2
-
PicoCreator almost 9 years: @BrucePeterson I have added AdamRosenfield's workaround answer for -o. Can you help me check whether Windows Git includes tr / sed, and which versions, so I can check whether this workaround works?
-
Bruce Peterson almost 9 years: @pico: for Git: GNU sed version 4.2.1, tr (GNU textutils) 2.0
-
Shayan over 8 years: Or just grep -Eio "th[a-z]+" filename
-
Carcamano over 8 years: @ghostdog74 If the slow part is because of tr, he could do grep first, so tr would be applied only to matching lines: grep th filename | tr ' ' '\n' | grep th
-
tripleee about 8 years: The useless {1} quantifiers should be dropped. Or, if you want to be consistent, t{1}h{1}e{1} etc.
-
Professor Photon over 7 years: In Perl 5.10 or later: perl -nE '@a = /(regexp)/ig; say join "\n", @a'
-
ife over 7 years: Can it print on the same line?
-
Collin Anderson about 7 years: -o is not valid in Linux git either
-
LokMac over 6 years: @A-B-B It depends on whether you want to display the name of the matched file or not. I'm not sure under what conditions it does and doesn't display, but I do know that when I used grep across a number of directories it did display the full file path for all matched files, whereas with -h it just displayed the matched words without specifying which file they came from. So, to match the original question, I think it is necessary in certain circumstances.
-
Bishwas Mishra over 6 years: What about displaying only the matched group?
-
jeremysprofile almost 6 years: I needed an explanation for what "\w*th\w*" * means, so I figured I'd post one. \w is [_[:alnum:]], so this matches basically any "word" that contains 'th' (since \w doesn't include space). The * after the quoted section is a glob for which files to search (i.e., all files in this directory).
-
tripleee over 5 years: \w is not generally portable to grep -E; for proper portability, use the POSIX character class name [[:alnum:]] instead (or [_[:alnum:]] if you really want the underscore, too; or try grep -P if your platform has that).
-
tripleee over 5 years: This doesn't seem to add anything over the existing answers from 4+ years before.
-
tripleee over 5 years: @CollinAnderson Your comment doesn't really make sense. GNU grep, and thus pretty much every Linux box, has grep -o; there is no -o option in Git itself, but many Windows victims install a git package which includes many Unix utilities, including a grep implementation.
-
tripleee over 5 years: @BrucePeterson If you genuinely have GNU grep 2.4.2, then it's frightfully old; the -o option was introduced in 2.5.1 sometime in 2001.
-
tripleee over 5 years: Maybe see also Useless use of cat?
-
tripleee over 5 years: This doesn't work; it will only ever find th because you requested the shortest possible repetition of the wildcard.
-
Collin Anderson over 5 years: @tripleee I meant git grep.
-
Ken Williams over 5 years: @tripleee - it won't have that problem, because there's a space included at the end of the regex. However, it will miss words that don't have spaces after them, e.g. at the ends of lines.
-
El Ronnoco over 5 years: @A-B-B Given the desired output shown by the OP, I would say the -h is entirely necessary?
-
Abhinandan prasad about 5 years: @tripleee I found my approach better and simpler, so I posted it.
-
neverMind9 about 5 years: Much better than abk.
-
Nathan McKaskle over 2 years: -o only gives me exactly what I searched for; I need the whole line. Wtf? It's like trying to work with a genie that takes everything too literally.
-
econometrica_33 about 2 years: I'm not sure why this answer is downvoted. I was using ripgrep, looking for the answer to the same question, and simply by adding the -o option I got exactly what I wanted.
-
Kasthuri Shravankumar almost 2 years: tac file.log | grep "In msg::" | grep -oh "templateId=.*, temp"