Limit grep output to short lines
Solution 1
grep itself only has options for context based on lines. An alternative is suggested by this SU post:
A workaround is to enable the option 'only-matching' and then to use RegExp's power to grep a bit more than your text:
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath
Of course, if you use color highlighting, you can always grep again to only color the real match:
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath | grep "WHAT_I_M_SEARCHING"
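Since the question also asks for the file name, the same pattern can be combined with -R and -H. The following is only a sketch, assuming GNU grep; the scratch directory, the file name big.txt, the sample text and the 10-character window are made up for illustration.

```shell
# Hypothetical sample data: one long single-line file in a scratch directory.
dir=$(mktemp -d)
printf 'oiadfaosuoianavMyClassNameionaernaldfajd\n' > "$dir/big.txt"

# -R recurse, -H print the file name, -o print only the matched part.
# The match is widened by up to 10 characters on each side.
out=$(cd "$dir" && grep -RHo '.\{0,10\}MyClassName.\{0,10\}' .)
printf '%s\n' "$out"

rm -rf "$dir"
```

With this sample the output should be ./big.txt:aosuoianavMyClassNameionaernald, i.e. the file name followed by the match and at most 10 characters on either side.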
As another alternative, I'd suggest folding the text and then grepping it, for example:
fold -sw 80 input.txt | grep ...
The -s option will make fold push words to the next line instead of breaking in between.
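A small illustration of what grep gets to see after fold has done its work. This is a sketch with a made-up sample line and an arbitrary width of 20 columns. Keep in mind that a match which ends up split across two folded lines would not be found.

```shell
# Hypothetical one-line input. fold -s breaks at blanks, -w 20 caps each line at 20 columns.
text='public class MyClassName { public static void main(String[] args) { } }'

# Only the short folded piece that contains the pattern is printed.
out=$(printf '%s\n' "$text" | fold -sw 20 | grep 'MyClassName')
printf '%s\n' "$out"
```

Instead of the whole 70-odd character line, grep now reports a piece of at most 20 characters containing MyClassName.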
Or use some other way to split the input into lines based on the structure of your input. (The SU post, for example, dealt with JSON, so using jq etc. to pretty-print and grep ... or just using jq to do the filtering by itself ... would be better than either of the two alternatives given above.)
This GNU awk method might be faster:
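gawk -v n=50 -v RS='MyClassName' '
# Every record after the first follows a match: print tail + match + head.
FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
# Remember the last n characters and the matched text (RT) for the next record.
{ p = substr($0, length - n + 1); prt = RT }
' input.txt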
- Tell awk to split records on the pattern we're interested in (-v RS=...), and the number of characters in context (-v n=...).
- Each record after the first record (FNR > 1) is one where awk found a match for the pattern.
- So we print n trailing characters from the previous record (p) and n leading characters from the current record (the substr call in the printf), along with the matched text for the previous record (which is prt).
- We set p and prt after printing, so the value we set is used by the next record.
- RT is a GNUism, that's why this is GNU awk-specific.
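To make the record logic above more concrete, here is a small worked run. It is only a sketch: the sample text, the temporary file and n=5 are made up, and the substr offsets are written 1-based so that exactly n characters are taken on each side. It needs gawk, so the command is skipped if gawk is not installed.

```shell
# Hypothetical sample file containing the pattern once in a single long line.
tmp=$(mktemp)
printf 'oiadfaosuoianavMyClassNameionaernaldfajd\n' > "$tmp"

if command -v gawk >/dev/null 2>&1; then
    out=$(gawk -v n=5 -v RS='MyClassName' '
        # Record 2 and later: previous tail (p) + matched text (prt) + current head.
        FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
        # Keep the last n characters and the match (RT) for the next record.
        { p = substr($0, length - n + 1); prt = RT }
    ' "$tmp")
    printf '%s\n' "$out"
fi

rm -f "$tmp"
```

Here the first record is oiadfaosuoianav (tail ianav), RT is MyClassName, and the second record starts with ionae, so the printed context should be ianavMyClassNameionae, prefixed with the file name.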
For recursive search, maybe:
find . -type f -exec gawk -v n=50 -v RS='MyClassName' 'FNR>1{printf "%s: %s\n",FILENAME, p prt substr($0, 1, n)} {p = substr($0, length-n+1); prt = RT}' {} +
Solution 2
Using only-matching in combination with some other options (see below) might be very close to what you are seeking, without the processing overhead of the regex mentioned in the other answer:
grep -RnHo 'MyClassName'
- -n: numeric output, show the line number of the match
- -H: filename, show the filename at the start of the line of the match
- -o: only matches, only show the matched string, not the whole line
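To show what this output looks like, here is a minimal sketch, assuming GNU grep. The scratch directory, the file name Sample.java and its contents are made up for illustration.

```shell
# Hypothetical sample file with the pattern on its second line.
dir=$(mktemp -d)
printf 'first line\npublic class MyClassName { }\n' > "$dir/Sample.java"

# Each hit is printed as file:line:match, nothing else from the line.
out=$(cd "$dir" && grep -RnHo 'MyClassName' .)
printf '%s\n' "$out"

rm -rf "$dir"
```

This should print ./Sample.java:2:MyClassName, so a huge single-line file contributes only the matched string and no surrounding context.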
Socrates
Updated on September 18, 2022
Comments
-
Socrates over 1 year
I often use grep to find files having a certain entry like this:
grep -R 'MyClassName'
The good thing is that it returns the files, their contents and marks the found string in red. The bad thing is that I also have huge files where the entire text is written in one big single line. Now grep outputs too much when finding text within those big files. Is there a way to limit the output to, for instance, 5 words to the left and to the right? Or maybe limit the output to 30 letters to the left and to the right?
-
Rinzwind about 6 years
Pipe your results thru cut
-
Sergiy Kolodyazhnyy about 6 years
So, let's say the pattern you're looking for is at position 50, but you said you only want 30 letters. What do you want to do then? Ignore that line, or also include it in the output but trim it? What exactly do you want to limit: the search or the lines themselves?
-
Socrates about 6 years
@Rinzwind I don't quite understand what you want to achieve with cut, as it only splits by delimiter or by count of characters. When I find a line with MyClassName, though, it may be anywhere in the line and not always at the same position. Furthermore, the characters before and after it may vary, which rules out splitting by delimiter.
-
Socrates about 6 years
@SergiyKolodyazhnyy When a line containing MyClassName has been found, I want to get as a result the file name and the x characters to the left and to the right. x is any number I provide, for instance 30. The rest of the file contents shall be ignored. This is to get some context for the matching files and limit the overload.
-
Rinzwind about 6 years
From the manual: "The -f switch of the cut command is the n-TH element separated by your delimiter".
-
Socrates about 6 years
@Rinzwind As far as I understand, the -f switch of cut splits using a delimiter, like a space or a tab or any given single character. For me, needing a little context for what has been found within a file, this doesn't provide any useful information. I can only output the word I'm searching for, or maybe a word before or after it. But that only works if a delimiter can be found and used. I don't know upfront, though, what type of characters are in front of or after the searched word.
-
Rinzwind about 6 years
Or a custom delimiter.
-
Socrates about 6 years
@Rinzwind What type of custom delimiter would you suggest with cut if there are three files with the following input:
oiadfaosuoianavMyClassNameionaernaldfajd
and
/(/&%%§%/(§(/MyClassName&((/$/$/(§/$&
and
public class MyClassName { public static void main(String[] args) { } }
?
-
G-Man Says 'Reinstate Monica' about 6 years
Very similar to How to make grep output fit screen's width of characters on U&L.
-
Socrates about 6 years
Ok, it works. Seems Regex is a valid approach, so thanks for that. The processing time is quite long though: without Regex, as in my post above, it takes 4.912s, and with Regex, as in your post, it takes 3m39.312s.
-
muru about 6 years
Yes, the linked post also says as much.
-
Socrates about 6 years
While it is true that the result is found much faster, there is missing info. The file path is shown, the line number is shown, but the text output is only my initial search MyClassName. Hence, the context is missing.
-
Socrates about 6 years
grep -RnHo "MyClassName" and grep -Rno "MyClassName" have the same output.
-
Robert Riedl about 6 years
@Socrates output is not the same without H in the same directory
-
Melebius about 6 years
The -o flag might be interesting if the regex had some variable part. For a fixed string, it’s useless to print it each time. OP is most likely interested in the near context.
-
Socrates about 6 years
@RobertRiedl At least in my local recursive search, putting both outputs side by side, there is no visual difference. According to man grep, if there is more than one file, this option is set by default: "-H, --with-filename Print the file name for each match. This is the default when there is more than one file to search."
-
muru about 6 years
@Socrates see if the awk method I added above performs better
-
Melebius about 6 years
The fold method can be used only if you are sure that the searched string does not appear at the border of a folded line; otherwise it gets split across two lines and grep will not find it.
-
Robert Riedl about 6 years
@Socrates, true - context is missing, but I thought that was the point? Limit the output? You can add context again by adding the lines before (-B 1) or after (-A 1). Sorry that I could not be of more help.
-
Socrates about 6 years
@muru Thanks for your suggestion with gawk. Unfortunately, the suggested command with find outputs random stuff and no file names when executed on my system. Furthermore, I'm not fluent enough in awk to properly analyse the command. Currently, Regex in combination with grep solves the matter, maybe not fast, but reliably. Again, thanks a lot.
-
Socrates about 6 years
@RobertRiedl Neither -A nor -B does the trick. From man grep for -A and -B: "With the -o or --only-matching option, this has no effect and a warning is given." Anyway, thank you for your input. Greatly appreciated.
-
muru about 6 years
@Socrates I think I managed to fix the awk command. My mental model was wrong about which line's RT and prefix, etc. were to be used.
-
Socrates about 6 years
@muru Ok, that fixed it. Much faster than with Regex and grep. A bit more verbose as well. Thanks for your help. Greatly appreciated.