Limit grep output to short lines

command-line text-processing grep


Solution 1

grep itself only has options for context based on lines. An alternative is suggested by this SU post:

A workaround is to enable the -o (--only-matching) option and then use a regular expression to match a bit more than your search text:

```
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath
```

Of course, if you use color highlighting, you can always grep again to color only the real match:

```
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath | grep "WHAT_I_M_SEARCHING"
```
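The same idea can be wrapped in a small shell function. This is a minimal sketch, assuming a grep that supports -E (GNU and BSD grep do), so the interval braces need no backslashes. The name grep_context and its argument order are hypothetical. The search pattern is spliced into the regex as-is, so any regex metacharacters in it keep their special meaning.

```sh
# grep_context PATTERN N [GREP_OPTIONS...] [FILE...]   (hypothetical helper)
# Print each match of PATTERN with up to N characters on either side,
# instead of the whole (possibly huge) line.
grep_context() {
  pattern=$1
  n=$2
  shift 2
  # -o prints only the matched text; -E allows {m,n} without backslashes.
  # The regex is passed with -e, so extra options in "$@" are still parsed.
  grep -oE -e ".{0,$n}$pattern.{0,$n}" "$@"
}
```

For the question's case, grep_context MyClassName 30 -RnH . should print the file name, line number and up to 30 characters on each side of every match (-R is a GNU/BSD extension). Because the trailing context is matched greedily, a second match that falls inside the first one's context shows up only as part of that context, not as a separate line.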

As another alternative, I'd suggest folding the text and then grepping it, for example:

```
fold -sw 80 input.txt | grep ...
```

The -s option makes fold break lines at blanks, so words are pushed to the next line instead of being split in the middle.
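A minimal sketch of the fold approach as a function; fold_grep is a hypothetical name, and the width is whatever suits your terminal.

```sh
# fold_grep PATTERN WIDTH FILE   (hypothetical helper)
# Wrap FILE into lines of at most WIDTH columns, breaking at blanks
# where possible (-s), then search only the wrapped lines.
fold_grep() {
  fold -s -w "$2" "$3" | grep -e "$1"
}
```

For example, fold_grep MyClassName 80 input.txt prints only the 80-column slice(s) containing the match. If a match straddles a fold point (likely when there are no blanks near it), grep will not see it.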

Or use some other way to split the input into lines based on its structure. (The SU post, for example, dealt with JSON, so pretty-printing it with jq and then grepping, or just using jq to do the filtering itself, would be better than either of the two alternatives above.)
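When the input is not JSON, so jq does not apply, a crude structure-based split can still help. The sketch below assumes code-like input such as the question's Java example, where semicolons, commas and braces are natural break points; split_grep is a hypothetical name. tr simply replaces each of those characters with a newline before grep runs.

```sh
# split_grep PATTERN FILE   (hypothetical helper)
# Assumes code-like input: turn ';', ',', '{' and '}' into newlines so
# that one huge line becomes many short ones, then grep those.
split_grep() {
  tr ';,{}' '\n\n\n\n' < "$2" | grep -e "$1"
}
```

On a file containing public class MyClassName { public static void main(String[] args) { } } this prints just the fragment public class MyClassName. The split characters are a guess about the input's structure and should be adjusted to it.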

This GNU awk method might be faster:

```
gawk -v n=50 -v RS='MyClassName' '
  FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
  { p = substr($0, length - n + 1); prt = RT }
' input.txt
```

Tell awk to split records on the pattern we're interested in (-v RS=...) and how many characters of context to show (-v n=...).
Each record after the first (FNR > 1) is one where awk found a match for the pattern.
So we print the n trailing characters of the previous record (p), the text that matched at the end of the previous record (prt), and the n leading characters of the current record (substr($0, 1, n)).
- p and prt are set after printing, so the values we set are used for the next record.
- RT (the text that matched RS) is a GNUism, which is why this is GNU awk-specific.

For recursive search, maybe:

```
find . -type f -exec gawk -v n=50 -v RS='MyClassName' 'FNR > 1 {printf "%s: %s\n", FILENAME, p prt substr($0, 1, n)} {p = substr($0, length - n + 1); prt = RT}' {} +
```
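If gawk is not available, a different technique works with any POSIX awk: loop over each line with match(), which sets RSTART and RLENGTH, and cut the context out with substr(). This is a sketch, not the RS/RT method above; awk_context is a hypothetical name, the pattern is treated as an extended regex, and because it is passed with -v, backslash escapes in it are processed once by awk.

```sh
# awk_context PATTERN N FILE...   (hypothetical helper)
# For every match of PATTERN (an ERE) print FILENAME:FNR: followed by
# the match and up to N characters of context on each side.
awk_context() {
  pattern=$1
  n=$2
  shift 2
  awk -v pat="$pattern" -v n="$n" '
  {
    rest = $0
    # RLENGTH > 0 guards against looping forever on empty matches.
    while (match(rest, pat) && RLENGTH > 0) {
      start = RSTART - n
      if (start < 1) start = 1
      # leading context + match + trailing context
      printf "%s:%d: %s\n", FILENAME, FNR, substr(rest, start, RSTART - start + RLENGTH + n)
      # keep searching after the end of this match
      rest = substr(rest, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Unlike the gawk version, this works line by line, so context never crosses a newline, and the leading context of a second match on the same line only reaches back to the end of the previous match.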

Solution 2

Using -o (--only-matching) in combination with some other options (see below) might come very close to what you are looking for, without the regex processing overhead mentioned in the other answer:

```
grep -RnHo 'MyClassName'
```

- R: recursive, read all files under each directory (following symbolic links)
- n: numeric output, show the line number of the match
- H: filename, show the filename at the start of each output line
- o: only matching, show only the matched string, not the whole line
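To get the context back without a context regex, one option is to let grep report where each match is and extract the surrounding bytes yourself. This sketch assumes GNU grep, where -b combined with -o prints the 0-based byte offset of each match, and a head that supports -c (GNU and BSD do; POSIX head does not). grep_offsets is a hypothetical name; it handles a single file and a fixed, ASCII-only search string.

```sh
# grep_offsets PATTERN N FILE   (hypothetical helper)
# Fast fixed-string search with -o -b, then cut N bytes of context
# around each reported offset straight out of the file.
grep_offsets() {
  pattern=$1 n=$2 file=$3
  len=${#pattern}   # equals the byte length only for ASCII patterns
  grep -obF -e "$pattern" "$file" | while IFS=: read -r off _; do
    start=$(( off > n ? off - n : 0 ))
    printf '%s: ' "$file"
    # tail -c +K starts at 1-based byte K; head -c limits the slice.
    tail -c +"$(( start + 1 ))" "$file" | head -c "$(( off - start + len + n ))"
    echo
  done
}
```

Each match costs one tail | head pipeline, so this pays off mainly when matches are few and lines are huge. The extracted bytes may include newlines if a match is close to a line boundary.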


Author: Socrates

Updated on September 18, 2022

Comments

Socrates over 1 year
I often use grep to find files having a certain entry like this:
```
grep -R 'MyClassName'
```
The good thing is that it returns the files, their contents and marks the found string in red. The bad thing is that I also have huge files where the entire text is written in one big single line. Now grep outputs too much when finding text within those big files. Is there a way to limit the output to for instance 5 words to the left and to the right? Or maybe limit the output to 30 letters to the left and to the right?
- Rinzwind about 6 years
  
  Pipe your results thru cut
- Sergiy Kolodyazhnyy about 6 years
  
  So, let's say the pattern you're looking for is at position 50, but you said you only want 30 letters. What do you want to do then? Ignore that line, or include it in the output but trimmed? What exactly do you want to limit, the search or the lines themselves?
- Socrates about 6 years
  
  @Rinzwind I don't quite understand what you want to achieve with cut, as it only splits by delimiter or by character count. When I find a line with MyClassName, the match may be anywhere in the line, not always at the same position. Furthermore, the characters before and after it vary, which rules out splitting by a delimiter.
- Socrates about 6 years
  
  @SergiyKolodyazhnyy When a line containing MyClassName is found, I want the file name plus the x characters to the left and to the right of the match, where x is any number I provide, for instance 30. The rest of the file contents should be ignored. This gives some context for each matching file while limiting the output.
- Rinzwind about 6 years
  
  From the manual: "The -f switch of the cut command is the n-TH element separated by your delimiter".
- Socrates about 6 years
  
  @Rinzwind As far as I understand, the -f switch of cut splits on a delimiter, such as a space, a tab or any other single character. Since I need the output to give a little context for what was found within a file, this doesn't help. I could only output the word I'm searching for, or maybe the word before or after it, and only if a suitable delimiter exists. I don't know in advance which characters come before or after the searched word.
- Rinzwind about 6 years
  
  Or a custom delimiter.
- Socrates about 6 years
  
  @Rinzwind What type of custom delimiter would you suggest with cut if there are three files with the following input: oiadfaosuoianavMyClassNameionaernaldfajd and /(/&%%§%/(§(/MyClassName&((/$/$/(§/$& and public class MyClassName { public static void main(String[] args) { } }?
- G-Man Says 'Reinstate Monica' about 6 years
  
  Very similar to How to make grep output fit screen's width of characters on U&L.
Socrates about 6 years

Ok, it works. It seems regex is a valid approach, so thanks for that. The processing time is quite long, though. Without the regex, as in my post above, it takes 4.912s; with the regex, as in your post, it takes 3m39.312s.
muru about 6 years

Yes, the linked post also says as much.
Socrates about 6 years

While it is true that the result is found much faster, some information is missing. The file path and the line number are shown, but the text output is only my search term MyClassName. Hence, the context is missing.
Socrates about 6 years

grep -RnHo "MyClassName" and grep -Rno "MyClassName" have the same output.
Robert Riedl about 6 years

@Socrates output is not the same without H in the same directory
Melebius about 6 years

The -o flag might be interesting if the regex had some variable part. For a fixed string, it’s useless to print it each time. OP is most likely interested in the near context.
Socrates about 6 years

@RobertRiedl At least in my local recursive search putting both outputs side by side, there is no visual difference. According to man grep, if there is more than one file, this option is set by default: "-H, --with-filename Print the file name for each match. This is the default when there is more than one file to search."
muru about 6 years

@Socrates see if the awk method I added above performs better
Melebius about 6 years

The fold method can be used only if you are sure that the searched string does not straddle a fold boundary; otherwise grep will not find it.
Robert Riedl about 6 years

@Socrates, true, context is missing, but I thought that was the point? To limit the output? You can add context again by adding the lines before (-B 1) or after (-A 1). Sorry that I could not be of more help.
Socrates about 6 years

@muru Thanks for your suggestion with gawk. Unfortunately, the suggested command with find outputs random stuff and no file names when executed on my system. Furthermore, I'm not fluent enough in awk to properly analyse the command. Currently, regex in combination with grep solves the problem reliably, if not quickly. Again, thanks a lot.
Socrates about 6 years

@RobertRiedl Neither -A nor -B does the trick. From man grep for -A and -B: "With the -o or --only-matching option, this has no effect and a warning is given." Anyway, thank you for your input. Greatly appreciated.
muru about 6 years

@Socrates I think I managed to fix the awk command. My mental model was wrong about which line's RT and prefix, etc. were to be used.
Socrates about 6 years

@muru Ok, that fixed it. Much faster than with Regex and grep. A bit more verbose as well. Thanks for your help. Greatly appreciated.