Limit grep output to short lines

Solution 1

grep itself only has options for context based on lines. An alternative is suggested by this SU post:

A workaround is to enable the option 'only-matching' and then to use RegExp's power to grep a bit more than your text:

grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath

Of course, if you use color highlighting, you can always grep again to only color the real match:

grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}"  ./filepath | grep "WHAT_I_M_SEARCHING"
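
To make the effect of the bounded-context pattern concrete, here is a minimal sketch on a made-up one-line file. The file name big.txt, the temporary directory and the context width of 5 are assumptions for illustration. -E is used only so the braces need no backslashes, and -H forces the file name in the recursive call.

```shell
# Hypothetical sample: one long line that contains the search term twice.
tmp=$(mktemp -d)
printf '%s\n' 'xxxxxxxxxxleft-MyClassName-rightyyyyyyyyyy0123456789more-MyClassName-tail' > "$tmp/big.txt"

# -o prints only the matched text; .{0,5} keeps at most 5 characters per side.
out=$(grep -oE '.{0,5}MyClassName.{0,5}' "$tmp/big.txt")
printf '%s\n' "$out"

# Recursive variant: -R walks the directory, -H prefixes each hit with its file name.
rec=$(grep -RHoE '.{0,5}MyClassName.{0,5}' "$tmp")
printf '%s\n' "$rec"
```

On this sample the first command should print left-MyClassName-righ and more-MyClassName-tail, each on its own line, instead of the whole long line.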

As another alternative, I'd suggest folding the text and then grepping it, for example:

fold -sw 80 input.txt | grep ...

The -s option makes fold break at blanks, so a word is pushed to the next line instead of being split in the middle.
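
A small sketch of the fold approach on a made-up line (the sample text and the widths 40 and 20 are assumptions). It also shows the main caveat: when a fold boundary falls inside the search term, which can happen without -s or when the term is longer than the width, grep no longer sees it.

```shell
# Hypothetical sample line, 47 characters long.
line='alpha beta gamma MyClassName delta epsilon zeta'

# Fold at blanks (-s) into pieces of at most 40 columns, then count matching pieces.
kept=$(printf '%s\n' "$line" | fold -sw 40 | grep -c 'MyClassName')

# Hard fold every 20 columns: the boundary lands inside MyClassName,
# so no piece contains the whole term. grep -c exits non-zero on 0 matches.
lost=$(printf '%s\n' "$line" | fold -w 20 | grep -c 'MyClassName' || true)

printf 'fold -sw 40: %s matching line(s)\n' "$kept"
printf 'fold -w 20:  %s matching line(s)\n' "$lost"
```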

Or use some other way to split the input into lines based on its structure. (The SU post, for example, dealt with JSON, so using jq etc. to pretty-print and grep ... or just using jq to do the filtering by itself ... would be better than either of the two alternatives given above.)


This GNU awk method might be faster:

gawk -v n=50 -v RS='MyClassName' '
  # every record after the first begins right after a match
  FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
  # remember the last n characters and the matched text for the next record
  { p = substr($0, length - n + 1); prt = RT }
' input.txt
  • Tell awk to split records on the pattern we're interested in (-v RS=...) and pass the number of context characters (-v n=...).
  • Each record after the first (FNR > 1) begins right after a match for the pattern.
  • So we print the n trailing characters of the previous record (p), the text that matched at the end of the previous record (prt), and the n leading characters of the current record (substr($0, 1, n)).
    • p and prt are set after printing, so their values are used when the next record is processed.
    • awk string positions start at 1, hence substr($0, 1, n) for the first n characters and length - n + 1 for the start of the last n.
    • RT is a GNUism, which is why this is GNU awk-specific.
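
To see what the record-splitting trick produces, here is a minimal sketch on a made-up one-line file with n=3. The file name sample.txt and the sample text are assumptions, and this sketch passes 1-based positions to substr so that exactly n characters are kept on each side. It needs GNU awk and falls back to a notice when gawk is not installed.

```shell
# Hypothetical sample: a single line with two occurrences of the term.
tmp=$(mktemp -d)
printf '%s\n' 'aaaaaMyClassNamebbbbbcccccMyClassNameddddd' > "$tmp/sample.txt"

if command -v gawk >/dev/null 2>&1; then
  # RS splits records at each match; RT holds the text that matched.
  out=$(gawk -v n=3 -v RS='MyClassName' '
    FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
    { p = substr($0, length - n + 1); prt = RT }
  ' "$tmp/sample.txt")
else
  out='gawk not installed'
fi
printf '%s\n' "$out"
```

With gawk available, each output line should hold the file path followed by three characters of context on either side of the term, for example aaaMyClassNamebbb for the first occurrence.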

For recursive search, maybe:

find . -type f -exec gawk -v n=50 -v RS='MyClassName' 'FNR>1{printf "%s: %s\n",FILENAME, p prt substr($0, 1, n)} {p = substr($0, length-n+1); prt = RT}' {} +

Solution 2

Using only-matching in combination with some other options (see below) might be very close to what you are seeking, without the processing overhead of the regex mentioned in the other answer:

grep -RnHo 'MyClassName'
  • n numeric output, show the line number of the match
  • H filename, show the filename at the start of the line of the match
  • o only matches, only show the matched string, not the whole line
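
A quick sketch of what this prints, on two made-up files (the names a.txt and b.txt and their contents are assumptions). It also compares the output with and without -H, since GNU grep already prints file names when it searches more than one file.

```shell
# Hypothetical sample tree: two files with one long line each.
tmp=$(mktemp -d)
printf '%s\n' 'lots of text before MyClassName and lots of text after' > "$tmp/a.txt"
printf '%s\n' 'public class MyClassName { }' > "$tmp/b.txt"

# One file:line:match entry per hit; sort makes the order predictable.
with_H=$(grep -RnHo 'MyClassName' "$tmp" | sort)
without_H=$(grep -Rno 'MyClassName' "$tmp" | sort)

printf '%s\n' "$with_H"
if [ "$with_H" = "$without_H" ]; then
  echo 'same output with and without -H'
fi
```

Each hit shows only the path, the line number and the term itself, so the surrounding context is not part of the output.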
Author by

Socrates

Updated on September 18, 2022

Comments

  • Socrates
    Socrates over 1 year

    I often use grep to find files having a certain entry like this:

    grep -R 'MyClassName'
    

    The good thing is that it returns the files and their contents, and marks the found string in red. The bad thing is that I also have huge files where the entire text is written on one single long line, so grep outputs far too much when it finds text within those files. Is there a way to limit the output to, for instance, 5 words to the left and to the right? Or maybe to 30 letters to the left and to the right?

    • Rinzwind
      Rinzwind about 6 years
      Pipe your results thru cut
    • Sergiy Kolodyazhnyy
      Sergiy Kolodyazhnyy about 6 years
      So, let's say the pattern you're looking for is at position 50, but you said you only want 30 letters. What do you want to do then? Ignore that line, or include it in the output but trim it? What exactly do you want to limit: the search or the lines themselves?
    • Socrates
      Socrates about 6 years
      @Rinzwind I don't quite understand what you want to achieve with cut, as it only splits by delimiter or by count of characters. Though when I find a line with MyClassName it may be anywhere in the line and not always at the same position. Furthermore, there may be a variation of characters in the front and the back of it, which breaks the possibility to split by delimiter.
    • Socrates
      Socrates about 6 years
      @SergiyKolodyazhnyy When a positive line with MyClassName has been found, I want to get as a result the file name and the x characters to the left and to the right. x is any number I provide, for instance 30. The rest of the file contents shall be ignored. This is to get a context to the matching files and limit the overload.
    • Rinzwind
      Rinzwind about 6 years
      From the manual: "The -f switch of the cut command is the n-TH element separated by your delimiter".
    • Socrates
      Socrates about 6 years
      @Rinzwind As far as I understand, the -f switch of cut splits using a delimiter, like a space or a tab or any given single character. Since I need a little context around what has been found within a file, this doesn't provide any useful information. I can only output the word I'm searching for, or maybe a word before or after it, and that only works if a delimiter can be found and used. I don't know upfront, though, what kind of characters are in front of or after the searched word.
    • Rinzwind
      Rinzwind about 6 years
      Or a custom delimiter.
    • Socrates
      Socrates about 6 years
      @Rinzwind What type of custom delimiter would you suggest with cut if there are three files with the following input: oiadfaosuoianavMyClassNameionaernaldfajd and /(/&%%§%/(§(/MyClassName&((/$/$/(§/$& and public class MyClassName { public static void main(String[] args) { } }?
  • Socrates
    Socrates about 6 years
    Ok, it works. Seems Regex is a valid approach, so thanks for that. The processing time is quite long though. Without Regex, as in my post above, it takes 4.912s, and with Regex, as in your post, it takes 3m39.312s.
  • muru
    muru about 6 years
    Yes, the linked post also says as much.
  • Socrates
    Socrates about 6 years
    While it is true that the result is found much faster, there is missing info. The file path is shown, the line number is shown, but the text output is only my initial search MyClassName. Hence, the context is missing.
  • Socrates
    Socrates about 6 years
    grep -RnHo "MyClassName" and grep -Rno "MyClassName" have the same output.
  • Robert Riedl
    Robert Riedl about 6 years
    @Socrates output is not the same without H in the same directory
  • Melebius
    Melebius about 6 years
    The -o flag might be interesting if the regex had some variable part. For a fixed string, it’s useless to print it each time. OP is most likely interested in the near context.
  • Socrates
    Socrates about 6 years
    @RobertRiedl At least in my local recursive search putting both outputs side by side, there is no visual difference. According to man grep, if there is more than one file, this option is set by default: "-H, --with-filename Print the file name for each match. This is the default when there is more than one file to search."
  • muru
    muru about 6 years
    @Socrates see if the awk method I added above performs better
  • Melebius
    Melebius about 6 years
    The fold method can be used only if you are sure that the searched string does not appear at the border, otherwise it would get hidden by grep.
  • Robert Riedl
    Robert Riedl about 6 years
    @Socrates, true, context is missing, but I thought that was the point? To limit the output? You can add context again by adding the lines before (-B 1) or after (-A 1). Sorry that I could not be of more help.
  • Socrates
    Socrates about 6 years
    @muru Thanks for your suggestion with gawk. Unfortunately, the suggested command with find outputs random stuff and no file names when executed on my system. Furthermore, I'm not fluent enough in awk to properly analyse the command. Currently, Regex in combination with grep solves the matter, maybe not fast, but reliably. Again, thanks a lot.
  • Socrates
    Socrates about 6 years
    @RobertRiedl Neither -A nor -B does the trick. From man grep for -A and -B: "With the -o or --only-matching option, this has no effect and a warning is given." Anyway, thank you for your input. Greatly appreciated.
  • muru
    muru about 6 years
    @Socrates I think I managed to fix the awk command. My mental model was wrong about which line's RT and prefix, etc. were to be used.
  • Socrates
    Socrates about 6 years
    @muru Ok, that fixed it. Much faster than with Regex and grep. A bit more verbose as well. Thanks for your help. Greatly appreciated.