How to handle CRLF line endings in grep?

grep regular-expression

8,245

Based on this page, try these solutions:

https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu

curl -sI http://unix.stackexchange.com | head -4  | grep "200 OK$(printf '\r')" 

grep -IUlr $'\r'


Witiko

Updated on September 18, 2022

Comments

Witiko over 1 year
Suppose I have an arbitrary text input that contains CRLF line endings:
```
$ curl -sI http://unix.stackexchange.com | head -4
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Length: 80551
Content-Type: text/html; charset=utf-8

$ curl -sI http://unix.stackexchange.com | head -4 | hexdump -C
00000000  48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d  |HTTP/1.1 200 OK.|
00000010  0a 43 61 63 68 65 2d 43  6f 6e 74 72 6f 6c 3a 20  |.Cache-Control: |
00000020  70 75 62 6c 69 63 2c 20  6d 61 78 2d 61 67 65 3d  |public, max-age=|
00000030  36 30 0d 0a 43 6f 6e 74  65 6e 74 2d 4c 65 6e 67  |60..Content-Leng|
00000040  74 68 3a 20 38 30 39 30  32 0d 0a 43 6f 6e 74 65  |th: 80902..Conte|
00000050  6e 74 2d 54 79 70 65 3a  20 74 65 78 74 2f 68 74  |nt-Type: text/ht|
00000060  6d 6c 3b 20 63 68 61 72  73 65 74 3d 75 74 66 2d  |ml; charset=utf-|
00000070  38 0d 0a                                          |8..|
00000073
```
GNU grep 2.26 does not handle such input very well with respect to line endings:
```
$ curl -sI http://unix.stackexchange.com | head -4 | grep '200 OK$'
$ curl -sI http://unix.stackexchange.com | head -4 | grep '200 OK.$'
HTTP/1.1 200 OK
```
This is a little annoying. I can of course resolve this by adding dos2unix to the pipeline:
```
$ curl -sI http://unix.stackexchange.com | head -4 | dos2unix | grep '200 OK$'
HTTP/1.1 200 OK
```
but this feels a little hamfisted (and not very portable).

~~The weird thing in general is that the grep(1) man page claims that the tool will strip any CRs in the input, unless the input has been detected as binary:~~

> -U, --binary
> Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses whether a file is text or binary as described for the --binary-files option. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

EDIT: As stated in the manpage, this behaviour is MS-DOS and MS-Windows specific.

Is it possible to make grep transparently handle CRLF (and CR) line endings without preprocessing the input? If not, is this something that should be patched, or is there a well-founded rationale?
- JdeBP over 7 years
  
  Modifying a pipeline by adding a filter partway through should not feel hamfisted. It's a Unix norm. dos2unix may not be portable, but tr and sed are and can do the same filtering. It's a perl one-liner, too.
- Witiko over 7 years
  
  Sure, but it still adds complexity to the command and this strikes me as a common enough problem to warrant direct support within grep.
- Angel Todorov over 7 years
  
  A sed equivalent of dos2unix is sed 's/\r$//'
- Witiko over 7 years
  
  … which is equivalent to using the pattern something\r\?$ instead of something$ directly within grep. Still, it is an annoyance and a level of detail I would expect grep to abstract away (if I ask nicely enough through some flag). Suppose you are grepping through a file that uses only \r to end lines (the way old Macs did). Then it becomes more than an annoyance, since grep will not recognize these as line endings and will buffer the entire file as a single line. Of course, sed 's/\r/\n/g' will fix this, but how anyone would think that having to do this is a good idea baffles me.
- Admin almost 2 years
  
  Plus, these filter solutions aren't suitable when I want to grep in multiple files in one go, like grep "pattern$" *.txt
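
For what it's worth, the filter approach from the comments can be stretched to cover several files with a small wrapper. The following is only a sketch: `grep_crlf` is a hypothetical helper name, not something grep ships with, and it uses `tr -d '\r'`, which deletes every CR in the input and not just the ones sitting in front of a newline.

```shell
# Hypothetical helper (not part of grep): grep_crlf PATTERN FILE...
# Strips CRs from each file before matching and prefixes every match
# with the file name, roughly like grep does for multiple files.
grep_crlf() {
  pattern=$1
  shift
  for f in "$@"; do
    tr -d '\r' < "$f" | grep -e "$pattern" |
      while IFS= read -r line; do
        printf '%s:%s\n' "$f" "$line"
      done
  done
}

# Usage (assumes some *.txt files exist in the current directory):
#   grep_crlf 'pattern$' *.txt
```

The exit status of grep is lost in the loop, so this is not a drop-in replacement when a script depends on it.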
Witiko over 7 years

I can of course use $(printf '\r') – or $'\r' in bash – to insert a literal CR into the pattern. What I'm asking, however, is if there is a way for me to not have to do that. I'd like to match line endings transparently (i.e. regardless of whether they consist of a CR, LF, or CRLF).
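
For reference, the pattern-side workaround mentioned above might look like the sketch below. The `sample` function is made-up test input standing in for the curl output, and `cr` is just a variable name chosen here; the CR is made optional so that both CRLF and LF lines match.

```shell
# Build a literal CR without relying on bash's $'\r' syntax.
cr=$(printf '\r')

# Made-up sample input: two CRLF-terminated lines and one LF-terminated line.
sample() {
  printf 'HTTP/1.1 200 OK\r\n'
  printf 'HTTP/1.1 200 OK\n'
  printf 'HTTP/1.1 404 Not Found\r\n'
}

# -E enables '?', so the CR in front of the end of the line is optional.
# -c counts matching lines; both "200 OK" lines match.
sample | grep -c -E "200 OK${cr}?\$"
```

Without the optional CR, `grep '200 OK$'` would only match the LF-terminated line. This still does nothing for input that uses a bare CR as its line terminator.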
Witiko about 5 years

As I indicated in the comment section of the original question, this goes deeper than playing with the pattern, since grep will only buffer lines terminated by \n, whereas \r does not terminate a line from the buffering standpoint.
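
A small demonstration of the buffering issue, as a sketch with made-up input (`mac_sample` is a hypothetical name for data that ends lines with a bare CR). It assumes a grep that does not strip CRs itself, as on Linux.

```shell
# Made-up input in the classic Mac style: every line ends in a bare CR.
mac_sample() {
  printf 'alpha\rbeta\rgamma\r'
}

# grep sees no \n at all, so the whole input counts as one line.
mac_sample | grep -c 'a'

# Turning each CR into LF first gives grep three separate lines.
mac_sample | tr '\r' '\n' | grep -c 'a'
```

The first command reports 1 and the second reports 3. Note that running `tr '\r' '\n'` on CRLF input turns each line ending into two newlines, so it adds empty lines; that matters if the pattern can match an empty line or if line numbers from `grep -n` are needed.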