Why does adding a colon break this grep pattern?


Solution 1

It looks like "blacklists/redirector/domains" is actually a filename that grep prepended to its output, not part of the file's content. When those lines really are stored in a file, grep ':youtube.com' works just fine:

% cat test.txt
./blacklists/movies/domains:youtube.com
./blacklists/movies/domains:youtube.com.br
blacklists/redirector/domains:needyoutube.com
lacklists/redirector/domains:openyoutube.com
blacklists/redirector/domains:proxy-youtube.com
blacklists/redirector/domains:proxytoyoutube.com
blacklists/redirector/domains:streamyoutube.com
blacklists/redirector/domains:unblockyoutube.com
% grep ':youtube.com' test.txt
./blacklists/movies/domains:youtube.com
./blacklists/movies/domains:youtube.com.br

If you want to recursively find lines that start with "youtube.com", use grep -R '^youtube\.com' path/to/dir
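To make the difference concrete, here is a small sketch that rebuilds a similar layout in a temporary directory. The directory tree and file contents are made up for illustration (assumptions, not the real blacklist data); only the grep patterns come from the answer.

```shell
# Build a throwaway tree (hypothetical layout, for illustration only).
tmp=$(mktemp -d)
mkdir -p "$tmp/blacklists/movies" "$tmp/blacklists/redirector"
printf 'youtube.com\nyoutube.com.br\n' > "$tmp/blacklists/movies/domains"
printf 'needyoutube.com\nproxy-youtube.com\n' > "$tmp/blacklists/redirector/domains"

# The files themselves contain no colon, so this finds nothing.
with_colon=$(cd "$tmp" && grep -R ':youtube\.com' . || true)

# Anchoring at the start of the line finds the domains that begin with youtube.com.
# grep adds the "filename:" prefix to each match on output.
anchored=$(cd "$tmp" && grep -R '^youtube\.com' . | sort)

printf 'with colon: [%s]\n' "$with_colon"
printf 'anchored:\n%s\n' "$anchored"
rm -rf "$tmp"
```

The first search comes back empty, while the second lists both youtube.com and youtube.com.br from the movies file.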

Solution 2

grep is working correctly. That file does not contain any lines with ":youtube.com".

If you want to match all the lines that have a colon somewhere before youtube.com, you could use

grep ":.*youtube\.com"
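As a rough illustration of how the two patterns differ, the sketch below runs both against three sample lines taken from the question. The variable names are arbitrary.

```shell
# Sample lines copied from the question (grep output saved as text).
lines='blacklists/redirector/domains:needyoutube.com
blacklists/redirector/domains:proxy-youtube.com
./blacklists/movies/domains:youtube.com'

# ':youtube\.com' requires the colon to sit directly before "youtube".
strict=$(printf '%s\n' "$lines" | grep ':youtube\.com')

# ':.*youtube\.com' allows any characters between the colon and "youtube".
loose_count=$(printf '%s\n' "$lines" | grep -c ':.*youtube\.com')

echo "$strict"
echo "$loose_count"
```

Only the movies line satisfies the strict pattern, while all three lines satisfy the looser one.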

Update:

Since you've updated your question, I'll try to answer the second part.

From the list above (youtube.com, youtube.com.br), I only should get youtube.com, but I don't get anything.

Your grep ':youtube.com' actually does what you need when those lines are in a file. And if they are paths printed by grep, the -R option will help you search the files themselves.

Solution 3

As others have pointed out, the colon characters you are seeing are not in the files being searched by grep, they are in the output of grep. When grep finds a matching line in a file, it displays something like: filename:line
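The prefix can be seen in a minimal sketch like the one below. The two file names a and b are hypothetical; -h is the standard grep option that suppresses the filename prefix.

```shell
tmp=$(mktemp -d)
printf 'youtube.com\n' > "$tmp/a"
printf 'openyoutube.com\n' > "$tmp/b"

# With more than one file, grep prefixes each matching line with "filename:".
prefixed=$(cd "$tmp" && grep 'youtube\.com' a b)

# -h suppresses the prefix, leaving only the line as stored in the file.
bare=$(cd "$tmp" && grep -h 'youtube\.com' a b)

printf '%s\n---\n%s\n' "$prefixed" "$bare"
rm -rf "$tmp"
```

The colon appears only in the first form, which is why a pattern containing ":" cannot match the file contents.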

The problem you are having is you want to match files that contain youtube.com but NOT proxyyoutube.com, right?

In your case, it looks like the string you are looking for is at the beginning of a line, so you can do something like:

grep "^youtube\.com" *

The caret (^) matches only at the beginning of a line, so that way you avoid matching "extrastuffhereyoutube.com". The pattern comes first and the files after it, and the backslash makes the dot match a literal dot rather than any character.
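A short sketch of the effect of the anchor and of escaping the dot follows. The sample domains, including youtubexcom.net, are invented for illustration.

```shell
# Invented sample data: one domain per line.
domains='youtube.com
youtube.com.br
extrastuffhereyoutube.com
youtubexcom.net'

# Unescaped dot: "." matches any character, so youtubexcom.net also matches.
unescaped=$(printf '%s\n' "$domains" | grep -c '^youtube.com')

# Escaped dot: only a literal "." is accepted after "youtube".
escaped=$(printf '%s\n' "$domains" | grep '^youtube\.com')

echo "$unescaped"
echo "$escaped"
```

In both cases the caret keeps extrastuffhereyoutube.com out of the results; the escaped form additionally drops youtubexcom.net.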


Author: user1247806

Updated on September 18, 2022

Comments

  • user1247806
    user1247806 almost 2 years

    I executed a search with grep, but it doesn't work like I expected it to. I have the following lines in a file:

    blacklists/redirector/domains:needyoutube.com
    lacklists/redirector/domains:openyoutube.com
    blacklists/redirector/domains:proxy-youtube.com
    blacklists/redirector/domains:proxytoyoutube.com
    blacklists/redirector/domains:streamyoutube.com
    blacklists/redirector/domains:unblockyoutube.com
    

    When I run:

    grep ':youtube.com'
    

    I get no results. The following works:

    grep 'youtube.com'
    

    How can I escape the colon (:)? Backslash (grep '\:youtube.com') doesn't work. I use RHEL 5, grep (GNU grep) 2.5.1.

    Update: I forgot the entries I wanted to grep, these exist also:

    ./blacklists/movies/domains:youtube.com
    ./blacklists/movies/domains:youtube.com.br
    

    I want to just get the fields that contain the exact domain name. So I want to get the blacklists linked to youtube.com, so I use ":youtube.com".

    From the list above (youtube.com, youtube.com.br), I only should get youtube.com, but I don't get anything.

    I wasn't clear enough, sorry.

  • user1247806
    user1247806 over 12 years
    Hi, I work with the blacklist database from shallalist.de. I had:

    find . -type f \( ! -name "*.db" \) | xargs grep ':youtube.com'

    Here the grep doesn't work, but as you stated, grep ':youtube.com' test.txt works fine! To get all blacklists where a domain exists, I now use:

    grep -R '^youtube\.com$' blacklists/*/domains

    Thanks!
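The final command can be tried out with a sketch like the following. The directory tree is a hypothetical stand-in for the real blacklist database, and the "video" category is invented. The -l, -F and -x options (list file names only, fixed string, whole-line match) are shown as one possible alternative, not as what the asker used.

```shell
# Hypothetical stand-in for the blacklist tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/blacklists/movies" "$tmp/blacklists/redirector" "$tmp/blacklists/video"
printf 'youtube.com\nyoutube.com.br\n' > "$tmp/blacklists/movies/domains"
printf 'needyoutube.com\nopenyoutube.com\n' > "$tmp/blacklists/redirector/domains"
printf 'youtube.com.br\n' > "$tmp/blacklists/video/domains"

# ^ and $ anchor both ends, so only the exact domain matches.
exact=$(cd "$tmp" && grep -R '^youtube\.com$' blacklists/*/domains)

# Alternative: -F treats the pattern as a fixed string, -x requires a
# whole-line match, and -l prints only the names of matching files.
files=$(cd "$tmp" && grep -lFx 'youtube.com' blacklists/*/domains)

echo "$exact"
echo "$files"
rm -rf "$tmp"
```

Both forms report only the movies blacklist; the second prints just the file name, which may be handier when the goal is a list of blacklists.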