Grep all lines with exactly one instance of a specific character

Solution 1

Try:

grep '^[^#]*#[^#]*$' file

where

^      ; beginning of line
[^#]*  ; any number of characters other than #
#      ; a single #
[^#]*  ; any number of characters other than #
$      ; end of line

As suggested, you can also match against the whole line with

grep -x '[^#]*#[^#]*'

where

  • the pattern is the same, but without the beginning-of-line and end-of-line anchors;
  • -x makes grep match the whole line (see man grep):
-x, --line-regexp
   Select  only  those  matches  that  exactly  match  the  whole line.  For a regular
   expression pattern, this is like parenthesizing the pattern and then surrounding it
   with ^ and $.
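
As a quick check, both forms can be run against the sample input from the question. This is only a sketch: the temporary file and the variable names are chosen for illustration and are not part of the original answer. Both commands should select the same three lines.

```shell
# Sample input from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
xxx#aaa#iiiii
xxxxxxxxx#aaa
#xxx#bbb#111#yy
xxxxxxxxxxxxxxxx#
xxx#x
#x#v#e#
EOF

# Anchored pattern: the whole line must contain exactly one '#'.
anchored=$(grep '^[^#]*#[^#]*$' "$sample")

# Same pattern without anchors; -x makes grep match the whole line.
whole_line=$(grep -x '[^#]*#[^#]*' "$sample")

printf '%s\n' "$anchored"
rm -f "$sample"
```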

Solution 2

Using awk:

awk -F'#' 'NF==2' infile

With # as the field separator, awk prints a line only if it contains exactly two fields. Note that #x, x# and even a lone # each count as two fields, because the empty string before or after the separator is still a field.
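
The field counts can be checked directly. The snippet below is a small illustration (the test strings are made up for the purpose) that prints NF for a few inputs when # is the separator:

```shell
# Print the field count awk sees for each input line when '#' is the
# field separator. The input strings are illustrative only.
counts=$(printf '%s\n' '#x' 'x#' '#' 'x' 'a#b#c' |
  awk -F'#' '{ print $0 ": " NF }')
printf '%s\n' "$counts"
```

Only the lines reporting a count of 2 would be selected by NF==2.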

Solution 3

With two calls to grep: pick any line that has at least one #, then remove those that have at least two:

grep '#' filename | grep -v '#.*#'

Solution 4

Using GNU grep:

$ grep -P '^(?!(?:.*#){2}).*#' infile
xxxxxxxxx#aaa
xxxxxxxxxxxxxxxx#
xxx#x
$

The -P option tells grep to interpret the pattern as a PCRE (Perl-Compatible Regular Expression). See https://www.pcre.org. PCRE syntax extends ERE (Extended Regular Expressions) with features that were originally introduced in Perl and later adopted by many commands, utilities, applications and programming languages.

If GNU grep is not available on your platform, you can install pcregrep which is part of the pcre-tools package that is available on many platforms.

The generalized form of this particular PCRE regex is:

^(?!(?:.*PATTERN){2}).*PATTERN

where PATTERN stands for the pattern that you want to occur once and only once in the grepped string. In our case the pattern is #.

  • ^ - start of the string
  • (?!(?:.*PATTERN){2}) - a negative lookahead that fails the match if, immediately to the right of the current location (here, the start of the string), there are two ({2}) consecutive occurrences of:
    • .* - 0 or more characters
    • PATTERN - the pattern
  • .* - 0 or more characters
  • PATTERN - the pattern
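
As a sketch of the generalized form, the snippet below substitutes a two-character separator, ::, for PATTERN. The input lines are hypothetical, and it assumes a grep built with -P support (such as GNU grep).

```shell
# Select lines that contain the two-character separator '::' exactly once.
# The input lines are made up for illustration; requires grep -P.
matches=$(printf '%s\n' 'a::b' 'a::b::c' 'no separator' '::x' |
  grep -P '^(?!(?:.*::){2}).*::')
printf '%s\n' "$matches"
```

Here a::b and ::x are kept, while a::b::c is rejected by the lookahead and the line without any separator fails the final .*:: part.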

Solution 5

With awk, we can use the gsub function as the condition to select our lines, since it returns the number of substitutions it made:

$ awk 'gsub(/#/, "#") == 1' file

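Or test that the line contains one # but not two:

$ awk '/#/ && ! /#.*#/' file

With sed:

$ sed -ne 's/#/&/2;t' -e '//p' file
  • Lines with at least two # are not printed: s/#/&/2 succeeds on them, so the t command branches to the end of the script, and the -n option suppresses the default output.
  • That leaves lines with either exactly one # or none. //p reuses the last regex (#) and prints only the former.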

With perl, we can count the number of # characters (tr returns that count in scalar context) to detect our lines:

$ perl -ne 'print if tr/#/#/ == 1'  file
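
The counting approach extends naturally to "exactly n occurrences". The following is a minimal sketch using the awk gsub condition from above; the exactly_n function and its n parameter are hypothetical names introduced here, not part of the original answer.

```shell
# Select lines containing exactly n '#' characters. The function name and
# the n parameter are illustrative; n=1 reproduces the gsub command above.
exactly_n() {
  awk -v n="$1" 'gsub(/#/, "#") == n'
}

two=$(printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' '#x#v#e#' | exactly_n 2)
printf '%s\n' "$two"
```

With the three sample lines shown, only xxx#aaa#iiiii contains exactly two # characters.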
Author: Torsten

Updated on September 18, 2022

Comments

  • Torsten
    Torsten over 1 year

    I want to grep all lines with only one "#" in a line.

    Example:

    xxx#aaa#iiiii
    xxxxxxxxx#aaa
    #xxx#bbb#111#yy
    xxxxxxxxxxxxxxxx#
    xxx#x
    #x#v#e#
    

    Should give this output

    xxxxxxxxx#aaa
    xxxxxxxxxxxxxxxx#
    xxx#x
    
  • Torsten
    Torsten almost 4 years
    Ahhhh... Million times thanks!!! I wasn't aware about the "p" option.
  • Арсений Черенков
    Арсений Черенков almost 4 years
    would you care to explain the regex ?
  • Jim L.
    Jim L. almost 4 years
    The second regexp can be simplified to '#.*#'.
  • Praveen Kumar BS
    Praveen Kumar BS almost 4 years
    Please let me know the reason for downvote
  • αғsнιη
    αғsнιη almost 4 years
    I'm not down-voter, but for awk which I know a bit how it works, you just overkill the awk 'gsub(/#/, "#") == 1' file command given answer by Rakesh Sharma above. I don't know python at all for that part.
  • Praveen Kumar BS
    Praveen Kumar BS almost 4 years
    Thanks for input and suggestion
  • Ed Morton
    Ed Morton almost 4 years
    and THAT is the main problem with -P - something that's trivial and obvious in a BRE or ERE becomes convoluted runes in a PCRE. The other 2 problems are that it's only available in GNU grep so it's not portable, and it's still experimental and doesn't play well with other grep options (per the GNU grep man page).
  • fpmurphy
    fpmurphy almost 4 years
    @EdMorton. Actually, the current grep manpage states that -P is "experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features." Other than the -z option, the manpage does not state any other options that -P "doesn't play well with". What other grep options are you aware of that -P is problematic with?
  • ilkkachu
    ilkkachu almost 4 years
    @EdMorton, and some other things are impossible in BRE and ERE, but possible in PCRE, or at least made simpler in it. It's not a problem with -P or PCRE, but with how to use them.
  • Ed Morton
    Ed Morton almost 4 years
    @fpmurphy I know the man page changed recently from saying This is highly experimental (e.g. linux.die.net/man/1/grep) to experimental when combined with the -z but I wouldn't consider using -P as long as the man page refers to it as experimental in any context (why should using NUL instead of \n as the terminating char make any difference in reasonably written software and how do they know that's the only thing impacted), and so I'm sorry but I can't provide insight on what the unimplemented features might be that it will apparently warn you about or other issues.
  • Ed Morton
    Ed Morton almost 4 years
    @ilkkachu right but if a task is too complicated to handle in a BRE or ERE then that doesn't mean a PCRE is the best way to handle it - Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems..
  • done
    done almost 4 years
    @EdMorton How do you match a palindrome in BRE or ERE ? How do you match nested parenthesis in BRE or ERE ? Too complex? Let's try a simpler one: How do you ensure that there is no foo following a match without also matching the foo ?
  • Ed Morton
    Ed Morton almost 4 years
    @Isaac I'm not saying it's possible to, nor that you need to, perform all tasks using a single regular expression. What I was pointing out above is that something trivial and obvious became convoluted enough when written as a PCRE that it inspired a request for explanation that's been upvoted by 7 people so far, unlike any of the other solutions posted. Feel free to ask a question if you have one, make sure to include a MCVE with concise, testable sample input and expected output.
  • done
    done almost 4 years
    But you do say: if a task is too complicated to handle in a BRE or ERE then that doesn't mean a PCRE is the best way to handle it. And, in the three examples I provided, a PCRE is not only the best way but the only way to match with a regex. I suggest that you may benefit from being more open to alternative solutions. @EdMorton
  • Ed Morton
    Ed Morton almost 4 years
    @Isaac but that's assuming I'd want to tackle that problem with a regexp when in fact I'm more open to an alternative solution that can be implemented using standard UNIX tools, none of which support PCREs. If you'd like to see a solution that I'd use, again please do feel free to post a question.
  • ilkkachu
    ilkkachu almost 4 years
    @EdMorton, for a moment I thought that Atwood blog entry was about trying to parse HTML with regex, which is just stupid(*). But that seems to be actually a rather simple pattern, just with a long list of alternations. Would probably be better written as </?($foo)> (where $foo is some method of inserting the list of words joined with |) or something in that direction (I'm not going to test it now). That can be done with PCRE or ERE, it should be about the same length in either.
  • ilkkachu
    ilkkachu almost 4 years
    @EdMorton, but PCRE does have some useful things that aren't readily available in ERE, at least non-greedy matching (.*?) and word border matches come to mind (\b). Look-aheads are also sometimes useful, but here it's just an overcomplicated way of doing that. And that's not because it's PCRE, but because they chose an overcomplicated way instead of the straightforward one. (Sorry, fpmurphy.) Though of course ERE would allow much fewer possibilities for such non-straightforward solutions, so there's that.
  • ilkkachu
    ilkkachu almost 4 years
    (* since I mentioned parsing HTML, and Isaac also mentioned nested parenthesis and other stuff that's beyond actual regular languages, I just have to link this here: metacpan.org/pod/distribution/Regexp-Grammars/lib/Regexp/… . I'll leave it to everyone to decide if they want to use Perl REs, or e.g. yacc/bison the next time they need an actual parser.)
  • done
    done almost 4 years
    @fpmurphy Wouldn't it be simpler to use: grep -P '^[^#]*#(?!.*#)' infile. In words: match the first # provided that (the lookahead) no other # exists ?
  • fpmurphy
    fpmurphy almost 4 years
    @Isaac Yes. That is another approach that works well.
  • ilkkachu
    ilkkachu almost 4 years
    @Isaac, incidentally that's just about the same as the ERE in the accepted answer. This problem isn't much of a showcase for the usefulness of PCRE. :) PCRE might start to be more useful than ERE if the question was about a separator longer than one character. (On the other hand, if we had the whole Perl language at hand, we could do something other than regex-optimizing too.)
  • Jeff Schaller
    Jeff Schaller almost 4 years
    Comments are not for extended discussion; this conversation has been moved to chat.