grep equivalent of the kwrite regex [A-Z][A-Z]+

text-processing grep regular-expression kwrite


Solution 1

You're using the right syntax in your first example; the problem is + is only considered special when using "extended" regular expressions. From the man page of the GNU implementation of grep:

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

(\?, \+, and \| are non-standard GNU extensions though).
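A quick way to see this on a typical Linux system is sketched below. It assumes GNU grep (because \+ is a GNU extension), and the two sample lines are made up purely for illustration:

```shell
#!/bin/sh
# LC_ALL=C makes [A-Z] mean plain ASCII capitals regardless of locale.
export LC_ALL=C

# Made-up sample: one line contains a literal "+", the other does not.
sample='AB+x
ABC'

# Basic regex: an unescaped + is an ordinary character, so this only
# matches a line that literally starts with two capitals and a "+".
bre_plain=$(printf '%s\n' "$sample" | grep '^[A-Z][A-Z]+')

# GNU basic regex: \+ means "one or more", so both lines match.
bre_escaped=$(printf '%s\n' "$sample" | grep '^[A-Z][A-Z]\+')

printf 'unescaped +:\n%s\nescaped \\+:\n%s\n' "$bre_plain" "$bre_escaped"
```

That is also why the first attempt in the question returned nothing: none of the input lines contain a literal "+".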

So, you either need to escape the + (assuming GNU grep or compatible):

$ grep "^[A-Z][A-Z]\+" filename

Or use the standard \{1,\} equivalent of GNU's \+:

$ grep '^[A-Z][A-Z]\{1,\}' filename

or, in this case, even just:

$ grep '^[A-Z]\{2,\}' filename

Or turn on extended regular expressions, by passing grep the -E flag or just running egrep (egrep is the command that introduced those extended regular expressions in the late 70s):

$ grep -E "^[A-Z][A-Z]+" filename
$ egrep "^[A-Z][A-Z]+" filename

In any case, since grep only reports which lines contain a match (not how long the match is), all of those are functionally equivalent to:

$ grep '^[A-Z][A-Z]' filename

So you don't even need the + operator.
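To check that equivalence, here is a small sketch that runs each variant on the question's sample lines (plus two made-up lines that should not match) and compares the output. It assumes GNU grep for the \+ form; egrep is left out because recent GNU grep releases warn that it is obsolescent, and grep -E behaves the same:

```shell
#!/bin/sh
# Assumes GNU grep (for \+); LC_ALL=C makes [A-Z] mean ASCII capitals only.
export LC_ALL=C

# The four lines from the question plus two made-up lines that should not match.
input=$(mktemp)
cat > "$input" <<'EOF'
CAPITALSFOLLOWING anewline.
CAPI
TALSFOLL owing
ANEW line.
Only one capital here
lowercase line
EOF

# Reference: just two capitals at the start of the line.
ref=$(grep '^[A-Z][A-Z]' "$input")

# Compare every variant from the answer against the reference.
all_same=yes
for out in \
    "$(grep '^[A-Z][A-Z]\+' "$input")" \
    "$(grep '^[A-Z][A-Z]\{1,\}' "$input")" \
    "$(grep '^[A-Z]\{2,\}' "$input")" \
    "$(grep -E '^[A-Z][A-Z]+' "$input")"
do
    [ "$out" = "$ref" ] || all_same=no
done

echo "all variants agree: $all_same"
printf '%s\n' "$ref"
rm -f "$input"
```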

In your other example you tried:

$ grep "^[A-Z][A-Z]*" filename

* does work in basic regular expressions, but it matches 0 or more times, not 1 or more, so "^[A-Z][A-Z]*" also matches lines that start with a single capital. The solution in your answer works because it says "match a capital, then another capital, then 0 or more capitals", which is the same as what your first attempt meant: "match a capital, then 1 or more capitals". You can also use {min,max} to specify exactly how many you want; if you leave out max, any number is allowed (written without backslashes like this, it also requires extended regular expressions):
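The difference between the question's "^[A-Z][A-Z]*" and the fix from Solution 2 can be seen on a few made-up lines:

```shell
#!/bin/sh
# LC_ALL=C makes [A-Z] mean plain ASCII capitals regardless of locale.
export LC_ALL=C

# Made-up sample: two leading capitals, one leading capital, none.
sample='ABC def
Abc def
abc def'

# [A-Z]* may match zero capitals, so any line starting with one capital passes.
loose=$(printf '%s\n' "$sample" | grep '^[A-Z][A-Z]*')

# Two mandatory capitals, then zero or more further ones.
strict=$(printf '%s\n' "$sample" | grep '^[A-Z][A-Z][A-Z]*')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Only "ABC def" survives the stricter pattern; "Abc def" slips through the looser one.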

$ egrep "^[A-Z]{2,}" filename
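One subtlety with the max bound: because grep only needs some prefix of the line to match, "^[A-Z]{2,4}" on its own still selects every line that starts with two or more capitals. The sketch below uses a hypothetical "2 to 4 leading capitals" requirement and adds "a non-capital or end of line" after the run to make the upper bound take effect:

```shell
#!/bin/sh
# LC_ALL=C makes [A-Z] mean plain ASCII capitals regardless of locale.
export LC_ALL=C

# The four sample lines from the question.
input='CAPITALSFOLLOWING anewline.
CAPI
TALSFOLL owing
ANEW line.'

# The max of 4 has no visible effect here: all four lines match.
unbounded=$(printf '%s\n' "$input" | grep -E '^[A-Z]{2,4}')

# 2 to 4 capitals, then a non-capital or the end of the line.
bounded=$(printf '%s\n' "$input" | grep -E '^[A-Z]{2,4}([^A-Z]|$)')

printf '%s\n' "$bounded"
```

Here only "CAPI" and "ANEW line." are printed, since the other two lines start with more than four capitals.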

(As a history note: egrep didn't support {min,max} initially, and still doesn't in Solaris 11's /bin/egrep, for instance. \{min,max\} support was added to grep before {min,max} was added to egrep, where it broke backward compatibility.)

Solution 2

You just need to add an extra [A-Z]. So, it's

me@ROOROO:~/$ grep "^[A-Z][A-Z][A-Z]*" filename


Matthew

Updated on September 18, 2022

Comments

Matthew over 1 year
So, it took me ages, but I finally learned to think in terms of regular expressions, thanks to using them in kwrite.

But I still don't know how to translate that knowledge to grep. I love my grep, when I know what I'm doing with it, but the manual has always given me a headache.

I'd like to match stuff like the following lines:
```
CAPITALSFOLLOWING anewline.
CAPI
TALSFOLL owing
ANEW line.
```
That is, lines that begin with two or more capital letters. But I can't figure out how.

In kwrite, I would match these lines using:
```
\n[A-Z][A-Z]+
```
But grep... hmm. I have a feeling like it's something like:
```
me@ROOROO:~/$ grep "^[A-Z]something" filename
```
but
```
me@ROOROO:~/$ grep "^[A-Z][A-Z]+" filename
```
doesn't work (it returns nothing). A Google search for the term 'grep match one or more occurrence' led me to believe that
```
me@ROOROO:~/$ grep "^[A-Z][A-Z]*" filename
```
was the right syntax. But, alas, that doesn't do the trick.
- Gilles 'SO- stop being evil' about 12 years
  
  In the old days, each tool had its own regexp syntax. By default, grep uses its traditional syntax; use grep -E to have a more habitual syntax where a backslash followed by a non-alphanumeric character is never special.
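To illustrate the point about backslashes: with grep -E, a backslash before a punctuation character just makes it literal, while in GNU grep's basic syntax the same \+ is the "one or more" operator. A minimal sketch, assuming GNU grep and two made-up input lines:

```shell
#!/bin/sh
# Made-up sample: a literal "a+b" and a run of a's followed by b.
sample='a+b
aab'

# Extended syntax: "\+" is an escaped, literal plus sign.
ere=$(printf '%s\n' "$sample" | grep -E 'a\+b')

# GNU basic syntax: "\+" means "one or more of the preceding item".
bre=$(printf '%s\n' "$sample" | grep 'a\+b')

printf 'grep -E: %s\ngrep:    %s\n' "$ere" "$bre"
```

The same pattern text selects "a+b" under grep -E and "aab" under plain GNU grep.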