How do I grep for all words that are less than 4 characters?

14,679

Solution 1

You can just do:

egrep -x '.{1,3}' myfile

This will also skip blank lines, which are technically not words. Unfortunately, the above regex counts apostrophes in contractions, as well as hyphens in hyphenated compound words, as characters. Hyphenated compound words are not a problem at such a low letter count, but I am not sure whether you want to count apostrophes in contractions, which are possible (e.g., I'm). You can try a regex such as:

egrep -x '\w{1,3}' myfile

..., but this matches only word characters (letters, digits, and underscores), so it will not match contractions or hyphenated compound words at all.
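To see the difference between the two patterns, here is a minimal sketch run against a small made-up word list (the sample words and the `words` temp file are assumptions, not from the question). It uses `grep -E`, which is equivalent to `egrep`, and relies on `\w`, which is a GNU/BSD extension rather than part of POSIX.

```shell
# Hypothetical sample word list, one word per line (includes a blank line).
words=$(mktemp)
printf '%s\n' 'a' 'cat' "I'm" 'tree' '' 'ox' > "$words"

# Whole line is 1 to 3 characters of any kind; the apostrophe counts as one.
any_chars=$(grep -E -x '.{1,3}' "$words")

# Whole line is 1 to 3 word characters (letters, digits, underscore) only.
word_chars=$(grep -E -x '\w{1,3}' "$words")

echo "Pattern .{1,3} matched:"
printf '%s\n' "$any_chars"
echo "Pattern \\w{1,3} matched:"
printf '%s\n' "$word_chars"

rm -f "$words"
```

With this input the first pattern keeps "I'm" while the second drops it; both skip the blank line and "tree".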

Solution 2

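Like this:

grep -v "^...." my_file

This prints every line that does not begin with four characters, that is, every line with fewer than four characters. Unlike the other solutions, it also prints blank lines.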

Solution 3

Try this regular expression:

grep -E '^.{1,3}$' your_dictionary
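The inverted match from Solution 2 and the anchored match from Solution 3 differ only in how they treat blank lines. A rough sketch, assuming a made-up word list (the file contents and variable names are illustrative only):

```shell
# Hypothetical word list with one blank line in it.
words=$(mktemp)
printf '%s\n' 'a' 'tree' '' 'ox' 'four' 'cat' > "$words"

# Solution 2: drop every line that starts with four characters.
# Blank lines do not start with four characters, so they are kept.
inverted_count=$(grep -c -v '^....' "$words")

# Solution 3: keep only lines that are exactly 1 to 3 characters long.
# Blank lines have zero characters, so they are dropped.
anchored_count=$(grep -c -E '^.{1,3}$' "$words")

echo "grep -v '^....'      : $inverted_count lines"
echo "grep -E '^.{1,3}\$'   : $anchored_count lines"

rm -f "$words"
```

If the dictionary has no blank lines, the two commands should return the same set of words.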
Author by

TIMEX

Updated on July 11, 2022

Comments

  • TIMEX
    TIMEX almost 2 years

    I have a dictionary with words separated by line breaks.

  • tchrist
    tchrist about 13 years
    Actually, it's worse than that: \w is messed up in GNU grep because a pattern like ^\w fails on strings like "β-oxidation" and "γ-aminobutyric". I would run perl -CSD -ne 'print if /^\W*(\w\W*){1,3}$/', because that way it handles contractions and hyphenated words but doesn’t count the non-word characters towards its limit of 3. If you care about actual letters, you can use \pL and \PL instead of \w and \W, because \w matches more broadly than letters, or even than \p{Alphabetic}, per UTS#18’s requirements.
  • Paul Tomblin
    Paul Tomblin over 9 years
    @cbmanica, no, you are incorrect. "grep -v" finds all lines that don't match, and I'm matching any line with 5 or more characters. In other words, it returns any line with 4 or fewer characters.
  • cbmanica
    cbmanica over 9 years
    Given that OP wanted to find words that are "less than 4 characters", I'm afraid you've confirmed my assertion that your answer is incorrect.
  • THESorcerer
    THESorcerer about 9 years
    Yes, it is "less than", not "less than or equal to". Anyway, it is a good idea and gets the point across.