Grep exact number of digits and some other characters


Solution 1

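grep -vxE '([0-9]{5}[,-])*[0-9]{5}' file.txt

This reports the incorrectly formatted lines: -x requires the pattern to match the whole line, and -v prints the lines that do not match.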
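A quick way to check the behaviour is to feed the command a few made-up lines. The sketch below wraps the same grep command in a shell function (`find_bad_lines` is a hypothetical name, not part of the answer) and pipes sample input to it instead of reading a file:

```shell
# Hypothetical helper wrapping the answer's command.
# -x: the pattern must match the entire line
# -E: extended regex, so {5} and (...)* work without backslashes
# -v: print only the lines that do NOT match, i.e. the malformed ones
find_bad_lines() {
  grep -vxE '([0-9]{5}[,-])*[0-9]{5}' "$@"
}

# Made-up sample: one valid line, then a 4-digit number,
# a wrong separator and a 6-digit number.
printf '%s\n' \
  '12345,23456,34567-45678,12345-23456,34567' \
  '12345,2345' \
  '12345;23456' \
  '123456' | find_bad_lines
```

Only the last three sample lines are printed. This pattern still accepts a chain such as 12345-12345-12345, which is what the stricter variant addresses.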

Or if you also want to forbid 12345-12345-12345:

num='[0-9]{5}'
num_or_range="$num(-$num)?"
grep -vxE "($num_or_range,)*$num_or_range" file.txt
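To see how this differs from the first pattern, here is a small sketch (again with a hypothetical function name, `find_bad_lines_strict`, and made-up input). With `num_or_range` expanded, the whole-line pattern becomes `([0-9]{5}(-[0-9]{5})?,)*[0-9]{5}(-[0-9]{5})?`, so each comma-separated item may contain at most one dash:

```shell
num='[0-9]{5}'
num_or_range="$num(-$num)?"

# Hypothetical helper: same -vxE flags as before, but each comma-separated
# item must be a single 5-digit number or a single range.
find_bad_lines_strict() {
  grep -vxE "($num_or_range,)*$num_or_range" "$@"
}

# Made-up sample: the first line is valid, the second chains two dashes.
printf '%s\n' '12345,23456-23499' '12345-12345-12345' | find_bad_lines_strict
```

Only 12345-12345-12345 is printed.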

Solution 2

For a good grep solution, see Stéphane's answer. As an alternative, here's a Perl one:

perl -ne 'print if grep { $_ !~ /^\d{5}$/ } split /[,-]/' file

That splits each input line on , or - and then looks for elements of the resulting list that don't consist of exactly 5 digits. If any are found, the line is printed.
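The same split-and-check idea can also be sketched in awk, shown here only as an alternative to the Perl one-liner (the function name `find_bad_lines_awk` and the sample lines are hypothetical). Setting the field separator to the regex `[,-]` splits each line on commas and dashes; the digit class is spelled out five times instead of using `{5}` because some awk implementations do not support interval expressions:

```shell
# Hypothetical awk port of the split-and-check approach.
find_bad_lines_awk() {
  awk -F'[,-]' '
    # An empty line has no fields at all; treat it as malformed,
    # as the Perl version does.
    NF == 0 { print; next }
    {
      # Print the line as soon as one field is not exactly five digits.
      for (i = 1; i <= NF; i++)
        if ($i !~ /^[0-9][0-9][0-9][0-9][0-9]$/) { print; next }
    }
  ' "$@"
}

printf '%s\n' '12345,23456-34567' '12345,' '123456' | find_bad_lines_awk
```

The trailing comma in 12345, produces an empty last field, so that line is reported along with 123456. Like the Perl version, this accepts 12345-12345-12345, since commas and dashes are treated alike.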

Solution 3

You don't need cat. Does this do what you want?

 $ grep -v -E '^([0-9]{5}(,|-))+' <FILE>

For example, if FILE had the following contents:

12345,23456,34567-45678,12345-23456,34567
1,2
12345*23456,34567-45678,12345-23456,34567
123456
1234*23456,34567-45678,12345-23456,34567

The result would be:

$ grep -v -E '^([0-9]{5}(,|-))+' FILE
1,2
12345*23456,34567-45678,12345-23456,34567
123456
1234*23456,34567-45678,12345-23456,34567
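Because this pattern is anchored only at the start of the line and requires at least one separator, it inspects just a prefix. The sketch below (made-up input, hypothetical function name `prefix_check`) shows the two consequences: a valid line holding a single number is reported, while a line with garbage after the first separator is not:

```shell
# Hypothetical wrapper around the command from this answer.
prefix_check() {
  grep -v -E '^([0-9]{5}(,|-))+' "$@"
}

# Valid, but there is no , or - after the number, so it is reported.
printf '12345\n' | prefix_check

# Invalid, but the prefix "12345," matches, so nothing is printed.
printf '12345,1\n' | prefix_check
```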


Author: magor


Updated on September 18, 2022

Comments

  • magor
    magor over 1 year

    I'd like to parse a file containing 5-digit numbers separated by commas or dashes, with lines like:
    12345,23456,34567-45678,12345-23456,34567

    My goal is to find lines with incorrect formatting, e.g. lines containing numbers that are not exactly 5 digits long, or numbers separated by characters other than a comma or a dash.

    I tried to egrep the file with:

    cat file.txt | egrep -v [-,]*[0-9]{5}[,-]*

    • but if I have a 6-digit number, it is matched, so the line is not displayed
    • and if I have a 4-digit number, it is not matched, but other numbers on the same line are, so the line is not displayed either
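Both symptoms come from grep matching the pattern anywhere in the line rather than against the whole line. A small sketch reproducing them with grep -E (equivalent to egrep), with the pattern quoted so the shell cannot treat the brackets as a glob; the sample lines are made up:

```shell
# The pattern from the attempt above, quoted.
pattern='[-,]*[0-9]{5}[,-]*'

# Five of the six digits satisfy [0-9]{5}, so the line matches
# and egrep -v would hide it.
printf '123456\n' | grep -E "$pattern"

# 1234 alone would not match, but 23456 does, so the whole line matches.
printf '1234,23456\n' | grep -E "$pattern"
```

Both lines are printed, which is why neither shows up with -v.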

    To specify the line format:

    • a number must be exactly 5 digits
    • ranges are written with a dash, like 12345-12389
    • a line can contain anything from a single number to several numbers and ranges in any order

    Any suggestions?

    • Admin
      Admin about 8 years
      You might want to show some correct lines, and incorrect lines to give people more to go on (show as much variation as possible).
    • Admin
      Admin about 8 years
      Added some more details. Maybe grep plus a regex is not the best way to parse this...
  • terdon
    terdon about 8 years
    This will fail if a line contains only a single 5-digit number.
  • magor
    magor about 8 years
    Thanks for your effort! As @terdon mentioned, this fails in some cases, but it's very close.
  • Arkadiusz Drabczyk
    Arkadiusz Drabczyk about 8 years
    @mazs: OK, I thought you always expected a - or , between numbers.