How to make sed do non-greedy match?

Solution 1

sed has no non-greedy quantifiers, but you can get the same effect like this (the \| alternation in a basic regex is a GNU sed extension):

echo "XML-Xerces-2.7.0-0.tar.gz" | sed -e 's/^\(\([^-]\|-[^0-9]\)*\).*/\1/g'

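This captures everything up to the first - that is followed by a digit, because each repetition of the inner group consumes either a non-hyphen character or a hyphen followed by a non-digit.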
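To make the difference concrete, here is a minimal sketch that runs the greedy pattern from the question next to the alternation trick. It assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do); the variable names are illustrative and not part of the original answer.

```shell
# Illustrative variable names; only the sed patterns come from the thread.
name='XML-Xerces-2.7.0-0.tar.gz'

# Greedy: .* backtracks only as far as the LAST "-<digit>",
# so just "-0.tar.gz" is dropped.
greedy=$(printf '%s\n' "$name" | sed -E 's/^(.*)-[0-9].*/\1/')

# Emulated non-greedy: the group can never cross a "-<digit>",
# so it stops before the FIRST one.
lazy=$(printf '%s\n' "$name" | sed -E 's/^(([^-]|-[^0-9])*).*/\1/')

echo "greedy: $greedy"
echo "lazy:   $lazy"
```

One caveat of this pattern: a doubled hyphen directly before the digit (for example a--1) is consumed as "hyphen followed by a non-digit", so the match would run past it.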

Solution 2

You don't actually need sed when you're in bash:

shopt -s extglob
V='XML-Xerces-2.7.0-0.tar.gz'
echo "${V%%-+([0-9]).+([0-9])*}"
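As a rough sketch of what this expansion does, the snippet below first contrasts the % and %% operators on a simple pattern and then applies the answer's extglob pattern. The variable names short, long, base and removed are made up for illustration, and the script assumes it is run by bash.

```shell
shopt -s extglob   # enables +(...), meaning "one or more of the enclosed patterns"
V='XML-Xerces-2.7.0-0.tar.gz'

# % strips the shortest suffix matching the pattern, %% the longest.
short=${V%-*}
long=${V%%-*}

# The answer's pattern: hyphen, digits, a literal dot, digits, then anything.
base=${V%%-+([0-9]).+([0-9])*}

# What was cut off: strip the kept prefix from the front of the original.
removed=${V#"$base"}

echo "short:   $short"
echo "long:    $long"
echo "base:    $base"
echo "removed: $removed"
```

Requiring digits, a dot, and more digits is what keeps the pattern from matching at -Xerces, so the cut happens at the version number rather than at the first hyphen.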
Author: Red Cricket

I LOVE stackoverflow.com ... I am a supporter of the perlmonks too.

Updated on June 04, 2022

Comments

  • Red Cricket
    Red Cricket almost 2 years

    I cannot seem to figure out how to come up with the correct regex for my bash command line. Here's what I am doing:

    echo "XML-Xerces-2.7.0-0.tar.gz" | sed -e's/^\(.*\)-[0-9].*/\1/g'
    

    This gives me the output of ...

    XML-Xerces-2.7.0
    

    ... but what I need is the output to be ...

    XML-Xerces
    

    ... I guess I could do this ...

     echo "XML-Xerces-2.7.0-0.tar.gz" | sed -e's/^\(.*\)-[0-9].*/\1/g' | sed -e's/^\(.*\)-[0-9].*/\1/g'
    

    ... but I would like to understand sed regex a little better.

    Update:

    I tried this ...

    echo "XML-Xerces-2.7.0-0.tar.gz" | sed -e's/^\([^-]*\)-[0-9].*/\1/g'
    

    ... as suggested, but that outputs XML-Xerces-2.7.0-0.tar.gz unchanged.

  • Red Cricket
    Red Cricket over 10 years
    Ah! that works! Thanks
  • Paul
    Paul over 10 years
    @RedCricket You're welcome
  • Red Cricket
    Red Cricket over 10 years
    Pretty slick! But I am not sure I understand how to read ${V%%-+([0-9]).+([0-9])*}. Could you explain that part?
  • konsolebox
    konsolebox over 10 years
    @RedCricket It's an extended glob. See here. The feature is not enabled by default, so we enable it with shopt -s extglob. The %% form of parameter expansion deletes the longest match of the pattern at the end of the variable's value. Expansion methods are explained here. The pattern -+([0-9]).+([0-9])* matches -2.7.0-0.tar.gz of XML-Xerces-2.7.0-0.tar.gz, so that part is deleted. As a regex it would be roughly -[0-9]+\.[0-9]+.*$.
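The regex equivalence mentioned in the last comment can be checked with bash's own =~ operator. This is only a sketch, assuming bash 3.2 or later; the names re, suffix and base are illustrative and do not appear in the thread.

```shell
V='XML-Xerces-2.7.0-0.tar.gz'

# The regex form of the extglob pattern, kept in a variable so the
# backslash reaches the regex engine without extra quoting concerns.
re='-[0-9]+\.[0-9]+.*$'

if [[ $V =~ $re ]]; then
  suffix=${BASH_REMATCH[0]}   # the whole matched text
  base=${V%"$suffix"}         # strip it literally from the end
  echo "suffix: $suffix"
  echo "base:   $base"
fi
```

The match starts at the leftmost position where the regex can succeed, which here is the hyphen before the version number, so no non-greedy quantifier is needed.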