Need to extract a substring from a file path string including the delimiter

Solution 1

You could use sed, as shown below:

sed 's/\(\.jar\).*/\1/' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

Or use awk:

awk -F'\\.jar' '{print $1".jar"}' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

The output is:

test1/test2/Test.jar
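
The two commands differ when the input contains no .jar at all. The following is a minimal sketch of that edge case. The input path and the variable names sed_out and awk_out are made up for illustration. The sed substitution simply does not fire, while the awk version appends .jar unconditionally:

```shell
s='test1/test2/GI.class'   # hypothetical input without any .jar

# sed: no match, so the line is printed unchanged
sed_out=$(printf '%s\n' "$s" | sed 's/\(\.jar\).*/\1/')

# awk: $1 is the whole line, and ".jar" is appended regardless
awk_out=$(printf '%s\n' "$s" | awk -F'\\.jar' '{print $1".jar"}')

echo "$sed_out"   # test1/test2/GI.class
echo "$awk_out"   # test1/test2/GI.class.jar
```

Which behavior is preferable depends on what the calling script expects for paths that are not inside a jar.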

Solution 2

Besides sed, you also have the option of using grep for this, with the PCRE regex ^.*?\.jar:

grep -oP '^.*?\.jar' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

This prints only the match (-o), uses PCRE (-P), and matches text that:

  • starts at the beginning of the line (^),
  • contains any character (.), any number of times but matched lazily (*?), and
  • is followed by a literal . character (\.) and jar (jar).

Using the lazy quantifier *? instead of the usual greedy quantifier * causes grep to match the fewest characters possible.

  • Without it (and with the greedy quantifier instead), grep would match as many characters as possible so long as the match ended in .jar, which would fail to stop after the first .jar in cases where there is more than one.
  • The -P flag is required because, of the regex dialects grep supports on Ubuntu, PCRE is the one that supports laziness. (This dialect is very similar to the regex dialect in Perl.)
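
The difference between the lazy and greedy quantifiers can be checked directly. This is a small sketch that assumes a grep built with PCRE support (as GNU grep on Ubuntu normally is). The variable names lazy and greedy are only for illustration:

```shell
s='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'

# Lazy quantifier: the match ends at the first .jar
lazy=$(printf '%s\n' "$s" | grep -oP '^.*?\.jar')

# Greedy quantifier: the match runs on to the last .jar
greedy=$(printf '%s\n' "$s" | grep -oP '^.*\.jar')

echo "$lazy"     # test1/test2/Test.jar
echo "$greedy"   # test1/test2/Test.jar/Test2.jar
```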

Solution 3

Since you mention shell scripting, here is a simple, purely shell-based solution:

s='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'
echo "${s%%.jar*}.jar"

The parameter expansion %% removes the longest suffix that matches the glob pattern .jar* (as opposed to %, which removes the shortest matching suffix). Because this also strips the first .jar itself, the echo appends .jar again.
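
To compare % and %% on the example path, here is a minimal sketch (the variable names shortest and longest are only for illustration):

```shell
s='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'

# % strips the shortest suffix matching .jar*, i.e. from the LAST .jar onward
shortest="${s%.jar*}.jar"

# %% strips the longest suffix matching .jar*, i.e. from the FIRST .jar onward
longest="${s%%.jar*}.jar"

echo "$shortest"   # test1/test2/Test.jar/Test2.jar
echo "$longest"    # test1/test2/Test.jar
```

Note that if the string contains no .jar, nothing is stripped and .jar is still appended, so a real script may want to test for a match first.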

Solution 4

Since this question is tagged bash, here's a bash script that uses a C-style loop and the ${variable:offset:length} parameter expansion to extract individual characters:

#!/usr/bin/env bash

substring=""
for ((i=0;i<${#1};i++))
do
    substring="$substring""${1:$i:1}"
    if [[ "$substring" == *.jar ]]
    then
        echo "$substring"
        substring=""
    fi
done

This works like so in action:

$ ./parse_string.sh test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class
test1/test2/Test.jar
/Test2.jar

To extract only the first occurrence, add a break on the line after substring="" inside the if statement.
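
A first-occurrence variant could look roughly like the sketch below. It wraps the loop in a function (first_jar is a hypothetical name, not part of the original script) and uses return where the standalone script would use break:

```shell
#!/usr/bin/env bash

# first_jar: hypothetical helper that prints the input up to and including
# the first ".jar", or returns 1 if the input contains no ".jar".
first_jar() {
    local input=$1 substring="" i
    for ((i = 0; i < ${#input}; i++)); do
        substring+="${input:i:1}"          # append one character at a time
        if [[ $substring == *.jar ]]; then
            printf '%s\n' "$substring"
            return 0                       # stop at the first match
        fi
    done
    return 1                               # no .jar found
}

first_jar 'test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'   # test1/test2/Test.jar
```

Returning a non-zero status when nothing matches lets a caller distinguish "no jar in the path" from an empty result.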

Solution 5

In Python:

python3 -c "print('blub/blab/Test.jar/blieb'.split('.jar')[0]+'.jar')"

> blub/blab/Test.jar

or:

python3 -c "s='blub/blab/Test.jar/blieb';print(s[:s.find('.jar')+4])"

> blub/blab/Test.jar

Author: Soumali Chatterjee

Updated on September 18, 2022

Comments

  • Soumali Chatterjee, over 1 year

    While executing a shell script, an input string is similar to this:

    test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class
    

    How can I extract test1/test2/Test.jar (that is, the substring up to and including the first occurrence of the '.jar' delimiter) in a shell script?

    How can I do this? I would not like to use cut and then append '.jar' at the end.

    Thanks

  • Sergiy Kolodyazhnyy, almost 7 years
    Second version is better ;)
  • terdon, almost 7 years
    Why use a C-loop? Why not just ${str//.jar*/.jar}?
  • steeldriver, almost 7 years
    @DavidFoerster pls post that as an answer - IMHO this is by far preferable to all the sed/awk/grep/perl solutions suggested so far
  • Sergiy Kolodyazhnyy, almost 7 years
    @terdon because iterating over characters of the string is the first idea to which my mind gravitated for some reason; no specific reason.
  • Sergiy Kolodyazhnyy, almost 7 years
    @DavidFoerster I agree with steeldriver. You might want to post that as answer.
  • Eliah Kagan, almost 7 years
    %% is standard. The Parameter Expansion sections of IEEE 1003.1-2008, IEEE 1003.1, and SUSv2 all cover it as "Remove Largest Suffix Pattern." Although not all Bourne-style shells are standards-conforming, I believe %% is as portable as most of the other shell features we typically say are portable.
  • David Foerster, almost 7 years
    @EliahKagan: Thanks! I removed those parts of the question accordingly.