Need to extract a substring from a file path string including the delimiter
Solution 1
You could use sed, like below:
sed 's/\(\.jar\).*/\1/' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
Or through an awk command:
awk -F'\\.jar' '{print $1".jar"}' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
The output is:
test1/test2/Test.jar
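In a script the path usually arrives in a variable rather than as a literal. A minimal sketch of the sed approach wrapped in a function follows; the name first_jar_sed is hypothetical and not part of the original answer, and printf is piped in place of the here-string so the sketch does not depend on bash.

```shell
#!/bin/sh
# first_jar_sed is a hypothetical helper name used only for this sketch.
first_jar_sed() {
    # Keep everything up to and including the first ".jar"; drop the rest.
    printf '%s\n' "$1" | sed 's/\(\.jar\).*/\1/'
}

path='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'
first_jar_sed "$path"
```

If the input contains no .jar at all, the substitution does not apply and sed prints the line unchanged.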
Solution 2
Besides sed, you also have the option of using grep for this, with the PCRE regex ^.*?\.jar:
grep -oP '^.*?\.jar' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
This prints only the match (-o), uses PCRE (-P), and matches text that:
- starts at the beginning of the line (^), and
- contains any character (.), any number of times but matched lazily (*?),
- followed by a literal . character (\.) and jar (jar)
Using the lazy quantifier *? instead of the usual greedy quantifier * causes grep to match the fewest characters possible.
- Without it (and with the greedy quantifier instead), grep would match as many characters as possible so long as the match ended in .jar, which would fail to stop after the first .jar in cases where there is more than one.
- The -P flag is required because, of the regex dialects grep supports on Ubuntu, PCRE is the one that supports laziness. (This dialect is very similar to the regex dialect in Perl.)
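The difference between the two quantifiers is easiest to see by running both patterns on the same input. This is a small sketch that assumes a grep built with PCRE support (GNU grep with -P); the variable names lazy and greedy are illustrative only.

```shell
#!/bin/sh
path='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'

# Lazy quantifier (*?): the match stops at the first ".jar".
lazy=$(printf '%s\n' "$path" | grep -oP '^.*?\.jar')

# Greedy quantifier (*): the match runs on to the last ".jar".
greedy=$(printf '%s\n' "$path" | grep -oP '^.*\.jar')

printf 'lazy:   %s\n' "$lazy"
printf 'greedy: %s\n' "$greedy"
```

The lazy pattern yields test1/test2/Test.jar, while the greedy one yields test1/test2/Test.jar/Test2.jar.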
Solution 3
Since you mention shell scripting, I present a simple, purely shell-based solution:
s='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'
echo "${s%%.jar*}.jar"
The parameter expansion %% removes the longest suffix that matches the subsequent glob pattern .jar* (as opposed to %, which matches the shortest suffix).
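To make the contrast between %% and % concrete, here is a short sketch that applies both to the same string. The variable names longest and shortest are illustrative and not from the original answer.

```shell
#!/bin/sh
s='test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'

# %% strips the longest suffix matching ".jar*", i.e. from the first ".jar" on.
longest="${s%%.jar*}.jar"

# % strips the shortest such suffix, i.e. from the last ".jar" on.
shortest="${s%.jar*}.jar"

echo "$longest"
echo "$shortest"
```

The first line printed is test1/test2/Test.jar and the second is test1/test2/Test.jar/Test2.jar. Note that if the string contains no .jar at all, nothing is stripped and .jar is still appended, so a real script may want to check for that case first.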
Solution 4
Since this question is tagged bash, here's a bash script that uses a C-style loop and ${variable:offset:length} parameter expansion to extract individual characters:
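#!/usr/bin/env bash
# Print every chunk of the first argument that ends in ".jar".
substring=""
for ((i=0; i<${#1}; i++))
do
    # Append the character at offset i to the current chunk.
    substring="$substring""${1:$i:1}"
    if [[ "$substring" == *.jar ]]
    then
        echo "$substring"
        substring=""
    fi
done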
This works like so in action:
$ ./parse_string.sh test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class
test1/test2/Test.jar
/Test2.jar
To extract only the first occurrence, add break on the line after substring="" inside the if statement.
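As a sketch of that change, the loop is shown here with break added and wrapped in a function so it can be called on any string. The function name first_jar_loop is hypothetical; the loop body otherwise follows the script above and needs bash, not plain sh.

```shell
#!/usr/bin/env bash
# first_jar_loop is a hypothetical name used only for this sketch.
first_jar_loop() {
    local substring="" i
    for ((i=0; i<${#1}; i++))
    do
        # Append the character at offset i to the current chunk.
        substring="$substring""${1:$i:1}"
        if [[ "$substring" == *.jar ]]
        then
            echo "$substring"
            substring=""
            break   # stop after the first chunk ending in ".jar"
        fi
    done
}

first_jar_loop 'test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class'
```

With the break in place, only test1/test2/Test.jar is printed, and a string without .jar produces no output.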
Solution 5
In Python:
python3 -c "print('blub/blab/Test.jar/blieb'.split('.jar')[0]+'.jar')"
> blub/blab/Test.jar
or:
python3 -c "s='blub/blab/Test.jar/blieb';print(s[:s.find('.jar')+4])"
> blub/blab/Test.jar
Soumali Chatterjee
Updated on September 18, 2022
Comments
-
Soumali Chatterjee over 1 year
While executing a shell script, an input string is similar to this:
test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class
How can I extract:
test1/test2/Test.jar
[i.e. the substring up to the first occurrence of the '.jar' delimiter, inclusive], in a shell script? I would not like to use cut and then append '.jar' at the end.
Thanks
-
Sergiy Kolodyazhnyy almost 7 years
Second version is better ;)
-
terdon almost 7 years
Why use a C-loop? Why not just ${str//.jar*/.jar}?
-
steeldriver almost 7 years
@DavidFoerster pls post that as an answer - IMHO this is by far preferable to all the sed/awk/grep/perl solutions suggested so far
-
Sergiy Kolodyazhnyy almost 7 years
@terdon because iterating over characters of the string is the first idea to which my mind gravitated for some reason; no specific reason.
-
Sergiy Kolodyazhnyy almost 7 years
@DavidFoerster I agree with steeldriver. You might want to post that as an answer.
-
Eliah Kagan almost 7 years
%% is standard. The Parameter Expansion sections of IEEE 1003.1-2008, IEEE 1003.1, and SUSv2 all cover it as "Remove Largest Suffix Pattern." Although not all Bourne-style shells are standards-conforming, I believe %% is as portable as most of the other shell features we typically say are portable.
-
David Foerster almost 7 years
@EliahKagan: Thanks! I removed those parts of the question accordingly.