How to get last part of http link in Bash?
Solution 1
Using awk for this would work, but it's kind of deer hunting with a howitzer. If you already have the bare URL, it's pretty simple to do what you want: put it into a shell variable and use bash's built-in parameter substitution:
$ myurl='http://www.example.com/long/path/to/example/file.ext'
$ echo "${myurl##*/}"
file.ext
This works by removing the longest prefix that matches the pattern '*/' (everything up to and including the last slash), which is what the ## operator does:
${haystack##needle} # removes the longest match of the pattern 'needle' from the
# beginning of the variable 'haystack'
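For comparison, here is a small sketch of the four related expansion operators applied to the same URL. The variable names last, rest, dir and scheme are made up for illustration; the comments describe what each expansion yields for this particular value of myurl.

```shell
myurl='http://www.example.com/long/path/to/example/file.ext'

last=${myurl##*/}     # longest prefix matching '*/' removed:  file.ext
rest=${myurl#*/}      # shortest prefix matching '*/' removed: /www.example.com/long/path/to/example/file.ext
dir=${myurl%/*}       # shortest suffix matching '/*' removed: http://www.example.com/long/path/to/example
scheme=${myurl%%/*}   # longest suffix matching '/*' removed:  http:

printf '%s\n' "$last" "$rest" "$dir" "$scheme"
```

The doubled forms (## and %%) are the greedy ones; the single forms (# and %) stop at the first match.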
Solution 2
basename and dirname work well for URLs too:
> url="http://www.test.com/abc/def/efg/file.jar"
> basename "$url"; basename -s .jar "$url"; dirname "$url"
file.jar
file
http://www.test.com/abc/def/efg
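One difference from the parameter-expansion approach may matter if a URL can end in a slash: basename ignores trailing slashes, while ${url##*/} returns an empty string. A minimal sketch, assuming a hypothetical URL with a trailing slash (the variable names are placeholders):

```shell
url='http://www.test.com/abc/def/efg/'

from_basename=$(basename "$url")   # trailing slash is ignored: efg
from_expansion=${url##*/}          # nothing follows the last slash: empty

printf 'basename:  [%s]\nexpansion: [%s]\n' "$from_basename" "$from_expansion"
```

Which behaviour is preferable depends on whether a trailing slash should count as "no file name" in your script.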
Solution 3
With awk, you can use $NF to get the last field, regardless of the number of fields:
awk -F / '{print $NF}'
If you store that string in a shell variable, you can use:
a=http://www.test.com/abc/def/efg/file.jar
printf '%s\n' "${a##*/}"
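Since the question asks for the result in a variable, the awk call can be wrapped in a command substitution. A minimal sketch; the variable names url and filename are placeholders, not part of the original answer:

```shell
url='http://www.test.com/abc/def/efg/file.jar'

# Feed the URL to awk on stdin and capture the last '/'-separated field.
filename=$(printf '%s\n' "$url" | awk -F/ '{print $NF}')

printf '%s\n' "$filename"
```

With the URL already in a variable, filename=${url##*/} gives the same result without starting another process.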
Solution 4
Most of the posted answers are not robust on URLs that contain query strings or targets (fragments), such as the following:
https://example.com/this/is/a/path?query#target
Python has URL parsing in its standard library; it's easier to let it do it. E.g.,
from urllib import parse
import sys
path = parse.urlparse(sys.stdin.read().strip()).path
print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])
You can compact that into a single python3 -c for use in a shell script:
echo 'https://example.com/this/is/a/path/componets?query#target' \
| python3 -c 'from urllib import parse; import sys; path = parse.urlparse(sys.stdin.read().strip()).path; print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])'
(You can also keep the script broken out for readability; a single-quoted string (') can span multiple lines.)
Of course, now your shell script has a dependency on Python.
(I'm a little unsure about the if that tries to handle cases where the URL's path component is the root (/); adjust/test if that matters to you.)
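If the Python dependency is unwelcome, parameter expansion can approximate the same result by cutting the fragment and the query string before taking the last path segment. This is only a sketch, not a real URL parser: the function name url_last_segment is made up for illustration, and it assumes the URL actually has a path (for a bare https://example.com it would return the host name, and for a path ending in / it returns an empty string).

```shell
# Hypothetical helper: print the last path segment of a URL,
# ignoring any '#fragment' and '?query' parts.
url_last_segment() {
    local url=$1
    url=${url%%#*}      # drop everything from the first '#' (fragment)
    url=${url%%\?*}     # drop everything from the first '?' (query string)
    printf '%s\n' "${url##*/}"
}

url_last_segment 'https://example.com/this/is/a/path?query#target'
```

For the example URL this prints path, which matches what the Python version extracts from the path component.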
Solution 5
One method is to rev the URL, cut the first field, and then rev again, e.g.:
echo 'http://www.test.com/abc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev
Output:
file.jar
Example 2:
echo 'http://www.test.com/abc/cscsc/sccsc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev
Output:
file.jar
Marko
Updated on September 18, 2022
Comments
-
Marko over 1 year
I have an HTTP link:
http://www.test.com/abc/def/efg/file.jar
and I want to save the last part, file.jar, to a variable, so the output string is "file.jar".
Condition: the link can have a different length, e.g.:
http://www.test.com/abc/def/file.jar
I tried it this way:
awk -F'/' '{print $7}'
but the problem is the length of the URL, so I need a command that works for a URL of any length.
-
Questionmark over 7 years
Any sort of explanation to go with that?
-
Pankaj Goyal over 7 years
Sure. Will that do?
-
Questionmark over 7 years
That is great :)
-
Tulains Córdova over 7 years
+1 Brilliant, it works because a URL and a PATH are both URIs.
-
Stephen Kitt over 7 years
@TulainsCórdova a path isn't a URI; this works because basename and dirname split strings on /, and that happens to work with URLs too, at least as long as they don't have a local portion (not with URIs in general though).
-
Tulains Córdova over 7 years
In the Wikipedia article about URIs, they give the following as valid examples of URI references: /relative/URI/with/absolute/path/to/resource.txt, relative/path/to/resource.txt, ../../../resource.txt and resource.txt
en.wikipedia.org/wiki/…
-
Doktor J over 7 years
If you want to strip query strings, you can first assign to an intermediate variable, e.g. file=${myurl##*/}, then use greedy reverse-matching to back up to the ? (don't forget to escape it!), e.g. echo ${file%%\?*}
-
hvd over 7 years
@TulainsCórdova Wikipedia is not wrong, /relative/path can be either a file system path or a relative URI. But which of those it is depends on the context. When it's used as a file system path, it's not a URI. When it's used as a URI, it's not a file system path. Saying it's a URI just because it happens to match the syntax is like saying each of the words in this comment is a URI as well.