How to get last part of http link in Bash?


Solution 1

Using awk for this would work, but it's kind of deer hunting with a howitzer. If you already have your URL bare, it's pretty simple to do what you want if you put it into a shell variable and use bash's built-in parameter substitution:

$ myurl='http://www.example.com/long/path/to/example/file.ext'
$ echo "${myurl##*/}"
file.ext

This works by removing the longest prefix that matches the pattern '*/' (everything up to and including the last slash), which is what the ## operator does:

${haystack##needle} # removes the longest prefix matching the pattern
                    # 'needle' from the value of the variable 'haystack'
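A quick way to see what the doubled ## buys you is to compare it with the single #, which removes the shortest matching prefix instead of the longest. A minimal sketch reusing the myurl value from above (the names shortest and longest are just illustrative):

```shell
#!/usr/bin/env bash
myurl='http://www.example.com/long/path/to/example/file.ext'

# '#' removes the SHORTEST prefix matching '*/', so only 'http:/' is dropped.
shortest="${myurl#*/}"
# '##' removes the LONGEST prefix matching '*/', i.e. everything up to the last '/'.
longest="${myurl##*/}"

printf '%s\n' "$shortest"   # /www.example.com/long/path/to/example/file.ext
printf '%s\n' "$longest"    # file.ext
```

In the same family, % and %% trim a matching suffix from the end of the value instead of a prefix from the beginning.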

Solution 2

basename and dirname work well for URLs too:

> url="http://www.test.com/abc/def/efg/file.jar"
> basename "$url"; basename -s .jar "$url"; dirname "$url"
file.jar
file
http://www.test.com/abc/def/efg
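Since the question asks for the result in a variable, here is a sketch that captures basename's output with command substitution; the variable names are illustrative. Note that basename only splits on '/', so it knows nothing about URL syntax and leaves a query string attached.

```shell
#!/usr/bin/env bash
url="http://www.test.com/abc/def/efg/file.jar"

# Command substitution stores basename's output in a variable.
file=$(basename "$url")
# A suffix passed as a second operand is stripped too (the POSIX form of -s).
stem=$(basename "$url" .jar)

# basename splits on '/' only, so the query string stays part of the name.
with_query=$(basename "http://www.test.com/abc/file.jar?v=2")

printf '%s\n' "$file" "$stem" "$with_query"
```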

Solution 3

With awk, you can use $NF to get the last field, regardless of the number of fields:

awk -F / '{print $NF}'

If you store that string in a shell variable, you can use:

a=http://www.test.com/abc/def/efg/file.jar
printf '%s\n' "${a##*/}"
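To run the awk version on a shell variable, a Bash here-string is one option; a sketch with illustrative variable names. It also shows that with a trailing slash the last field is empty, and the parameter expansion behaves the same way.

```shell
#!/usr/bin/env bash
a=http://www.test.com/abc/def/efg/file.jar

# <<< feeds the variable's value to awk on standard input;
# -F / splits on slashes and $NF is the last field, whatever the field count.
last=$(awk -F / '{print $NF}' <<< "$a")

# With a trailing slash, both approaches yield an empty string.
b=http://www.test.com/abc/def/
empty_awk=$(awk -F / '{print $NF}' <<< "$b")
empty_pe=${b##*/}

printf '%s\n' "$last"
```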

Solution 4

Most of the posted answers are not robust for URLs that contain query strings or fragments (targets), for example:

https://example.com/this/is/a/path?query#target

Python has URL parsing in its standard library, so it's easier to let it do the work. E.g.,

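from urllib import parse
import sys

# Parse the URL read from stdin; .path excludes the query and fragment.
path = parse.urlparse(sys.stdin.read().strip()).path
# Print the last path segment, or "/" when the path is empty or just the root.
print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])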

You can compact that into a single python3 -c for use in a shell script:

echo 'https://example.com/this/is/a/path/components?query#target' \
    | python3 -c 'from urllib import parse; import sys; path = parse.urlparse(sys.stdin.read().strip()).path; print("/" if not path or path == "/" else path.rsplit("/", 1)[-1])'

(You can also keep the script broken out over several lines for readability; single quotes let you include newlines.)

Of course, now your shell script has a dependency on Python.

(I'm a little unsure about the if expression that handles the case where the URL's path component is empty or the root (/); adjust and test it if that matters to you.)
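If the Python dependency is unwanted, a rough pure-Bash approximation is to drop everything from the first ? or # before taking the last path segment. This is only a sketch under simple assumptions: it does no real URL parsing, so, for instance, a URL with no path such as https://example.com yields the host name rather than "/". The function name last_segment is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last path segment of a URL,
# ignoring any query string (?...) or fragment (#...).
last_segment() {
    local url=$1
    # %% removes the longest suffix matching '[?#]*', i.e. from the first ? or # onward.
    url=${url%%[?#]*}
    # ## removes the longest prefix matching '*/', leaving the last segment.
    printf '%s\n' "${url##*/}"
}

last_segment 'https://example.com/this/is/a/path?query#target'
last_segment 'http://www.test.com/abc/def/efg/file.jar'
```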

Solution 5

One method is to reverse the URL with rev, cut the first '/'-separated field, and then reverse it again. E.g.:

echo 'http://www.test.com/abc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev

Output:

file.jar

Example 2:

echo 'http://www.test.com/abc/cscsc/sccsc/def/efg/file.jar' | rev | cut -d '/' -f 1 | rev

Output:

file.jar
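To store the result in a variable, as the question asks, the same pipeline can go inside a command substitution; a sketch with illustrative variable names. printf '%s\n' is used instead of echo to sidestep echo's portability quirks with backslashes and leading dashes.

```shell
#!/usr/bin/env bash
url='http://www.test.com/abc/cscsc/sccsc/def/efg/file.jar'

# Reverse the string, keep the first '/'-separated field (the reversed
# last segment), then reverse it back.
file=$(printf '%s\n' "$url" | rev | cut -d / -f 1 | rev)

printf '%s\n' "$file"
```

Note that rev is not specified by POSIX, though it ships with util-linux and the BSDs.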

Author: Marko

Updated on September 18, 2022

Comments

  • Marko
    Marko over 1 year

    I have an http link :

    http://www.test.com/abc/def/efg/file.jar 
    

    and I want to save the last part file.jar to variable, so the output string is "file.jar".

    Condition: the link can have different lengths, e.g.:

    http://www.test.com/abc/def/file.jar.
    

    I tried it this way:

    awk -F'/' '{print $7}'
    

    but the problem is that the URL length varies, so I need a command that works for any URL length.

  • Questionmark
    Questionmark over 7 years
    Any sort of explanation to go with that?
  • Pankaj Goyal
    Pankaj Goyal over 7 years
    Sure. Will that do?
  • Questionmark
    Questionmark over 7 years
    That is great :)
  • Tulains Córdova
    Tulains Córdova over 7 years
    +1 Brilliant, it works because a URL and a path are both URIs.
  • Stephen Kitt
    Stephen Kitt over 7 years
    @TulainsCórdova a path isn't a URI; this works because basename and dirname split strings on /, and that happens to work with URLs too, at least as long as they don't have a local portion (not with URIs in general though).
  • Tulains Córdova
    Tulains Córdova over 7 years
    In the Wikipedia article about URIs, they give the following as valid examples of URI references: /relative/URI/with/absolute/path/to/resource.txt, relative/path/to/resource.txt, ../../../resource.txt and resource.txt en.wikipedia.org/wiki/…
  • Doktor J
    Doktor J over 7 years
    If you want to strip query strings, you can first assign to an intermediate variable e.g. file=${myurl##*/}, then use greedy reverse-matching to back up to the ? (don't forget to escape it!), e.g. echo ${file%%\?*}
  • hvd
    hvd over 7 years
    @TulainsCórdova Wikipedia is not wrong, /relative/path can be either a file system path or a relative URI. But which of those it is depends on the context. When it's used as a file system path, it's not a URI. When it's used as a URI, it's not a file system path. Saying it's a URI just because it happens to match the syntax is like saying each of the words in this comment is a URI as well.