Using sed to add a URL to the start of every line


Solution 1

Use a different separator character, one that does not appear in the variable's value.

For example,

sed "s|^|$URL|"

(If you use / as the separator and the pattern or replacement also contains /, you need to escape each of those as \/.)
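As a quick offline check, the sketch below feeds the two file names from the question through that command. The printf input is only a stand-in for the output of the real pipeline.

```shell
URL=http://ftp.gnu.org/gnu/wget/

# Stand-in for the file names the real pipeline would produce.
prefixed=$(printf '%s\n' wget-1.15.tar.gz wget-1.15.tar.gz.sig |
  sed "s|^|$URL|")

printf '%s\n' "$prefixed"
```

Because | does not occur in the URL, sed parses the expression without complaint.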

Solution 2

What you have done so far can be replaced by a single awk call:

wget ... | awk -F\" '$6 ~ "gz" { lastline = thisline; thisline = $6 }
  END { print lastline; print thisline }'

And, of course, awk can add the URL, too:

awk -F\" -v baseurl="http://ftp.gnu.org/gnu/wget/" '
  $6 ~ "gz" { lastline = thisline; thisline = $6 }
  END { print baseurl lastline; print baseurl thisline }'
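To try the awk approach without touching the network, here is a minimal sketch run against a few hand-written index lines. The sample markup is an assumption, shaped so that the file name is the sixth field when splitting on double quotes. The real page may be laid out differently.

```shell
# Hypothetical index lines (not copied from the real page).
sample='<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.14.tar.gz">wget-1.14.tar.gz</a></td></tr>
<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.15.tar.gz">wget-1.15.tar.gz</a></td></tr>
<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.15.tar.gz.sig">wget-1.15.tar.gz.sig</a></td></tr>'

# Remember the last two matching names, then print them with the base URL.
urls=$(printf '%s\n' "$sample" |
  awk -F\" -v baseurl="http://ftp.gnu.org/gnu/wget/" '
    $6 ~ "gz" { lastline = thisline; thisline = $6 }
    END { print baseurl lastline; print baseurl thisline }')

printf '%s\n' "$urls"
```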

Solution 3

You can use wget's --base option here:

wget -qO- http://ftp.gnu.org/gnu/wget/ |
  cut -d\" -sf6 |
  grep '\.tar\.gz' |
  tail -n2 |
  wget -i - --base=http://ftp.gnu.org/gnu/wget/
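The parsing stages of this pipeline can be tried offline. The sketch below uses hand-written sample lines (an assumption about the page layout, not the real listing) to show that cut's -s flag drops lines containing no double quote at all, so only candidate file names reach grep and tail.

```shell
# Hypothetical index lines; the first one contains no double quote.
sample='<hr>
<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.14.tar.gz">x</a></td></tr>
<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.15.tar.gz">x</a></td></tr>
<tr><td><img src="/icons/a.gif" alt="[ ]"></td><td><a href="wget-1.15.tar.gz.sig">x</a></td></tr>'

# Same filter chain as above, minus the two wget calls.
names=$(printf '%s\n' "$sample" |
  cut -d\" -sf6 |
  grep '\.tar\.gz' |
  tail -n2)

printf '%s\n' "$names"
```

These relative names are what the final wget reads from standard input and resolves against --base.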

Solution 4

You can also do the whole thing directly like so:

wget -qO- http://ftp.gnu.org/gnu/wget/ | grep tar.gz | cut -d\" -f6 | 
 tail -n2 | xargs -I{} wget http://ftp.gnu.org/gnu/wget/{}

This passes the file names extracted from the first wget's output to xargs, which runs the second wget once per input line, replacing the string {} with that line.
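To see what xargs would run without downloading anything, one option is to put echo in front of the command. This is only a dry-run sketch, with printf standing in for the parsed file names.

```shell
# Dry run: print each wget command line instead of executing it.
cmds=$(printf '%s\n' wget-1.15.tar.gz wget-1.15.tar.gz.sig |
  xargs -I{} echo wget http://ftp.gnu.org/gnu/wget/{})

printf '%s\n' "$cmds"
```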

And you can skip some parsing steps with some trickery:

wget -qO- http://ftp.gnu.org/gnu/wget/ | tac | grep -Pom 2 'href="\K(.+?.tar.gz)' | 
xargs -I{} wget http://ftp.gnu.org/gnu/wget/{}

Here, grep uses Perl-compatible regular expressions (-P), prints only the matched part of each line (-o), and stops after the first 2 matching lines (-m 2). Because tac reverses its input (last line first, penultimate second, and so on), those first 2 matches are actually the last 2 in the listing.
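The interplay of tac and -m 2 can be checked in isolation. This sketch assumes GNU tac and a grep that supports -m, and uses made-up input rather than the real listing.

```shell
# tac puts the newest entry first, so -m 2 keeps the last two entries.
latest=$(printf '%s\n' wget-1.13.tar.gz wget-1.14.tar.gz wget-1.15.tar.gz |
  tac | grep -m 2 'tar\.gz')

printf '%s\n' "$latest"
```

Note that the results come out in reverse order, which does not matter when each line is downloaded separately.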

The \K in the regular expression tells grep to discard everything matched up to that point, so the href=" prefix is not printed when using -o.
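A small comparison makes the effect of \K visible. It assumes a grep built with PCRE support (GNU grep -P), and it uses a simpler pattern, [^"]+, than the one above purely for illustration.

```shell
line='<a href="wget-1.15.tar.gz">wget-1.15.tar.gz</a>'

# With \K the href=" prefix is matched but left out of the output.
with_k=$(printf '%s\n' "$line" | grep -Po 'href="\K[^"]+')

# Without \K the prefix is part of what -o prints.
without_k=$(printf '%s\n' "$line" | grep -Po 'href="[^"]+')

printf '%s\n' "$with_k" "$without_k"
```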


Another approach, closer to what you had in mind, would be to read the target files in a loop:

wget -qO- http://ftp.gnu.org/gnu/wget/ | tac |
  grep -Pom 2 'href="\K(.+?.tar.gz)' |
  while IFS= read -r target; do
    wget "http://ftp.gnu.org/gnu/wget/$target"
  done

Author by

misteraidan


Updated on September 18, 2022

Comments

  • misteraidan
    misteraidan over 1 year

    I have the URL (see below) of a certain web page that lists many different versions of a software package.

    URL=http://ftp.gnu.org/gnu/wget/
    

    The following one-liner gets me the latest version tarball and its signature file out of the HTML.

    wget -qO- http://ftp.gnu.org/gnu/wget/ | grep tar | cut -d\" -f6 | tail -n4 | grep gz
    

    Probably not the shortest or most efficient one-liner, but hey, I'm learning and I'm open to feedback. The result of the above is this:

    wget-1.15.tar.gz
    wget-1.15.tar.gz.sig
    

    Now, the next logical step (to me at least) is to pipe the output above into sed and prepend the $URL to each line so that the output looks like:

    http://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz
    http://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz.sig
    

    And then I want to pipe that right back into wget to download the files.

    The question is this: How do I prepend the value of the bash variable $URL to each line of output using sed? I tried the following:

    sed "s/^/$URL/"
    

    But that only gives me the error:

    sed: -e expression #1, char 11: unknown option to `s'
    

    I also know that the basic concept is good, because when I use the following, I get good results...

    VAR="Gorauskas, "
    echo "Jonas" | sed "s/^/$VAR/"
    

    So, my guess is that I need to somehow escape all of the / characters in the $URL variable... Am I on the right track?

  • misteraidan
    misteraidan about 10 years
    It works! So easy, and yet so hard... :)
  • misteraidan
    misteraidan about 10 years
    Very, very cool indeed!
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    Another character to watch in URLs (in the general case, not here) is & as it's special to sed on the right hand side of a substitution.
  • Stephen Kitt
    Stephen Kitt over 2 years
    Thank you for confirming that devnull’s answer works. But please don’t add a new answer to do so; once you have sufficient reputation, you will be able to vote up questions and answers that you found helpful.
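The escaping idea from the question and the remark about & in the comments can be combined into one pre-processing step. The following is a minimal sketch, not a hardened solution: the URL is hypothetical, chosen only because it contains an &, and a value containing a newline would still break it.

```shell
# Hypothetical URL containing both / and &.
URL='http://example.com/dl?a=1&b=2/'

# Put a backslash in front of every \, & and / so that sed treats them
# literally on the right-hand side of s/.../.../.
escaped=$(printf '%s\n' "$URL" | sed 's|[\\&/]|\\&|g')

# The escaped value is now safe even with / as the separator.
result=$(printf '%s\n' wget-1.15.tar.gz | sed "s/^/$escaped/")

printf '%s\n' "$result"
```

Picking a separator that cannot occur in the value, as in Solution 1, avoids the / part of the problem, but & and \ in the replacement still need this kind of treatment.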