Using sed to add a URL to the start of every line
Solution 1
Use a different separator, one that does not occur in the value of the variable.
For example:
sed "s|^|$URL|"
(If you use / as the separator and the pattern or replacement also contains /, then you'd need to escape those.)
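A minimal sketch of the separator trick, run on fixed sample names instead of the live listing. The `prefix_lines` helper and the `example.com` prefix are hypothetical, added only to show the general case: `&` is also special on the right-hand side of a substitution, so a prefix containing it needs escaping as well.

```shell
#!/bin/sh
URL='http://ftp.gnu.org/gnu/wget/'

# prefix_lines PREFIX (hypothetical helper): prepend PREFIX to every line
# read from stdin. Backslash, "&" and the "|" separator are escaped first
# so that sed treats the prefix as literal text.
prefix_lines() {
    escaped=$(printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g')
    sed "s|^|$escaped|"
}

# Plain case from the answer: "|" as the separator is enough here.
printf '%s\n' wget-1.15.tar.gz wget-1.15.tar.gz.sig | sed "s|^|$URL|"

# Made-up prefix containing "&", which needs the escaping step.
printf '%s\n' wget-1.15.tar.gz | prefix_lines 'http://example.com/get?a=1&file='
```

Without the escaping step, sed would replace the `&` in the second prefix with the matched text (here the empty string), silently dropping it.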
Solution 2
Everything you have done so far can be replaced by a single awk call:
wget ... | awk -F\" '$6 ~ "gz" { lastline=thisline; thisline=$6;}; '\
'END {print lastline; print thisline;}'
And, of course, awk can add the URL, too:
awk -F\" -v baseurl="http://ftp.gnu.org/gnu/wget/" \
'$6 ~ "gz" { lastline=thisline; thisline=$6;}; '\
'END {print baseurl lastline; print baseurl thisline;}'
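To see what the field splitting does without touching the network, here is a sketch that feeds awk a made-up excerpt of the directory listing. The here-document is an assumption about the page layout (the real HTML may differ), and the function names `listing` and `latest_urls` are hypothetical.

```shell
#!/bin/sh
# Hypothetical excerpt of the directory listing; the real page may differ.
listing() {
cat <<'EOF'
<img src="/icons/compressed.gif" alt="[   ]"> <a href="wget-1.14.tar.gz">wget-1.14.tar.gz</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="wget-1.14.tar.gz.sig">wget-1.14.tar.gz.sig</a>
<img src="/icons/compressed.gif" alt="[   ]"> <a href="wget-1.15.tar.gz">wget-1.15.tar.gz</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="wget-1.15.tar.gz.sig">wget-1.15.tar.gz.sig</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="wget-1.15.tar.xz">wget-1.15.tar.xz</a>
EOF
}

# Split each line on double quotes; field 6 is then the href value.
# Remember the last two names containing "gz" and print them, prefixed
# with baseurl, once the input is exhausted.
latest_urls() {
    awk -F\" -v baseurl="http://ftp.gnu.org/gnu/wget/" '
        $6 ~ "gz" { lastline = thisline; thisline = $6 }
        END { print baseurl lastline; print baseurl thisline }'
}

listing | latest_urls
```

With this sample input the two printed lines are the 1.15 tarball and its signature; which field holds the href depends entirely on how many quoted attributes precede it.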
Solution 3
You can use wget's --base option here:
wget -qO- http://ftp.gnu.org/gnu/wget/ |
cut -d\" -sf6 |
grep '\.tar\.gz' |
tail -n2 |
wget -i - --base=http://ftp.gnu.org/gnu/wget/
Solution 4
You can also do the whole thing directly like so:
wget -qO- http://ftp.gnu.org/gnu/wget/ | grep tar.gz | cut -d\" -f6 |
tail -n2 | xargs -I{} wget http://ftp.gnu.org/gnu/wget/{}
This passes the output of the first wget to xargs, which runs the second wget once per input line, replacing the string {} with that line.
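A dry run makes the {} substitution visible. In this sketch the file names are fixed sample input and `echo` is placed in front of `wget`, so the commands are printed rather than executed; `dry_run` is a hypothetical name.

```shell
#!/bin/sh
# dry_run (hypothetical helper): print the wget command that xargs would
# build for each input line, instead of running it.
dry_run() {
    xargs -I{} echo wget http://ftp.gnu.org/gnu/wget/{}
}

printf '%s\n' wget-1.15.tar.gz wget-1.15.tar.gz.sig | dry_run
```

Removing the `echo` turns the dry run back into the real download.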
And you can skip some parsing steps with some trickery:
wget -qO- http://ftp.gnu.org/gnu/wget/ | tac | grep -Pom 2 'href="\K[^"]+\.tar\.gz[^"]*' |
xargs -I{} wget http://ftp.gnu.org/gnu/wget/{}
Here, we are using PCREs (-P) with grep, -o so that it prints only the matched portion of each line, and -m 2 so that it stops after the first 2 matching lines. Because tac reverses its input (printing the last line first, the penultimate second, and so on), those first 2 matches are actually the last 2 in the listing.
The \K in the regular expression tells grep to discard whatever was matched before the \K, so that it is not printed when using -o.
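As a rough offline check of the tac, -m 2 and \K combination, the sketch below runs the pipeline on a made-up listing excerpt (the here-document is an assumption, not the real page) and prints the names instead of downloading them. It assumes GNU grep built with PCRE support; `listing` and `last_two` are hypothetical names. The pattern stops at the closing quote (`[^"]*`) so that a .sig suffix is kept, whereas a lazy `.+?` would end the match at the first .tar.gz.

```shell
#!/bin/sh
# Hypothetical excerpt of the directory listing; the real page may differ.
listing() {
cat <<'EOF'
<img src="/icons/compressed.gif" alt="[   ]"> <a href="wget-1.14.tar.gz">wget-1.14.tar.gz</a>
<img src="/icons/compressed.gif" alt="[   ]"> <a href="wget-1.15.tar.gz">wget-1.15.tar.gz</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="wget-1.15.tar.gz.sig">wget-1.15.tar.gz.sig</a>
<img src="/icons/unknown.gif" alt="[   ]"> <a href="wget-1.15.tar.xz">wget-1.15.tar.xz</a>
EOF
}

# Reverse the listing, then keep only the href value (everything after
# href=" up to the closing quote) of the first 2 lines naming a .tar.gz.
last_two() {
    tac | grep -Pom 2 'href="\K[^"]+\.tar\.gz[^"]*'
}

listing | last_two
```

Because the input is reversed, the signature file is printed before the tarball; for downloading, the order does not matter.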
Another approach, closer to what you had in mind, would be to read the target files in a loop:
wget -qO- http://ftp.gnu.org/gnu/wget/ | tac |
grep -Pom 2 'href="\K[^"]+\.tar\.gz[^"]*' |
while IFS= read -r target; do
    wget http://ftp.gnu.org/gnu/wget/"$target"
done
misteraidan
Updated on September 18, 2022
Comments
-
misteraidan over 1 year
I have the URL (see below) of a certain web page that lists many different versions of a software package.
URL=http://ftp.gnu.org/gnu/wget/
The following one-liner gets me the latest version tar ball and its signature file out of the HTML.
wget -qO- http://ftp.gnu.org/gnu/wget/ | grep tar | cut -d\" -f6 | tail -n4 | grep gz
Probably not the shortest most efficient one liner, but hey, I'm learning and I'm open for feedback. The result of the above is this:
wget-1.15.tar.gz
wget-1.15.tar.gz.sig
Now, the next logical step (to me at least) is to pipe the output above into sed and prepend $URL to each line, so that the output looks like:
http://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz
http://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz.sig
And then I want to pipe that right back into wget to download the files.
The question is this: how do I prepend the value of the bash variable $URL to each line of output using sed? I tried the following:
sed "s/^/$URL/"
But that only gives me the error:
sed: -e expression #1, char 11: unknown option to `s'
I also know that the basic concept is good, because when I use the following, I get good results...
VAR="Gorauskas, "
echo "Jonas" | sed "s/^/$VAR/"
So, my guess is that I need to somehow escape all of the / characters in the $URL variable... Am I on the right track?
-
misteraidan about 10 years
It works! So easy, and yet so hard... :)
-
misteraidan about 10 years
Very, very cool indeed!
-
Stéphane Chazelas about 10 years
Another character to watch in URLs (in the general case, not here) is &, as it's special to sed on the right-hand side of a substitution.
-
Stephen Kitt over 2 years
Thank you for confirming that devnull’s answer works. But please don’t add a new answer to do so; once you have sufficient reputation, you will be able to vote up questions and answers that you found helpful.