How do I extract all the external links of a web page?
lynx -dump 'http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp' | awk '/http/{print $2}' | grep watch > links.txt

works. You need to quote or escape the & in the link.

In your original line, the unescaped & sends lynx to the background, leaving empty input for links.txt. The backgrounded process still writes its output to the terminal you are in, but, as you noticed, the > redirect does not apply to it (ambiguity: which process should write to the file?).
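To see why, here is a minimal sketch that needs no network: in your original command, the shell treats everything after the & as a brand-new command, so feature=plcp becomes a plain variable assignment instead of part of the URL.

```shell
# The unquoted & terminates the command and runs it in the
# background; what follows is parsed as a separate command.
# Here that "command" is just a shell variable assignment.
echo "dumping page" & feature=plcp

wait                              # let the backgrounded echo finish
echo "feature is now: $feature"   # prints: feature is now: plcp
```

The same parsing happens with your lynx line: lynx goes to the background with a truncated URL, and feature=plcp | awk … becomes a separate pipeline with nothing on its input.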
Addendum: I'm assuming a typo in your original command: the leading and trailing ' should not be present. Otherwise you would get other error messages from trying to execute a non-existent command. Removing them gives the behavior you describe.
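As an aside, here is a variant worth trying — a sketch, assuming your lynx build supports the -listonly option (most modern versions do). It lets lynx print only the link list and filters it with grep -o, so awk's column guessing isn't needed; the single quotes protect both the ? and the & from the shell:

```shell
# -dump -listonly prints just the list of links found on the page;
# grep -o then keeps only the watch URLs, one per line.
lynx -dump -listonly 'http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp' \
  | grep -o 'http[^ ]*watch[^ ]*' > links.txt
```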
whoever
Updated on September 18, 2022

Comments
-
whoever over 1 year
How do I extract all the external links of a web page and save them to a file? If there are any command-line tools, that would be great.
Much the same question was asked here, and the answer worked gracefully for google.com, but for some reason it doesn't work with e.g. YouTube. I'll explain: let's take this page as an example. If I try to run
lynx -dump http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp | awk '/http/{print $2}' | grep watch > links.txt
then, unlike on google.com, it first executes lynx's dump, then hands control to awk (for some reason with empty input), and finally writes nothing to the file links.txt. Only after that does it display the unfiltered lynx dump, with no possibility of redirecting it elsewhere.
Thank you in advance!
-
whoever about 12 years
Somewhere I saw mention of the 'dog' command, which can supposedly do the same thing, but I failed to find it elsewhere.
-
whoever about 12 years
Thanks so much! I hate myself for being such a newbie. But, all in all, 2 weeks of using Linux isn't much time, is it? Thanks once again.
-
Daniel Andersson about 12 years
@user1212010: This site relies on the questioner to mark the answer as correct if he/she feels it solved the problem. Checking it as such is the best way to say "Thanks" on SU :-).