How do I extract all the external links of a web page?

lynx -dump 'http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp' | awk '/http/{print $2}' | grep watch > links.txt

works. You need to quote the URL, as above, or escape the & in it.
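As a small illustration of the quoting options (a sketch, not part of the original command): single quotes, double quotes, and a backslash before the & should all yield the same literal URL. The variable names are made up for the demonstration.

```shell
#!/bin/sh
# Three ways to keep the shell from treating '&' as a control operator.
# The variable names are illustrative only.
url_single='http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp'
url_double="http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp"
url_escaped=http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039\&feature=plcp

# All three variables hold the identical string, '&feature=plcp' included.
printf '%s\n' "$url_single" "$url_double" "$url_escaped"
```

Single quotes are usually the simplest choice for URLs, since they also protect ? and other special characters when the URL is passed directly as an argument.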

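In your original line, the unescaped & ends the command and sends Lynx to the background. The rest of the line is then run as a separate pipeline in which feature=plcp is parsed as a variable assignment that produces no output, so awk receives empty input and links.txt ends up empty. The background Lynx process still writes its output to the terminal you are in, but as you noticed, that output never passes through the pipe or the > redirect.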
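This parsing can be reproduced without Lynx or network access. In the sketch below, fake_dump is a hypothetical stand-in for lynx -dump that only echoes its argument, and the example.com URL and temporary file names are placeholders.

```shell
#!/bin/sh
# fake_dump is a hypothetical stand-in for 'lynx -dump': it prints its argument.
fake_dump() { printf 'dump of %s\n' "$1"; }

tmp=$(mktemp -d)

# Unquoted '&': the shell reads this line as two commands,
#   1) fake_dump http://example.com/playlist?list=X   (started in the background)
#   2) feature=plcp | cat > "$tmp/unquoted.txt"       (an assignment feeding an empty pipe)
fake_dump http://example.com/playlist?list=X&feature=plcp | cat > "$tmp/unquoted.txt"
wait  # let the background job finish; its output went to the terminal, not the file

# Quoted URL: a single pipeline, and the whole URL reaches fake_dump.
fake_dump 'http://example.com/playlist?list=X&feature=plcp' | cat > "$tmp/quoted.txt"

# Show the byte counts of both files for comparison.
wc -c "$tmp/unquoted.txt" "$tmp/quoted.txt"
```

The unquoted run should leave its file empty while the truncated URL is printed straight to the terminal, which matches the behavior described in the question.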

Addendum: I'm assuming a typo in your original command: the beginning and ending ' should not be present. With them, the shell treats the whole line as a single command name and you get a different error about a nonexistent command. Removing those gives the behavior you describe.

Author by

whoever

Updated on September 18, 2022

Comments

  • whoever
    whoever over 1 year

    How do I extract all the external links of a web page and save them to a file?

    If there are any command-line tools for this, that would be great.

    A very similar question was asked here, and the answer worked fine for google.com, but for some reason it doesn't work with e.g. YouTube. To explain, let's take this page as an example. If I try to run

    lynx -dump http://www.youtube.com/playlist?list=PLAA9A2EFA0E3A2039&feature=plcp | awk '/http/{print $2}' | grep watch > links.txt
    

    then, unlike with google.com, it first executes lynx's dump, then hands control to awk (for some reason with empty input), and finally writes nothing to the file links.txt. Only after that does it display the unfiltered lynx dump, with no way to redirect it elsewhere.

    Thank you in advance!

    • whoever
      whoever about 12 years
      Somewhere I saw a mention of the 'dog' command, which can do the same thing, but I failed to find it anywhere else.
  • whoever
    whoever about 12 years
    Thanks so much! I hate myself for being such a newbie. But then, two weeks of using Linux is not much time, is it? Thanks once again.
  • Daniel Andersson
    Daniel Andersson about 12 years
    @user1212010: This site relies on the questioner to mark the answer as correct if he/she feels it solved the problem. Checking it as such is the best way to say "Thanks" on SU :-) .