Can HTTrack download a website without the index page or a website that has some "isolated" pages?


You're right that such tools only work by following links between pages. If no other page points to a page, it is "invisible" to HTTrack (and other "spider" tools). If you know the URLs of these "unlinked" pages, you can add them manually.
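To illustrate why an unlinked page stays invisible, here is a minimal sketch of link-following discovery over an in-memory link map. It is not HTTrack's actual code; the `reachable_pages` function, the `SITE_LINKS` map, and all file names are hypothetical and exist only for this example.

```python
from collections import deque


def reachable_pages(links, starts):
    """Return every page a link-following crawler can reach from the start URLs."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        page = queue.popleft()
        # Pages without an entry in the map (e.g. images) have no outgoing links.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "orphan.html" exists on the server, but nothing links to it.
SITE_LINKS = {
    "index.html": ["about.html", "gallery.html"],
    "about.html": ["index.html"],
    "gallery.html": ["pic1.jpg", "pic2.jpg"],
    "orphan.html": ["pic3.jpg"],
}

# Crawl starting from the index page only.
from_index = reachable_pages(SITE_LINKS, ["index.html"])
print("orphan.html" in from_index)

# Crawl again with the unlinked page added manually as a second start URL.
with_seed = reachable_pages(SITE_LINKS, ["index.html", "orphan.html"])
print("pic3.jpg" in with_seed)
```

In this sketch the crawl from `index.html` alone never sees `orphan.html` or `pic3.jpg`, while adding `orphan.html` as an extra start URL brings both in. That is the effect of adding the unlinked URLs manually.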

However, if the web server has "Directory Browsing" enabled, pointing to a URL that contains a directory and no page name displays a list of all files in that directory. It is seldom activated, for security reasons. Most of the time, if no page name is specified, the web server serves a default page (index.html, index.php, default.html, ...) instead of the directory contents.
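If a directory listing is available, the file URLs can be pulled out of it and then handed to the download tool. The following is a rough sketch using only the Python standard library. It assumes the listing is a plain HTML page whose entries are `<a href="...">` links; the `listing_urls` helper, the `SAMPLE_LISTING` markup, and the `example.com` URL are made up for illustration, and real listings vary between servers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingLinkParser(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def listing_urls(base_url, html):
    """Return absolute URLs for the entries of a directory listing page."""
    parser = ListingLinkParser()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        # Assumption: skip column-sort links ("?...") and the parent-directory link.
        if href.startswith("?") or href.startswith("../"):
            continue
        urls.append(urljoin(base_url, href))
    return urls


# Hypothetical listing page, loosely modelled on a typical auto-generated index.
SAMPLE_LISTING = """
<html><body><h1>Index of /pics</h1>
<a href="?C=N;O=D">Name</a>
<a href="../">Parent Directory</a>
<a href="cat.jpg">cat.jpg</a>
<a href="notes.html">notes.html</a>
<a href="old/">old/</a>
</body></html>
"""

for url in listing_urls("http://example.com/pics/", SAMPLE_LISTING):
    print(url)
```

Entries ending in `/` are subdirectories, so the same function could be applied to their listings in turn.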

Author by

brilliant

Updated on September 17, 2022

Comments

  • brilliant
    brilliant over 1 year

    Is it the case that HTTrack can only download websites that have an index page? And the index page should have links to all the other pages on the site, right? Or, at least, all the pages on the site must be interconnected by links somehow, right? So, if there is at least one page that does not contain any links and is not linked to from any other page, then that page will not be downloaded by HTTrack, right?

    I am trying to download a website on a free host (in fact it's not a website but, rather, a collection of pictures and some HTML documents that are not necessarily connected to each other). This website is going to be closed in about two weeks, so I need to hurry to download all my pics from that site. I tried using HTTrack to download the whole site, but during the process I got this message:

    WinHTTrack Website Copier

      • MIRROR ERROR! * * HTTrack has detected that the current mirror is empty. If it was an update, the previous mirror has been restored. Reason: the first page(s) either could not be found, or a connection problem occured. => Ensure that the website still exists, and/or check your proxy settings! <=

    I am using Windows XP.

  • brilliant
    brilliant over 14 years
    Thanks for answering my question. Now one more question: what about images? Does an image also have to be linked to from a page on the website? Or, if it's not linked to from other pages and cannot be traced back through links to the index page, will it also stay invisible and therefore be overlooked by HTTrack?
  • Snark
    Snark over 14 years
    It's the same for all files: images, CSS, ZIP files, ... If they're not linked, they cannot be found.