How to download entire HTML of a webpage using javascript?

Solution 1

This should be possible with jQuery's ajax functions, because JavaScript in a Firefox extension is not subject to the cross-origin restriction. Here are some tips for using jQuery in a Firefox extension:

  1. Add the jQuery library to your extension's chrome/content/ directory.

  2. Load jQuery in the window load event callback rather than including it in your browser overlay XUL. Otherwise it can cause conflicts (e.g. clobbering a user's customized toolbar).

    // Load jQuery through the subscript loader service.
    (function (loader) {
      loader.loadSubScript("chrome://ryebox/content/jquery-1.6.2.min.js");
    })(Components.classes["@mozilla.org/moz/jssubscript-loader;1"]
         .getService(Components.interfaces.mozIJSSubScriptLoader));
    
  3. Use "jQuery" instead of "$". I experienced weird behavior when using "$" (a conflict of some kind, I suppose).

  4. Use jQuery(content.document) instead of jQuery(document) to access a page's DOM. In a Firefox extension "document" refers to the browser's XUL whereas "content.document" refers to the page's DOM.
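The distinction in tip 4 matters when collecting the links to download. Below is a minimal sketch, not taken from the extension mentioned in this answer, of a helper that gathers the unique same-origin link URLs of whatever document it is given. The function name `collectSameOriginLinks` is hypothetical, and the sketch assumes the `URL` constructor is available in the environment where it runs.

```javascript
// Collect the unique same-origin http(s) link URLs of a document.
// `doc` is any Document-like object: `content.document` in a Firefox
// overlay script, or plain `document` in an ordinary page script.
// (collectSameOriginLinks is a hypothetical helper name.)
function collectSameOriginLinks(doc) {
  var pageOrigin = new URL(doc.location.href).origin;
  var seen = new Set();
  var urls = [];
  for (var i = 0; i < doc.links.length; i++) {
    var parsed;
    try {
      parsed = new URL(doc.links[i].href);
    } catch (e) {
      continue; // skip hrefs that cannot be parsed
    }
    // Ignore mailto:, javascript: and other non-HTTP schemes.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    if (parsed.origin !== pageOrigin) continue;
    parsed.hash = ""; // page.html#a and page.html#b are the same page
    if (!seen.has(parsed.href)) {
      seen.add(parsed.href);
      urls.push(parsed.href);
    }
  }
  return urls;
}
```

In an overlay script this would be called as `collectSameOriginLinks(content.document)`; passing `document` there would inspect the browser's XUL instead of the page.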

I wrote a Firefox extension for getting bookmarks from my friend's bookmark site. It uses jQuery to fetch my bookmarks in a JSON response from his service, then creates a menu of those bookmarks so that I can easily access them. You can browse the source at https://github.com/erturne/ryebox

Solution 2

For JavaScript in general, the short answer is no, not unless all pages are within the same domain. JavaScript running in a web page is limited by the same-origin policy, so for security reasons it cannot make cross-domain requests like that.

However, as pointed out by Max and erturne in the comments, when JavaScript is written as part of a browser extension/add-on, the regular rules about the same-origin policy and cross-domain requests do not seem to apply, at least not for Firefox and Chrome. Therefore, using JavaScript to download the pages should be possible using an XMLHttpRequest, or using one of the wrapper methods included in your favorite JS library.
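As a rough illustration of the XMLHttpRequest approach, here is a minimal sketch of a helper that fetches a page's HTML as text. The name `fetchHtml` and its callback parameters are hypothetical, and the optional `XhrCtor` argument is an assumption added only so the helper can be exercised without a browser; it defaults to the global `XMLHttpRequest`.

```javascript
// Fetch the HTML of `url` as text and hand it to `onSuccess`.
// `XhrCtor` is optional and hypothetical: it lets a stand-in constructor be
// injected; in a page or add-on the global XMLHttpRequest is used.
function fetchHtml(url, onSuccess, onError, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = DONE
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText);
    } else {
      onError(new Error("Request for " + url + " failed with status " + xhr.status));
    }
  };
  xhr.send(null);
}
```

In a regular web page this only works for URLs the same-origin policy allows; whether an add-on can reach other domains depends on the privileges the browser grants it.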

If, like me, you prefer jQuery, have a look at jQuery's .load() method, which loads HTML from a given resource and injects it into an element that you specify.
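A minimal sketch of how .load() might be wrapped is shown below. The helper name `loadHtmlInto` and the idea of passing the jQuery function in as `jq` are assumptions made for illustration; only the `.load(url, complete)` call itself is jQuery's API.

```javascript
// Load the HTML at `url` into the element(s) matched by `targetSelector`.
// `jq` is the jQuery function; passing it in (rather than using a global)
// is an assumption of this sketch. `onDone(err, html)` is hypothetical.
function loadHtmlInto(jq, targetSelector, url, onDone) {
  jq(targetSelector).load(url, function (responseText, textStatus) {
    // jQuery reports "success" or "notmodified" when the request worked.
    var ok = textStatus === "success" || textStatus === "notmodified";
    onDone(ok ? null : new Error("Loading " + url + " failed: " + textStatus), responseText);
  });
}
```

In a page this could be called as `loadHtmlInto(jQuery, "#preview", "test/someHtml.html", callback)`, where `#preview` is a placeholder selector.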

Edit: Made some updates to my answer based on the comments about cross-domain requests made by add-ons.

Solution 3

You can make XMLHttpRequests (XHRs) if the combination scheme://domain:port of the target URL is the same as that of the page hosting the JavaScript that should fetch the HTML.
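To make the scheme://domain:port rule concrete, here is a small sketch that compares two URLs by origin. The function name `isSameOrigin` is hypothetical, and the sketch assumes the standard `URL` constructor is available.

```javascript
// Return true when both URLs share scheme, host and port.
// (isSameOrigin is a hypothetical helper name.)
function isSameOrigin(urlA, urlB) {
  // URL#origin serializes scheme, host and port, omitting default ports.
  return new URL(urlA).origin === new URL(urlB).origin;
}
```

For example, `isSameOrigin("http://example.com/a", "http://example.com:80/b")` is true because 80 is the default HTTP port, while changing the scheme to `https` or the port to `8080` on one side makes it false.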

Many JS frameworks give you easy XHR support: jQuery, Dojo, etc. Example using Dojo:

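function getText() {
  // dojo.xhrGet is asynchronous and returns a Deferred;
  // return it so that callers can attach their own callbacks.
  return dojo.xhrGet({
    url: "test/someHtml.html",
    handleAs: "text",
    load: function(response, ioArgs) {
      // The response is the HTML as a string
      return response;
    },
    error: function(response, ioArgs) {
      // On failure, the response describes the error
      return response;
    }
  });
}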

If you prefer writing your own XMLHttpRequest-handler, take a look here: http://www.w3schools.com/xml/xml_http.asp

Author by

B Faley

Updated on June 04, 2022

Comments

  • B Faley
    B Faley almost 2 years

    Is it possible to download the entire HTML of a webpage using JavaScript, given its URL? I want to develop a Firefox add-on that downloads the content of all the links found in the source of the page currently open in the browser.

    Update: the URLs reside in the same domain.

  • Nobita
    Nobita over 12 years
    Couldn't you use XMLHttpRequest to fetch those pages?
  • Christofer Eliasson
    Christofer Eliasson over 12 years
    @Nobita Not as long as the resource resides on a different domain. XMLHttpRequest is restricted by the same-origin policy. However, it can be used as long as the request is made within the same domain.
  • bezmax
    bezmax over 12 years
    @ChristoferEliasson There is a 'firefox-addon' tag on this question. Are you sure a Firefox add-on can't request extra rights to load from a different domain? Chrome add-ons have such a capability.
  • Christofer Eliasson
    Christofer Eliasson over 12 years
    @Max Interesting point. I expected the browser to run all JavaScript under the same conditions, but I might be wrong there. Let's look into it.
  • B Faley
    B Faley over 12 years
    Yes, I want to download the URLs within the same domain. Is it possible to make use of jQuery in a Firefox add-on?
  • Christofer Eliasson
    Christofer Eliasson over 12 years
    @Meysam jQuery is just plain JavaScript, so I can't see why you shouldn't be able to use it within your add-on.
  • erturne
    erturne over 12 years
    JavaScript in a Firefox extension is not subject to the cross origin restriction.
  • Christofer Eliasson
    Christofer Eliasson over 12 years
    @erturne Great to know. Do you have any good references to share?
  • erturne
    erturne over 12 years
    @ChristoferEliasson I've never found a good comprehensive source for all the quirky stuff you'll run into when doing Firefox extension development. I've just learned it all by trial and error (e.g. a user called me to complain that my extension messed up their customized toolbar!). You can browse the source for one of my extensions at leapingmind.repositoryhosting.com/trac/leapingmind_ryebox/…