Puppeteer wait until page is completely loaded

Solution 1

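You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF. Start the wait before the click that triggers the navigation (for example with Promise.all), otherwise the navigation can finish before waitForNavigation() starts listening:

await page.goto(fullUrl, {
  waitUntil: 'networkidle0',
});

await page.type('#username', 'scott');
await page.type('#password', 'tiger');

// Register the navigation wait before clicking to avoid a race condition.
await Promise.all([
  page.waitForNavigation({
    waitUntil: 'networkidle0',
  }),
  page.click('#Login_Button'),
]);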

await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4',
});

If a particular dynamically generated element must be included in your PDF, consider using page.waitForSelector() to wait until that content is visible:

await page.waitForSelector('#example', {
  visible: true,
});
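
When no single selector signals readiness (charts drawn after calculations, for instance), page.waitForFunction() can poll a condition inside the page instead. Here is a minimal sketch of that idea. The helper name waitForRenderedElements, its defaults, and the '.chart canvas' selector in the usage line are assumptions made for illustration, not something from the question:

```javascript
// Hypothetical helper: resolves once at least `minCount` elements matching
// `selector` exist in the page and each of those has a non-zero rendered size.
// `page` is expected to be a Puppeteer Page; the defaults are assumptions.
async function waitForRenderedElements(page, selector, minCount = 1, timeout = 30000) {
  return page.waitForFunction(
    // This callback runs inside the browser, so `document` is the page's DOM.
    (sel, min) => {
      const nodes = Array.from(document.querySelectorAll(sel));
      const rendered = nodes.filter((node) => {
        const box = node.getBoundingClientRect();
        return box.width > 0 && box.height > 0;
      });
      return rendered.length >= min;
    },
    { timeout },
    selector,
    minCount
  );
}
```

It could be called as await waitForRenderedElements(page, '.chart canvas', 3) right before page.pdf(). A non-zero size only shows that the element is laid out, not that the chart has finished drawing, so treat it as a heuristic.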

Solution 2

The networkidle events do not always indicate that the page has completely loaded: a few scripts may still be modifying the content of the page. Watching for the browser to stop modifying the HTML source seems to yield better results. Here's a function you could use:

const waitTillHTMLRendered = async (page, timeout = 30000) => {
  const checkDurationMsecs = 1000;
  const maxChecks = timeout / checkDurationMsecs;
  let lastHTMLSize = 0;
  let checkCounts = 1;
  let countStableSizeIterations = 0;
  const minStableSizeIterations = 3;

  while (checkCounts++ <= maxChecks) {
    const html = await page.content();
    const currentHTMLSize = html.length;

    const bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);

    console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, ' body html size: ', bodyHTMLSize);

    if (lastHTMLSize !== 0 && currentHTMLSize === lastHTMLSize) {
      countStableSizeIterations++;
    } else {
      countStableSizeIterations = 0; // reset the counter
    }

    if (countStableSizeIterations >= minStableSizeIterations) {
      console.log('Page rendered fully..');
      break;
    }

    lastHTMLSize = currentHTMLSize;
    // Plain timer instead of page.waitForTimeout(), which newer Puppeteer releases no longer provide.
    await new Promise((resolve) => setTimeout(resolve, checkDurationMsecs));
  }
};

You could call this after the page load or click call and before you process the page content, e.g.:

await page.goto(url, { timeout: 10000, waitUntil: 'load' });
await waitTillHTMLRendered(page);
const data = await page.content();
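
The same polling idea can be factored out so that it is not tied to the HTML length. The sketch below is a hypothetical generic helper, not part of Puppeteer. The name waitUntilStable and its option names are assumptions. It reports whether the sampled value settled before the timeout, which the function above does not tell the caller:

```javascript
// Hypothetical generic helper (not a Puppeteer API): polls `sample()` and
// resolves true once the sampled value has stayed unchanged for
// `stableChecks` consecutive checks, or false if `timeout` ms pass first.
async function waitUntilStable(sample, { interval = 1000, stableChecks = 3, timeout = 30000 } = {}) {
  const deadline = Date.now() + timeout;
  let previous;
  let hasPrevious = false;
  let stableCount = 0;

  while (Date.now() < deadline) {
    const current = await sample();
    if (hasPrevious && current === previous) {
      stableCount += 1;
      if (stableCount >= stableChecks) return true;
    } else {
      stableCount = 0; // value changed (or first sample): start counting again
    }
    previous = current;
    hasPrevious = true;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  return false;
}

// Example wiring, assuming `page` is a Puppeteer Page:
// const settled = await waitUntilStable(async () => (await page.content()).length);
```

Returning a boolean lets the caller decide whether to go ahead with a page that never settled or to fail instead.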

Solution 3

In some cases, the best solution for me was:

await page.goto(url, { waitUntil: 'domcontentloaded' });

Some other options you could try are:

await page.goto(url, { waitUntil: 'load' });
await page.goto(url, { waitUntil: 'domcontentloaded' });
await page.goto(url, { waitUntil: 'networkidle0' });
await page.goto(url, { waitUntil: 'networkidle2' });

These options are described in the Puppeteer documentation: https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions
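
waitUntil also accepts an array of these values, in which case the navigation counts as finished only after all of the listed events have fired. Here is a minimal sketch. The helper name gotoAndWaitForAll, its defaults, and the up-front check of the event names are assumptions added for illustration:

```javascript
// The four waitUntil values listed above.
const LIFECYCLE_EVENTS = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Hypothetical helper: navigates and resolves only after every event in
// `events` has fired. `page` is expected to be a Puppeteer Page.
async function gotoAndWaitForAll(page, url, events = ['load', 'networkidle0'], timeout = 30000) {
  // Fail early on a misspelled event name instead of passing it to goto().
  const unknown = events.filter((event) => !LIFECYCLE_EVENTS.includes(event));
  if (unknown.length > 0) {
    throw new Error(`Unknown waitUntil value(s): ${unknown.join(', ')}`);
  }
  return page.goto(url, { waitUntil: events, timeout });
}
```

For example, await gotoAndWaitForAll(page, url) waits for both load and networkidle0. This is stricter than either one alone, but it still cannot guarantee that client-side rendering has finished.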

Solution 4

I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:

await page.waitForSelector('#blue-button');
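
When several elements together indicate that the page is ready, the waits can run in parallel. This is a minimal sketch. The helper name waitForAllSelectors and the selectors in the usage line are hypothetical:

```javascript
// Hypothetical helper: waits for every selector in `selectors` at the same
// time and resolves with the matching element handles, in the same order.
// It rejects as soon as any single wait fails (e.g. on timeout).
async function waitForAllSelectors(page, selectors, options = { visible: true }) {
  return Promise.all(
    selectors.map((selector) => page.waitForSelector(selector, options))
  );
}
```

Usage could look like await waitForAllSelectors(page, ['#blue-button', '#sales-chart']). Because the waits overlap, the total time is bounded by the slowest selector rather than the sum of all of them.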

Solution 5

In the latest Puppeteer version, networkidle2 worked for me:

await page.goto(url, { waitUntil: 'networkidle2' });

Author by

n.sharvarish

Updated on May 06, 2022

Comments

  • n.sharvarish
    n.sharvarish about 2 years

    I am working on creating a PDF from a web page.

    The application I am working on is a single-page application.

    I tried many of the options and suggestions at https://github.com/GoogleChrome/puppeteer/issues/1412

    But none of them work.

    const browser = await puppeteer.launch({
        executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
        ignoreHTTPSErrors: true,
        headless: true,
        devtools: false,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    
    const page = await browser.newPage();
    
    await page.goto(fullUrl, {
        waitUntil: 'networkidle2'
    });
    
    await page.type('#username', 'scott');
    await page.type('#password', 'tiger');
    
    await page.click('#Login_Button');
    await page.waitFor(2000);
    
    await page.pdf({
        path: outputFileName,
        displayHeaderFooter: true,
        headerTemplate: '',
        footerTemplate: '',
        printBackground: true,
        format: 'A4'
    });
    

    What I want is to generate the PDF report as soon as the page is completely loaded.

    I don't want to add any kind of delay, i.e. await page.waitFor(2000);

    I cannot use waitForSelector because the page has charts and graphs that are rendered after calculations.

    Any help will be appreciated.

  • Chilly Code
    Chilly Code almost 5 years
    Where is the documentation for the signal 'networkidle0'?
  • sapeish
    sapeish over 4 years
    'networkidle0' is documented here github.com/GoogleChrome/puppeteer/blob/master/docs/…
  • Amanda
    Amanda over 4 years
    Should page.waitForSelector be called after page.goto or before? Could you answer a similar question I asked stackoverflow.com/questions/58909236/… ?
  • Jason
    Jason about 4 years
    I'm not sure why this answer hasn't gotten more "love". In reality, a lot of the time we really just need to make sure JavaScript is done messing with the page before we scrape it. Network events don't accomplish this, and if you have dynamically generated content, there isn't always something you can reliably do a "waitForSelector/visible:true" on
  • Anand Mahajan
    Anand Mahajan almost 4 years
    Thanks @roberto - btw I just updated the answer, you could use this with the 'load' event rather than 'networkidle2' . Thought it would be little more optimal with that. I have tested this in production and can confirm it works well too!
  • AbuZubair
    AbuZubair over 3 years
    This doesn't ensure that any scripts loaded have finished executing. Therefore HTML could still be rendering and this would proceed.
  • Viacheslav Dobromyslov
    Viacheslav Dobromyslov over 3 years
    just use page.waitForTimeout(1000)
  • Or Assayag
    Or Assayag over 3 years
    Will check it out. Thanks.
  • kenberkeley
    kenberkeley over 3 years
    waitFor is deprecated and will be removed in a future release. See github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
  • lxg
    lxg over 3 years
    The github issue states that they just deprecated the "magic" waitFor function. You can still use one of the specific waitFor*() functions. Hence your sleep() code is needless. (Not to mention that it’s overcomplicated for what it does, and it’s generally a bad idea to tackle concurrency problems with programmatic timeouts.)
  • Gary
    Gary over 3 years
    Why would I use networkidle0 when I could use the default load event? Is it faster to use networkidle0?
  • Arch4Arts
    Arch4Arts over 3 years
    You are a genius, this is such an obvious solution, especially when you are waiting for specific elements. I can't believe I didn't think of it myself, thank you!
  • Nicolás A.
    Nicolás A. about 3 years
    @Arch4Arts you should create your own clicking function that does the waiting for you as well as clicking
  • Michael Paccione
    Michael Paccione about 3 years
    Great solution and should be part of the puppeteer library, however please note that waitFor is deprecated and will be removed in a future release: github.com/puppeteer/puppeteer/issues/6214
  • Ambroise Rabier
    Ambroise Rabier about 3 years
    I tried setting checkDurationMsecs to 200ms, and bodyHTMLSize keeps changing and gives huge numbers. I am also using Electron and React. Very strange.
  • Ambroise Rabier
    Ambroise Rabier about 3 years
    OK, I found that ridiculously hard-to-catch bug. If you are lucky enough to capture that 100k-long HTML page, you notice CSS classes like CodeMirror (must be codemirror.net), meaning document.body.innerHTML is capturing the dev console too! Just remove mainWindow.webContents.openDevTools(); for e2e testing. I hope I don't get any more bad surprises.
  • Mark Cupitt
    Mark Cupitt about 3 years
    Solved a headache for me on a high latency connection .. Well Done
  • milos
    milos over 2 years
    Is page.waitForNavigation({ waitUntil: 'networkidle0' }) the same as page.waitForNetworkIdle()?
  • chovy
    chovy over 2 years
    link to docs is broken now
  • Eduardo Conte
    Eduardo Conte over 2 years
    link updated, thanks @chovy
  • ggorlen
    ggorlen over 2 years
    If you're clicking something that triggers navigation, there's a race condition if Promise.all isn't used, e.g. Promise.all([page.click(...), page.waitForNavigation(...)])
  • ggorlen
    ggorlen over 2 years
    This should go in a pyppeteer question, not a puppeteer question.
  • ggorlen
    ggorlen over 2 years
    @Gary See this comment by a (former) core Puppeteer developer.
  • ggorlen
    ggorlen about 2 years
    For those who are confused by these options, domcontentloaded is the first one to fire, so you generally use it when you want to move on with your script before any external resources load. Typically, this is because you don't want data from them. load, networkidle2 and networkidle0 offer different flavors of waiting for resources in roughly increasing strictness, but none of them provide an exact guarantee that "the page is loaded" (because this varies from site to site, so it's ill-defined in general).
  • Bergi
    Bergi about 2 years
    I had the same problem. I use format: "A4". My solution was to not use the scale (<1) option.