Puppeteer wait until page is completely loaded
Solution 1
You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:
await page.goto(fullUrl, {
  waitUntil: 'networkidle0',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
// Start waiting for the navigation before clicking, so that a fast
// navigation cannot finish before the wait is registered.
await Promise.all([
  page.waitForNavigation({
    waitUntil: 'networkidle0',
  }),
  page.click('#Login_Button'),
]);
await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4',
});
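As a comment on this thread points out, clicking first and only then calling page.waitForNavigation() can miss a navigation that completes quickly. A minimal sketch of a reusable helper for that pattern follows; the name clickAndWaitForNavigation and its default options are hypothetical and not part of Puppeteer:

```javascript
// Hypothetical helper (not a Puppeteer API): registers the navigation wait
// before issuing the click, then resolves once both have completed.
async function clickAndWaitForNavigation(page, selector, options = { waitUntil: 'networkidle0' }) {
  const [response] = await Promise.all([
    page.waitForNavigation(options), // start listening first
    page.click(selector),            // then trigger the navigation
  ]);
  return response;
}

// Assumed usage inside an async function with an open Puppeteer page:
// await clickAndWaitForNavigation(page, '#Login_Button');
```

This only helps when the click really triggers a navigation; for single page applications that update the DOM in place, the wait may time out instead.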
If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:
await page.waitForSelector('#example', {
  visible: true,
});
Solution 2
The networkidle events do not always indicate that the page has completely loaded. A few scripts may still be modifying the content on the page. Waiting until the browser stops modifying the HTML source tends to give better results. Here's a function you could use:
const waitTillHTMLRendered = async (page, timeout = 30000) => {
  const checkDurationMsecs = 1000;
  const maxChecks = timeout / checkDurationMsecs;
  const minStableSizeIterations = 3;
  let lastHTMLSize = 0;
  let checkCounts = 1;
  let countStableSizeIterations = 0;

  while (checkCounts++ <= maxChecks) {
    const html = await page.content();
    const currentHTMLSize = html.length;
    // Logged for debugging only; the stability check uses the full HTML size.
    const bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);
    console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, ' body html size: ', bodyHTMLSize);

    if (lastHTMLSize !== 0 && currentHTMLSize === lastHTMLSize) {
      countStableSizeIterations++;
    } else {
      countStableSizeIterations = 0; // reset the counter
    }

    // Stop once the size has stayed the same for enough consecutive checks.
    if (countStableSizeIterations >= minStableSizeIterations) {
      console.log('Page rendered fully..');
      break;
    }

    lastHTMLSize = currentHTMLSize;
    // Plain timer instead of page.waitForTimeout(), which was deprecated
    // and later removed from Puppeteer.
    await new Promise((resolve) => setTimeout(resolve, checkDurationMsecs));
  }
};
You could use this after the page load / click function call and before you process the page content, e.g.:
await page.goto(url, {'timeout': 10000, 'waitUntil':'load'});
await waitTillHTMLRendered(page)
const data = await page.content()
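The same polling idea can be separated from Puppeteer so that it works for any value you want to see settle. The following is a sketch under assumptions: waitUntilStable and its option names are hypothetical, and it returns a boolean instead of logging:

```javascript
// Hypothetical generic poller: resolves to true once getValue() has returned
// the same value `stableChecks` times in a row, or to false if `timeout`
// milliseconds pass first.
async function waitUntilStable(getValue, { interval = 1000, stableChecks = 3, timeout = 30000 } = {}) {
  const deadline = Date.now() + timeout;
  let last;
  let first = true;
  let stable = 0;
  while (Date.now() < deadline) {
    const current = await getValue();
    // Count consecutive unchanged readings; any change resets the counter.
    stable = !first && current === last ? stable + 1 : 0;
    if (stable >= stableChecks) return true;
    last = current;
    first = false;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  return false;
}

// Assumed usage with a Puppeteer page: treat the page as rendered once the
// length of the serialized HTML stops changing.
// const rendered = await waitUntilStable(async () => (await page.content()).length);
```

Returning false on timeout leaves it to the caller to decide whether to continue with a possibly incomplete page or to fail.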
Solution 3
In some cases, the best solution for me was:
await page.goto(url, { waitUntil: 'domcontentloaded' });
Some other options you could try are:
await page.goto(url, { waitUntil: 'load' });
await page.goto(url, { waitUntil: 'domcontentloaded' });
await page.goto(url, { waitUntil: 'networkidle0' });
await page.goto(url, { waitUntil: 'networkidle2' });
These options are described in the Puppeteer documentation: https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions
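waitUntil also accepts an array of these events, in which case navigation is considered finished only after all of them have fired. A minimal sketch, where gotoWaitingForAll is a hypothetical wrapper name and the default events and timeout are assumptions chosen for illustration:

```javascript
// Hypothetical wrapper (not a Puppeteer API): navigates and resolves only
// after every listed lifecycle event has fired, or rejects on timeout.
async function gotoWaitingForAll(page, url, events = ['load', 'networkidle2'], timeout = 30000) {
  return page.goto(url, { waitUntil: events, timeout });
}

// Assumed usage inside an async function with an open Puppeteer page:
// await gotoWaitingForAll(page, url, ['domcontentloaded', 'networkidle0']);
```

Combining events makes the wait stricter, so it can also make timeouts more likely on pages that keep connections open.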
Solution 4
I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:
await page.waitForSelector('#blue-button');
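When a page is only ready once several elements exist, the waits can run in parallel. This is a sketch under assumptions: waitForAllSelectors is a hypothetical helper name, and the selectors in the usage comment are placeholders:

```javascript
// Hypothetical helper (not a Puppeteer API): waits for every selector in
// parallel and resolves with the element handles in the same order.
// Rejects if any single wait times out.
async function waitForAllSelectors(page, selectors, options = { visible: true, timeout: 30000 }) {
  return Promise.all(selectors.map((selector) => page.waitForSelector(selector, options)));
}

// Assumed usage inside an async function with an open Puppeteer page:
// await waitForAllSelectors(page, ['#blue-button', '#example']);
```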
Solution 5
In the latest Puppeteer version, networkidle2 worked for me:
await page.goto(url, { waitUntil: 'networkidle2' });
n.sharvarish
Updated on May 06, 2022
Comments
-
n.sharvarish about 2 years
I am working on creating PDF from web page.
The application on which I am working is single page application.
I tried many options and suggestion on https://github.com/GoogleChrome/puppeteer/issues/1412
But it is not working
const browser = await puppeteer.launch({
  executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
  ignoreHTTPSErrors: true,
  headless: true,
  devtools: false,
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
await page.goto(fullUrl, {
  waitUntil: 'networkidle2',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitFor(2000);
await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4',
});
What I want is to generate PDF report as soon as Page is loaded completely.
I don't want to write any type of delays i.e. await page.waitFor(2000);
I can not do waitForSelector because the page has charts and graphs which are rendered after calculations.
Help will be appreciated.
-
Chilly Code almost 5 years
Where is the documentation for the signal 'networkidle0'?
-
sapeish over 4 years
'networkidle0' is documented here github.com/GoogleChrome/puppeteer/blob/master/docs/…
-
Amanda over 4 years
Should page.waitForSelector be called after page.goto or before? Could you answer a similar question I asked stackoverflow.com/questions/58909236/… ?
-
Jason about 4 years
I'm not sure why this answer hasn't gotten more "love". In reality, a lot of the time we really just need to make sure JavaScript is done messing with the page before we scrape it. Network events don't accomplish this, and if you have dynamically generated content, there isn't always something you can reliably do a "waitForSelector/visible:true" on
-
Anand Mahajan almost 4 years
Thanks @roberto - btw I just updated the answer, you could use this with the 'load' event rather than 'networkidle2'. Thought it would be a little more optimal with that. I have tested this in production and can confirm it works well too!
-
AbuZubair over 3 years
This doesn't ensure that any scripts loaded have finished executing. Therefore HTML could still be rendering and this would proceed.
-
Viacheslav Dobromyslov over 3 years
just use page.waitForTimeout(1000)
-
Or Assayag over 3 years
Will check it out. Thanks.
-
kenberkeley over 3 years
waitFor is deprecated and will be removed in a future release. See github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
-
lxg over 3 years
The github issue states that they just deprecated the "magic" waitFor function. You can still use one of the specific waitFor*() functions. Hence your sleep() code is needless. (Not to mention that it’s overcomplicated for what it does, and it’s generally a bad idea to tackle concurrency problems with programmatic timeouts.)
-
Gary over 3 years
Why would I use networkidle0 when I could use the default load event? Is it faster to use networkidle0?
-
Arch4Arts over 3 years
You are a genius, this is such an obvious solution, especially when you are waiting for specific elements, and I can't believe I didn't think of it myself. Thank you!
-
Nicolás A. about 3 years
@Arch4Arts you should create your own clicking function that does the waiting for you as well as clicking
-
Michael Paccione about 3 years
Great solution and should be part of the puppeteer library, however please note waitFor is deprecated and will be removed in a future release: github.com/puppeteer/puppeteer/issues/6214
-
Ambroise Rabier about 3 years
I tried setting checkDurationMsecs to 200ms, and bodyHTMLSize keeps changing and gives huge numbers. I am also using Electron and React. Very strange.
-
Ambroise Rabier about 3 years
OK, I found this ridiculously hard-to-catch bug. If you are lucky enough to capture that 100k-long HTML page, you realize there are CSS classes like CodeMirror (must be codemirror.net), meaning document.body.innerHTML is catching the dev console too! Just remove mainWindow.webContents.openDevTools(); for e2e testing. I hope I don't get any more bad surprises.
-
Mark Cupitt about 3 years
Solved a headache for me on a high latency connection .. Well Done
-
milos over 2 years
Is page.waitForNavigation({ waitUntil: 'networkidle0' }) the same as page.waitForNetworkIdle()?
-
chovy over 2 years
link to docs is broken now
-
Eduardo Conte over 2 years
link updated, thanks @chovy
-
ggorlen over 2 years
If you're clicking something that triggers navigation, there's a race condition if Promise.all isn't used, e.g. Promise.all([page.click(...), page.waitForNavigation(...)])
-
ggorlen over 2 years
This should go in a pyppeteer question, not a puppeteer question.
-
ggorlen over 2 years
@Gary See this comment by a (former) core Puppeteer developer.
-
ggorlen about 2 years
For those who are confused by these options, domcontentloaded is the first one to fire, so you generally use it when you want to move on with your script before any external resources load. Typically, this is because you don't want data from them. load, networkidle2 and networkidle0 offer different flavors of waiting for resources in roughly increasing strictness, but none of them provide an exact guarantee that "the page is loaded" (because this varies from site to site, so it's ill-defined in general).
-
Bergi about 2 years
I had the same problems. I use format: "A4". My solution was not to use the scale (<1) option.