How to download a file with Puppeteer using headless: true?
Solution 1
This page downloads a CSV by creating a comma-delimited string and forcing the browser to download it by setting the data type, like so:
let uri = "data:text/csv;charset=utf-8," + encodeURIComponent(content);
window.open(uri, "Some CSV");
In Chrome, this opens a new tab.
You can tap into the resulting targetcreated event and write the contents to a file yourself. I'm not sure this is the best way, but it works well.
const fs = require('fs');
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: true
});
browser.on('targetcreated', async (target) => {
  let s = target.url();
  // the test opens an about:blank to start - ignore this
  if (s === 'about:blank') {
    return;
  }
  // strip the content type prefix; the characters are unencoded below
  s = s.replace("data:text/csv;charset=utf-8,", "");
  // clean up string by unencoding the %xx
  ...
  fs.writeFile("/tmp/download.csv", s, function (err) {
    if (err) {
      console.log(err);
      return;
    }
    console.log("The file was saved!");
  });
});
const page = await browser.newPage();
// ... open link ...
// ... click on download link ...
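The decoding step left out of the handler above amounts to stripping the prefix and reversing encodeURIComponent. A minimal sketch, assuming the URI was built exactly as in the first snippet of this answer; csvFromDataUri and saveCsvFromDataUri are hypothetical names, not part of Puppeteer:

```javascript
const fs = require('fs');

const CSV_PREFIX = 'data:text/csv;charset=utf-8,';

// Turns the data: URI opened by window.open() back into the original CSV text.
// Assumes the payload was encoded with encodeURIComponent, as in the snippet above.
function csvFromDataUri(uri) {
  if (!uri.startsWith(CSV_PREFIX)) return null; // not our CSV tab (e.g. about:blank)
  return decodeURIComponent(uri.slice(CSV_PREFIX.length));
}

// Writes the decoded CSV to filePath; returns false when the URI was not a CSV data URI.
function saveCsvFromDataUri(uri, filePath) {
  const csv = csvFromDataUri(uri);
  if (csv === null) return false;
  fs.writeFileSync(filePath, csv, 'utf8');
  return true;
}
```

Inside the targetcreated handler, saveCsvFromDataUri(target.url(), '/tmp/download.csv') would then replace the manual replace/decode/write steps, and the about:blank check falls out of the prefix test.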
Solution 2
I spent hours poring through this thread and Stack Overflow yesterday, trying to figure out how to get Puppeteer to download a CSV file by clicking a download link in headless mode in an authenticated session. The accepted answer here didn't work in my case because the download does not trigger targetcreated, and the next answer, for whatever reason, did not retain the authenticated session. This article saved the day. In short: call fetch from inside the page with page.evaluate, so the request carries the page's session cookies. Hopefully this helps someone else out.
const res = await this.page.evaluate(() => {
  // runs inside the page, so credentials: 'include' sends the session cookies
  return fetch('https://example.com/path/to/file.csv', {
    method: 'GET',
    credentials: 'include'
  }).then(r => r.text());
});
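r.text() is fine for CSV, but it would corrupt binary files (invalid UTF-8 is replaced during decoding), and page.evaluate can only return serializable values. One common workaround, sketched here assuming a page that provides fetch and btoa, is to base64-encode the body in the page and decode it in Node. fetchAsBase64 and saveBase64 are hypothetical names:

```javascript
const fs = require('fs');

// Meant to run inside the page via page.evaluate, so the request carries the
// page's cookies. It must be self-contained: Puppeteer serializes the function.
async function fetchAsBase64(url) {
  const res = await fetch(url, { method: 'GET', credentials: 'include' });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const bytes = new Uint8Array(await res.arrayBuffer());
  // Build a binary string in chunks (avoids huge argument lists), then base64 it
  const CHUNK = 0x8000;
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Runs in Node: decodes the base64 string and writes the raw bytes to disk.
// Returns the number of bytes written.
function saveBase64(base64, filePath) {
  const buf = Buffer.from(base64, 'base64');
  fs.writeFileSync(filePath, buf);
  return buf.length;
}
```

Wired together, that would be const b64 = await page.evaluate(fetchAsBase64, 'https://example.com/path/to/file.csv'); followed by saveBase64(b64, '/tmp/file.csv'). Passing the function itself works because Puppeteer serializes it into the page, which is also why fetchAsBase64 must not reference anything outside its own body.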
Solution 3
The problem is that the browser closes before the download has finished.
You can get the file size and the file name from the response, and then use a watch script that checks the size of the downloaded file, closing the browser once it matches.
Here is an example:
const fs = require('fs');

const filename = "set this with some regex in response";
const file = "path of the downloaded file to watch";

// Register the listener before clicking, so the response is not missed
page.on('response', response => {
  // response.headers() is the public API; header names are lower-cased
  const headers = response.headers();
  // Only react to the response that carries the file
  if ((headers['content-disposition'] || '').includes(`filename=${filename}`)) {
    const expectedSize = parseInt(headers['content-length'], 10);
    console.log('Size from header: ', expectedSize);
    // Watch the downloaded file (a folder's size would never match)
    fs.watchFile(file, (curr) => {
      // If the current size equals the size from the response, stop and close
      if (curr.size === expectedSize) {
        fs.unwatchFile(file);
        browser.close();
      }
    });
  }
});

// Start the download
await page.click('#DownloadFile');
The way the response is matched could certainly be improved, but I hope you'll find this useful.
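The filename constant above is meant to be set "with some regex in response". One possible helper for that, hypothetical and not a Puppeteer API, which reads the name from a Content-Disposition value such as response.headers()['content-disposition']:

```javascript
// Hypothetical helper: extracts the file name from a Content-Disposition value.
// Handles filename=a.csv, filename="a b.csv" and the RFC 5987 form
// filename*=UTF-8''a%20b.csv, which takes precedence when both are present.
// Assumption: the extended form uses UTF-8 (decodeURIComponent decodes UTF-8 only).
function filenameFromDisposition(value) {
  if (!value) return null;
  // Extended form: filename*=<charset>'<language>'<percent-encoded name>
  const ext = /filename\*\s*=\s*([^']*)'[^']*'([^;]+)/i.exec(value);
  if (ext) return decodeURIComponent(ext[2].trim());
  // Plain form: a quoted value, or an unquoted one up to the next ';'
  const plain = /filename\s*=\s*("([^"]*)"|[^;]+)/i.exec(value);
  if (!plain) return null;
  return (plain[2] !== undefined ? plain[2] : plain[1]).trim();
}
```

With it, the check in the listener could compare filenameFromDisposition(headers['content-disposition']) against the expected name instead of matching the raw header string.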
Solution 4
I found a way to let the browser's own download capability save the file and to wait for it. The idea is to wait for the response with a predicate; in my case, the URL ends with '/data'.
I just didn't like loading the file contents into a buffer.
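// page._client is private and has changed between Puppeteer versions;
// a CDP session created through the public API sends the same command
const client = await page.target().createCDPSession();
await client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: download_path,
});
await frame.focus(report_download_selector);
await Promise.all([
  page.waitForResponse(r => r.url().endsWith('/data')),
  page.keyboard.press('Enter'),
]);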
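Waiting for the response only tells you that the server has answered; Chrome may still be writing the file. A minimal polling sketch, assuming download_path starts out empty and that Chrome gives in-progress downloads a .crdownload suffix (its usual behavior, but an assumption here). findCompletedDownload and waitForDownload are hypothetical helpers:

```javascript
const fs = require('fs');
const path = require('path');

// Returns the path of a finished download in dir, or null while Chrome is
// still writing (a *.crdownload file exists) or nothing has arrived yet.
// Hidden files such as .DS_Store are ignored.
function findCompletedDownload(dir) {
  const names = fs.readdirSync(dir).filter(n => !n.startsWith('.'));
  if (names.some(n => n.endsWith('.crdownload'))) return null;
  const done = names.filter(n => fs.statSync(path.join(dir, n)).isFile());
  return done.length > 0 ? path.join(dir, done[0]) : null;
}

// Polls dir every intervalMs until a completed file appears, or rejects after timeoutMs.
function waitForDownload(dir, timeoutMs = 30000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      const file = findCompletedDownload(dir);
      if (file) return resolve(file);
      if (Date.now() > deadline) return reject(new Error(`No completed download in ${dir}`));
      setTimeout(check, intervalMs);
    };
    check();
  });
}
```

After the Promise.all above, await waitForDownload(download_path) would resolve with the path of the finished file, and only then is it safe to close the browser.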
Solution 5
I needed to download a file from behind a login, which was being handled by Puppeteer, and targetcreated was not being triggered. In the end I downloaded it with request, after copying the cookies over from the Puppeteer instance.
In this case, I'm streaming the file through, but you could just as easily save it.
const request = require('request'); // the (now deprecated) "request" package

let cookies = await page.cookies();
let jar = request.jar();
for (let cookie of cookies) {
  jar.setCookie(`${cookie.name}=${cookie.value}`, "http://secretsite.com");
}

res.writeHead(200, {
  "Content-Type": 'application/octet-stream',
  "Content-Disposition": `attachment; filename=secretfile.jpg`
});

// pipe() returns the destination stream, not a promise, so a try/catch
// around it cannot see network errors; listen for 'error' instead
request({ url: "http://secretsite.com/secretfile.jpg", jar })
  .on('error', err => {
    console.trace(err);
    res.end(); // headers are already sent, so just close the response
  })
  .pipe(res);
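Since the request package is deprecated, here is a minimal sketch of the same cookie-copying idea using the fetch built into Node 18+ instead (a swapped-in technique, not what this answer used). cookieHeader and downloadWithCookies are hypothetical names, and the sketch assumes every cookie returned by page.cookies() applies to the target URL:

```javascript
const fs = require('fs');

// Builds a Cookie header value from the objects returned by page.cookies().
// Assumption: all cookies apply to the target URL (no domain/path filtering).
function cookieHeader(cookies) {
  return cookies.map(c => `${c.name}=${c.value}`).join('; ');
}

// Downloads url with the given cookies using Node's built-in fetch (Node 18+)
// and writes the body to destPath. Returns the number of bytes written.
async function downloadWithCookies(url, cookies, destPath) {
  const res = await fetch(url, { headers: { Cookie: cookieHeader(cookies) } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const buf = Buffer.from(await res.arrayBuffer());
  await fs.promises.writeFile(destPath, buf);
  return buf.length;
}
```

For example: await downloadWithCookies('http://secretsite.com/secretfile.jpg', await page.cookies(), '/tmp/secretfile.jpg'). Unlike the streaming version, this buffers the whole file in memory, which is simpler but less suited to very large downloads.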
Antonio Gomez Alvarado
Updated on July 09, 2022
Comments
-
Antonio Gomez Alvarado almost 2 years:
I've been running the following code to download a CSV file from the website http://niftyindices.com/resources/holiday-calendar:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.goto('http://niftyindices.com/resources/holiday-calendar');
  await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/tmp'})
  await page.click('#exportholidaycalender');
  await page.waitFor(5000);
  await browser.close();
})();
With headless: false it works: it downloads the file into /Users/user/Downloads. With headless: true it does NOT work.
I'm running this on macOS Sierra (MacBook Pro) using Puppeteer version 1.1.1, which pulls Chromium version 66.0.3347.0 into the .local-chromium/ directory, and I used npm init and npm i --save puppeteer to set it up.
Any idea what's wrong?
Thanks in advance for your time and help,
-
Antonio Gomez Alvarado over 6 years: Perfect, it works! This also doesn't require page._client to be present.
-
nurettin over 5 years: This may work for some downloads, but it doesn't work in my case, where the server requires a POST request and is careful not to return the contents as a response body, but instead as a file download of type octet-stream.
-
Jay Shark over 3 years: This worked for me, thanks! Whatever it is about my bank, I couldn't get any of the other approaches to work. No matter how I attempted to intercept the request or make a separate request with the same headers etc., the backend seemed to somehow identify that it hadn't come from their frontend and returned an error page. This works, though.
-
Vikash Rathee over 3 years: This doesn't wait for the download to complete fully. How do I wait for it?
-
caram over 3 years: Please share your code; otherwise this doesn't really help.
-
Jeff Kilbride almost 3 years: I was having a problem downloading a large text file (70MB) even with headless: false. The page would never fully load. Using fetch worked like a charm. Thanks!
-
Josias da Paixao junior over 2 years: It worked perfectly, thank you!