How can I capture all network requests and full response data when loading a page in Chrome?
Solution 1
You can enable request interception with page.setRequestInterception(), and then, inside the page.on('request') handler, use the request-promise-native module as a middleman to fetch the response data before continuing the request with request.continue() in Puppeteer.
Here's a full working example:
'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const result = [];

    await page.setRequestInterception(true);

    page.on('request', request => {
        // Re-issue the request ourselves to capture the full response,
        // then let the original request continue in the browser.
        request_client({
            uri: request.url(),
            resolveWithFullResponse: true,
        }).then(response => {
            const request_url = request.url();
            const request_headers = request.headers();
            const request_post_data = request.postData();
            const response_headers = response.headers;
            const response_size = response_headers['content-length'];
            const response_body = response.body;

            result.push({
                request_url,
                request_headers,
                request_post_data,
                response_headers,
                response_size,
                response_body,
            });

            console.log(result);
            request.continue();
        }).catch(error => {
            // If our own fetch fails, abort the browser request as well.
            console.error(error);
            request.abort();
        });
    });

    await page.goto('https://example.com/', {
        waitUntil: 'networkidle0',
    });

    await browser.close();
})();
Solution 2
Puppeteer-only solution
This can be done with puppeteer alone. The problem you are describing, that response.buffer is cleared on navigation, can be circumvented by processing the requests one after another.
How it works
The code below uses page.setRequestInterception to intercept all requests. If a request is currently being processed (or waited for), new requests are put into a queue. Then response.buffer() can be used without the risk that other requests asynchronously wipe the buffer, as there are no parallel requests in flight. As soon as the current request/response has been handled, the next request is processed.
Code
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('...', { waitUntil: 'networkidle0' });
    console.log(results);

    await browser.close();
})();
Solution 3
I would suggest searching for a fast proxy server that can write request logs together with the actual content.
The target setup is to let the proxy server simply write a log file, and then analyze the log for the information you need.
Don't intercept requests while the proxy is working (this will slow things down).
The performance issues you may encounter (with the proxy-as-logger setup) are mostly related to TLS support; make sure the proxy setup allows a quick TLS handshake and the HTTP/2 protocol.
For example, Squid benchmarks show that it is able to process hundreds of requests per second, which should be enough for testing purposes.
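A minimal sketch of wiring Puppeteer to such a proxy might look like the following. The proxy itself (e.g. Squid) and its address 127.0.0.1:3128 are assumptions here, not part of this answer; the proxy is what does the logging:

const puppeteer = require('puppeteer');

(async () => {
    // Assumption: a logging proxy (e.g. Squid) is already running on 127.0.0.1:3128
    // and is configured to write request/response data to its own log file.
    const browser = await puppeteer.launch({
        args: ['--proxy-server=127.0.0.1:3128'],
    });
    const page = await browser.newPage();
    await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
    await browser.close();
    // The captured data is then read from the proxy's log, not from Puppeteer itself.
})();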
Solution 4
I would suggest using a tool named Fiddler. It will capture all the information that you mentioned when you load a URL.
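If you drive the page with Puppeteer, Chrome has to be routed through Fiddler's proxy. A minimal sketch, assuming Fiddler is listening on its usual default of 127.0.0.1:8888 (an assumption; check your Fiddler settings):

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        // Assumption: Fiddler is running locally on its default port 8888.
        args: ['--proxy-server=127.0.0.1:8888'],
        // Only needed if Fiddler's HTTPS-decryption certificate isn't trusted by the system.
        ignoreHTTPSErrors: true,
    });
    const page = await browser.newPage();
    await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
    await browser.close();
})();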
Matt Zeunert
Web developer in London working on JavaScript apps and tools for developers. Currently working on a front-end monitoring tool: DebugBear
Updated on February 25, 2021
Comments
- Matt Zeunert about 3 years:
Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:
- request URL
- request headers
- request post data
- response headers text (including duplicate headers like set-cookie)
- transferred response size (i.e. compressed size)
- full response body
Capturing the full response body is what causes the problems for me.
Things I've tried:
- Getting response content with response.buffer - this does not work if there are redirects at any point, since buffers are wiped on navigation
- Intercepting requests and using getResponseBodyForInterception - this means I can no longer access the encodedLength, and I also had problems getting the correct request and response headers in some cases
- Using a local proxy works, but this slowed down page load times significantly (and also changed some behavior for e.g. certificate errors)
Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.
- Md. Abu Taher over 5 years: Was expecting you to write an answer, otherwise I would write the same answer. :D
- Matt Zeunert over 5 years: Thanks! This approach breaks some sites because at request interception some headers aren't included yet (e.g. Accept and Cookie). github.com/GoogleChrome/puppeteer/issues/3436 I want the outgoing request to have the same headers as without request interception.
- Matt Zeunert over 5 years: I think request.continue will make a new request rather than use the same data, but request.respond should work.
- Matt Zeunert over 5 years: That's using response.buffer, which gets wiped on navigation.
- Jose Rodriguez over 5 years: There's a checkbox to preserve the log, so you can reload the page and you will not lose the requests log.
- Matt Zeunert over 5 years: It doesn't work, it only shows a "Failed to load response data" message after navigation.
- Matt Zeunert over 5 years: Thanks! I wasn't too keen on using a proxy because of the performance problems I was having, but I'll look into it again.
- Andrii Muzalevskyi over 5 years: @MattZeunert, thank you, please let me know if you need any help with it.
- Nisim Joseph over 4 years: I tried to manipulate the request URL but it doesn't allow it, and I couldn't see the different URL in Chrome's tracing. Any ideas on how to do it?
- FelipeKunzler about 4 years: request-promise-native seems to be deprecated as of now.
- onassar over 3 years: Why do you need to pause requests? Why can't you simply let requests continue, and use the requestfinished event to check for the URL and response headers and store those? In my case, all I want are the headers associated with a particular request URL.
- Thomas Dondorf over 3 years: @onassar Your use case is different to the OP's. The question was how to capture "full response data", not just headers.
- onassar over 3 years: Ahh okay. So if all I care about is the response headers, I could simplify the approach, yah? In my case, I call setRequestInterception with true, and then call continue on request objects in the following events: request, requestfailed and requestfinished. The exception is I store the headers in requestfinished event calls. That make sense?
- Thomas Dondorf over 3 years: @onassar Yes, if you don't need the buffer you can simplify it.
- Willis over 3 years: What is [page] used for? I didn't see it used in your code.
- Thomas Dondorf over 3 years: @Willis It's a destructuring assignment: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
- Willis over 3 years: @ThomasDondorf got it, sorry I'm not very familiar with JS, thank you!
- NeuronButter almost 3 years: I think you've confused Puppeteer with Pyppeteer. Puppeteer is for JavaScript, and the Pyppeteer library is just a port from the JavaScript one.
- Gergely M almost 3 years: Hi @NeuronButter, I'm not confused, just trying to help those who need help with Pyppeteer, which in fact is not the same as Puppeteer. I did that because it's hard to find info for Pyppeteer. Search engines, like Google's, keep returning Puppeteer-related hits. That's how I ended up on this page. Regardless, I take your -1 gracefully since mine isn't a solution for the OP indeed.
- NeuronButter almost 3 years: That actually makes a lot of sense. I can't remove my -1 vote (sorry!), but in the future, try using "pyppeteer" (with the quotes) on Google, so you get an exact search match, and hopefully more relevant results :)