How to pass cookies to HtmlAgilityPack or WebClient?
Solution 1
See the related question: HtmlAgilityPack.HtmlDocument Cookies
Here is an example of what you're looking for (syntax not 100% tested, I just modified some class I usually use):
using System.IO;
using System.Net;
using HtmlAgilityPack;

public class MyWebClient
{
    // The cookies will be stored here.
    private CookieContainer _cookies = new CookieContainer();

    // In case you need to clear the cookies.
    public void ClearCookies()
    {
        _cookies = new CookieContainer();
    }

    public HtmlDocument GetPage(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Set more parameters here...

        // This is the important part.
        request.CookieContainer = _cookies;

        // When you get the response from the website, the cookies are stored
        // automatically in "_cookies". Disposing the response releases the connection.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(reader.ReadToEnd());
            return doc;
        }
    }
}
Here is how you use it:
var client = new MyWebClient();
HtmlDocument doc = client.GetPage("http://somepage.com");
//This request will be sent with the cookies obtained from the page
doc = client.GetPage("http://somepage.com/another-page");
Note: If you also want to use the POST method, just create a method similar to GetPage with the POST logic, refactor the class, etc.
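A minimal sketch of what such a POST method could look like, assuming the target expects form-encoded fields. The class name MyPostingWebClient, the BuildFormBody helper and the PostPage signature are made up for illustration; they are not part of the class above, and the syntax is untested against a real site:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Hypothetical variant of MyWebClient with a POST method (illustration only).
public class MyPostingWebClient
{
    // Shared across requests so cookies set by one response are sent with the next.
    private readonly CookieContainer _cookies = new CookieContainer();

    // Encodes name/value pairs as an application/x-www-form-urlencoded body.
    // Uri.EscapeDataString writes a space as %20 rather than '+'.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public HtmlDocument PostPage(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        // Same idea as in GetPage: attach the shared cookie container.
        request.CookieContainer = _cookies;

        // Write the form body to the request stream.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Read the response and parse it with the HTML Agility Pack.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(reader.ReadToEnd());
            return doc;
        }
    }
}
```

In a refactored class, GetPage and PostPage would share the same _cookies field, so a login done through PostPage carries over to later GetPage calls.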
Solution 2
There are some recommendations here: Using CookieContainer with WebClient class
However, it's probably easier to keep using HttpWebRequest and set the cookies in its CookieContainer:
- HTTPWebRequest and CookieContainer
- http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
The code looks something like this:
// Create an HttpWebRequest
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(getUrl);
// Create the cookie container
request.CookieContainer = new CookieContainer();
// Add all the cookies from the earlier login response
foreach (Cookie cookie in response.Cookies)
{
    request.CookieContainer.Add(cookie);
}
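As a side note, here is a small offline sketch of how a CookieContainer decides which cookies accompany which request. The cookie name, its value and both host names are hypothetical, and the CookieScopeDemo class exists only for this illustration:

```csharp
using System;
using System.Net;

public static class CookieScopeDemo
{
    // Returns the Cookie header the container would send to the given URL.
    public static string HeaderFor(CookieContainer cookies, string url)
    {
        return cookies.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var cookies = new CookieContainer();
        // Hypothetical session cookie, scoped to the host of the URI it is added for.
        cookies.Add(new Uri("http://example.com/"), new Cookie("session", "abc123"));

        // Same host: the cookie is part of the header.
        Console.WriteLine(HeaderFor(cookies, "http://example.com/search"));
        // Different host: no cookie matches, so the header is empty.
        Console.WriteLine(HeaderFor(cookies, "http://other.test/").Length);
    }
}
```

This is why a single container can be attached to every request: it only hands out the cookies that match the request's URI.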
The second thing is that you don't need to download the site again, since you already have it from your web response and you're saving it here:
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
{
    doc.LoadHtml(sr.ReadToEnd());
    webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
}
You should be able to just take the HTML and parse it with the HTML Agility Pack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);
And that should do it... :)
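If you would rather stay with WebClient, the approach from the linked question boils down to subclassing it and attaching a CookieContainer to every request it creates. A rough sketch, where the names CookieAwareWebClient, Cookies and DownloadWithCookies are assumptions made for this example:

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass that sends and stores cookies (illustration only).
public class CookieAwareWebClient : WebClient
{
    // One container for the lifetime of the client.
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests can carry cookies.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }

    // Example use: seed the client with the cookies from a login response,
    // then download a page as the logged-in user.
    public static string DownloadWithCookies(CookieCollection loginCookies, string url)
    {
        using (var client = new CookieAwareWebClient())
        {
            client.Cookies.Add(loginCookies);
            return client.DownloadString(url);
        }
    }
}
```

The returned string can then be passed to doc.LoadHtml(...) just like the DocumentText above.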
Solution 3
Try caching the cookies from the previous response locally and resending them with each web request, as follows:
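private CookieCollection cookieCollection;
...
parserObject = new HtmlWeb
{
    AutoDetectEncoding = true,
    PreRequest = request =>
    {
        // Make sure a container exists; HttpWebRequest only fills
        // response.Cookies when one is set.
        if (request.CookieContainer == null)
            request.CookieContainer = new CookieContainer();
        if (cookieCollection != null)
        {
            // IEnumerable<T> has no built-in ForEach, so use a plain loop.
            foreach (Cookie cookie in cookieCollection)
                request.CookieContainer.Add(cookie);
        }
        return true;
    },
    PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};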
a1204773
Updated on June 05, 2022
Comments
-
a1204773 almost 2 years
I use this code to log in:
CookieCollection cookies = new CookieCollection();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("example.com");
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(cookies);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
cookies = response.Cookies;

string getUrl = "example.com";
string postData = String.Format("my parameters");
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
getRequest.CookieContainer = new CookieContainer();
getRequest.CookieContainer.Add(cookies);
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version11;
getRequest.AllowAutoRedirect = true;
getRequest.ContentType = "application/x-www-form-urlencoded";

byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
Stream newStream = getRequest.GetRequestStream();
newStream.Write(byteArray, 0, byteArray.Length);
newStream.Close();

HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
{
    doc.LoadHtml(sr.ReadToEnd());
    webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
}
Then I want to use HtmlWeb (HtmlAgilityPack) or WebClient to load the HTML into an HtmlDocument (HtmlAgilityPack).
My problem is that when I use:
WebClient wc = new WebClient();
webBrowser1.DocumentText = wc.DownloadString(site);
or
doc = web.Load(site);
webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
the login disappears, so I think I must somehow pass the cookies. Any suggestions?
-
a1204773 about 11 years
I log in to the site, but then I want to navigate somewhere else on the site. Actually, I do a search on the site.
-
Kiril about 11 years
You have to keep providing the cookies in every request you make. If you don't supply the cookies with every request, then it will assume you logged out (most login info is contained in the cookie).
-
a1204773 about 11 years
To log in I use a login(); function. Could you please help me make a getHTML(url); function, because your code above is not complete?
-
Kiril about 11 years
@Loclip OK, login(); is not a C# function and neither is getHTML, so I can't help you much there. The code that I'm showing you is supposed to help you figure out how to include a cookie with your HttpWebRequest (which should be all you need to make all of your requests for HTML content from a page). So given the code you've shown in your question and the one that I'm showing in my answer, where exactly is the problem? You have to provide me with some meaningful information; simply saying "i tried to complete your code but it still ask me to login" doesn't tell me much.