How to pass cookies to HtmlAgilityPack or WebClient?

Solution 1

See the related question "HtmlAgilityPack.HtmlDocument Cookies".

Here is an example of what you're looking for (syntax not 100% tested, I just modified some class I usually use):

using System.IO;
using System.Net;
using HtmlAgilityPack;

public class MyWebClient
{
    //The cookies will be here.
    private CookieContainer _cookies = new CookieContainer();

    //In case you need to clear the cookies
    public void ClearCookies() {
        _cookies = new CookieContainer();
    }

    public HtmlDocument GetPage(string url) {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        //Set more parameters here...
        //...

        //This is the important part.
        request.CookieContainer = _cookies;

        //When you get the response from the website, the cookies will be stored
        //automatically in "_cookies".
        //The using blocks dispose the response, stream and reader when done.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) {
            string html = reader.ReadToEnd();
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
    }
}

Here is how you use it:

var client = new MyWebClient();
HtmlDocument doc = client.GetPage("http://somepage.com");

//This request will be sent with the cookies obtained from the page
doc = client.GetPage("http://somepage.com/another-page");

Note: If you also want to use the POST method, create a method similar to GetPage with the POST logic (assigning the same _cookies container to the request) and refactor the class as needed.
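
The POST variant mentioned in the note could look roughly like the sketch below. The class name CookieSession, the method name Post and the parameter postData are hypothetical, and the form body is assumed to be already URL-encoded. To keep the example free of external dependencies it returns the raw HTML as a string instead of an HtmlDocument.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: sends a form POST and keeps cookies between calls.
public class CookieSession
{
    // Shared by every request made through this instance.
    private readonly CookieContainer _cookies = new CookieContainer();

    // postData is assumed to be URL-encoded already, e.g. "user=me&pass=secret".
    public string Post(string url, string postData)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = _cookies;

        // Write the form body to the request stream.
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = body.Length;
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Cookies set by the server are stored in _cookies automatically.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string can be passed to HtmlDocument.LoadHtml. In practice the method would more likely live inside MyWebClient next to GetPage, so that both use the same _cookies field.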

Solution 2

There are some recommendations here: Using CookieContainer with WebClient class
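
For the WebClient half of the question, the approach usually suggested is to subclass WebClient and attach one shared CookieContainer to every request it creates. A minimal sketch, assuming a hypothetical class name CookieAwareWebClient and property name Cookies (neither is part of .NET or of HtmlAgilityPack):

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that sends and stores cookies
// through one shared CookieContainer.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests can carry a cookie container.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }
}
```

Cookies you already hold (for example response.Cookies from a login request) can be added with client.Cookies.Add(cookies) before calling DownloadString, and the downloaded string can then be passed to HtmlDocument.LoadHtml.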

However, it's probably easier to keep using HttpWebRequest and put the cookies in its CookieContainer. The code looks something like this:

// Create an HttpWebRequest
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(getUrl);

// Create the cookie container
request.CookieContainer = new CookieContainer();

// Add all the cookies from the earlier (login) response
foreach (Cookie cookie in response.Cookies)
{
    request.CookieContainer.Add(cookie);
}

The second thing is that you don't need to download the site again, since you already have it from your web response and you're saving it here:

HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
{
    doc.LoadHtml(sr.ReadToEnd());
    webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
}

You should be able to just take the HTML and parse it with the HTML Agility Pack:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);

And that should do it... :)

Solution 3

Try caching the cookies from the previous response locally and resending them with each web request, as follows:

private CookieCollection cookieCollection;

...

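parserObject = new HtmlWeb
{
    AutoDetectEncoding = true,
    PreRequest = request =>
    {
        // Make sure the request has a container, then resend the cached cookies.
        if (request.CookieContainer == null)
            request.CookieContainer = new CookieContainer();
        if (cookieCollection != null)
            request.CookieContainer.Add(cookieCollection);
        return true;
    },
    // Cache the cookies from each response for the next request.
    PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};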
Author: a1204773

Updated on June 05, 2022

Comments

  • a1204773
    a1204773 almost 2 years

    I use this code to login:

    CookieCollection cookies = new CookieCollection();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("example.com");
    request.CookieContainer = new CookieContainer();
    request.CookieContainer.Add(cookies);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    cookies = response.Cookies;
    
    string getUrl = "example.com";
    string postData = String.Format("my parameters");
    HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
    getRequest.CookieContainer = new CookieContainer();
    getRequest.CookieContainer.Add(cookies);
    getRequest.Method = WebRequestMethods.Http.Post;
    getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
    getRequest.AllowWriteStreamBuffering = true;
    getRequest.ProtocolVersion = HttpVersion.Version11;
    getRequest.AllowAutoRedirect = true;
    getRequest.ContentType = "application/x-www-form-urlencoded";
    
    byte[] byteArray = Encoding.ASCII.GetBytes(postData);
    getRequest.ContentLength = byteArray.Length;
    Stream newStream = getRequest.GetRequestStream();
    newStream.Write(byteArray, 0, byteArray.Length);
    newStream.Close();
    
    HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
    using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
    {
        doc.LoadHtml(sr.ReadToEnd());
        webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
    }
    

    Then I want to use HtmlWeb (HtmlAgilityPack) or WebClient to load the HTML into an HtmlDocument (HtmlAgilityPack).

    My problem is that when I use:

    WebClient wc = new WebClient();
    webBrowser1.DocumentText = wc.DownloadString(site);
    

    or

    doc = web.Load(site);
    webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
    

    The login disappears, so I think I must somehow pass the cookies. Any suggestions?

  • a1204773
    a1204773 about 11 years
    I log in to the site, but then I want to navigate somewhere else on it. Actually, I do a search on the site.
  • Kiril
    Kiril about 11 years
    You have to keep providing the cookies in every request you make. If you don't supply the cookies with every request, the site will assume you are logged out (most login info is contained in the cookie).
  • a1204773
    a1204773 about 11 years
    To log in I use a login(); function. Could you please help me write a getHTML(url); function, because your code above is not complete?
  • Kiril
    Kiril about 11 years
    @Loclip OK, login(); is not a C# function and neither is getHTML, so I can't help you much there. The code that I'm showing you is supposed to help you figure out how to include a cookie with your HttpWebRequest (which should be all you need to make all of your requests for HTML content from a page). So given the code you've shown in your question and the code I'm showing in my answer, where exactly is the problem? You have to provide me with some meaningful information; simply saying "i tried to complete your code but it still ask me to login" doesn't tell me much.