get WPF WebBrowser HTML


Solution 1

I made something like this once. It was horrible, but it works.

You need to add a reference to Microsoft.mshtml.

Then you can use IHTMLDocument2. Why 2? Good question... anyway, I wrote a couple of helper functions like this:

// Requires: using System.Linq; using mshtml;
public static void FillField(object doc, string id, string value)
{
    // Does nothing if no element with this id exists.
    var element = findElementByID(doc, id);
    element?.setAttribute("value", value);
}

public static void ClickButton(object doc, string id)
{
    // Does nothing if no element with this id exists.
    var element = findElementByID(doc, id);
    element?.click();
}

private static IHTMLElement findElementByID(object doc, string id)
{
    var thisDoc = doc as IHTMLDocument2;
    if (thisDoc == null)
        return null;

    // FirstOrDefault returns null instead of throwing when nothing matches.
    return thisDoc.all.OfType<IHTMLElement>()
        .FirstOrDefault(e => e != null && e.id == id);
}

Executing JS:

// Public so it can be called from outside the helper class, like the methods above.
public static void ExecuteScript(object doc, string js)
{
    var thisDoc = doc as IHTMLDocument2;
    if (thisDoc == null)
        return;

    thisDoc.parentWindow.execScript(js, "JavaScript");
}

I call them like this...

HtmlDocumentHelper.FillField(webBrowser.Document, <id>, <value>);
HtmlDocumentHelper.FillField(webBrowser.Document, <id>, <value>);
HtmlDocumentHelper.ClickButton(webBrowser.Document, <id>);
HtmlDocumentHelper.ExecuteScript(webBrowser.Document, "alert(1);");
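As a possible alternative to scanning thisDoc.all, the lookup can also be sketched with IHTMLDocument3, which exposes getElementById directly. This is only a sketch under the assumption that the document object also implements IHTMLDocument3; the class name HtmlDocumentHelper3 is hypothetical and not part of the original answer.

```csharp
using mshtml;

// Hypothetical helper class; the name is an assumption for illustration.
public static class HtmlDocumentHelper3
{
    // Returns the element with the given id, or null if the document
    // is not an IHTMLDocument3 or no such element exists.
    public static IHTMLElement FindElementById(object doc, string id)
    {
        var doc3 = doc as IHTMLDocument3;
        return doc3?.getElementById(id);
    }
}
```

It would be called the same way as the helpers above, e.g. HtmlDocumentHelper3.FindElementById(webBrowser.Document, <id>).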

Solution 2

Yeeeaaaah! I did it. It's so simple:

    string HTML = (browser.Document as mshtml.IHTMLDocument2).body.outerHTML;

Solution 3

When I tried @Gray's or @czubehead's code, body was always null. The following code, however, worked for me:

dynamic webBrowserDocument = webBrowser.Document;
string html = webBrowserDocument?.documentElement?.InnerHtml;

Make sure this runs in the LoadCompleted handler or later. In Navigated, the source is incomplete or even null.
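A minimal sketch of wiring this into LoadCompleted might look like the following. The class name WebBrowserHtmlReader, the method ReadHtmlWhenLoaded and the onHtml callback are hypothetical names introduced for illustration. The sketch reads outerHTML so that the html element itself is included, and it assumes the project references Microsoft.CSharp for dynamic.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Navigation;

// Hypothetical helper; names are assumptions, not from the answers above.
public static class WebBrowserHtmlReader
{
    // Subscribes to LoadCompleted and passes the page HTML to the callback.
    public static void ReadHtmlWhenLoaded(WebBrowser webBrowser, Action<string> onHtml)
    {
        webBrowser.LoadCompleted += (object sender, NavigationEventArgs e) =>
        {
            dynamic document = webBrowser.Document;

            // Either the document or its root element may be null,
            // so both are guarded with the null-conditional operator.
            string html = document?.documentElement?.outerHTML;

            if (html != null)
                onHtml(html);
        };
    }
}
```

Usage would be something like WebBrowserHtmlReader.ReadHtmlWhenLoaded(webBrowser, html => { string HTML = html; }); called before navigating.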

Author by

czubehead


Updated on June 13, 2022

Comments

  • czubehead, almost 2 years ago

    I'm using the WPF WebBrowser to access a certain page. I need to get its HTML content. I can't use WebClient, WebRequest, etc. because I need to execute JS on that page. I also tried Awesomium and the Windows Forms WebBrowser (both wrong).

        dynamic doc = browser.Document;
        var text = doc.InnerHtml; // or something like this
    

    The code above doesn't work for me; it throws a NullReferenceException. Can anybody tell me how to fetch the HTML? I've been searching for weeks and haven't found anything that really works :/ . Please answer as if for the biggest dumbass you can imagine :D. It sometimes happens that people send me a piece of code and I have no idea how to use it... I mean, please end your posts with something like

         string HTML = some_stuff;
    

    Or, if you know of an alternative browser that isn't buggy and lets me access the HTML, or anything that would let me execute JS on the loaded HTML with effects such as cookies and changes to the HTML source, that would also be a really good answer. I'd appreciate any help.