Get HTML source code from CefSharp web browser

Solution 1

I don't think I quite get this DispatcherTimer solution. I would do it like this:

public frmSelection()
{
    InitializeComponent();

    wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
    wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
}

private void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
    if (e.Frame.IsMain)
    {
        wb.ViewSource();
        wb.GetSourceAsync().ContinueWith(taskHtml =>
        {
            var html = taskHtml.Result;
        });
    }
}

I did a diff on the output of ViewSource and the text in the html variable and they are the same, so I can't reproduce your problem here.

That said, I noticed that the main frame finishes loading quite late, so you have to wait a while before Notepad pops up with the source.
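If you prefer async/await over ContinueWith, the same handler could be sketched roughly as below. This is only a sketch under a few assumptions: it reads the source from e.Frame rather than from wb, it assumes the same frmSelection control with a browser named wb, and OnHtmlReady is a hypothetical method made up for illustration. As far as I know, FrameLoadEnd is not raised on the WPF UI thread, so anything touching UI elements would need to go through the Dispatcher.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Controls;
using CefSharp;

public partial class frmSelection : UserControl
{
    public frmSelection()
    {
        InitializeComponent();

        // Subscribe before navigating so the event is not missed.
        wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
        wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
    }

    // "async void" is acceptable for event handlers, but an unhandled
    // exception here is not observed by any caller, hence the try/catch.
    private async void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
    {
        // FrameLoadEnd fires once per frame; only react to the main frame.
        if (!e.Frame.IsMain)
            return;

        try
        {
            // Await instead of blocking on Task.Result.
            string html = await e.Frame.GetSourceAsync();
            OnHtmlReady(html);
        }
        catch (Exception ex)
        {
            System.Diagnostics.Debug.WriteLine(ex);
        }
    }

    // Hypothetical placeholder: process the HTML here. Use
    // Dispatcher.BeginInvoke if this needs to update UI elements.
    private void OnHtmlReady(string html)
    {
    }
}
```

The main difference from the timer approach in the question is that nothing blocks on Task.Result while waiting for the source.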

Solution 2

I was having the same issue trying to click on an item located in a child frame rather than in the main frame. Using the example in your answer, I wrote the following extension method:

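// Extension methods must be declared inside a static class.
public static IFrame GetFrame(this ChromiumWebBrowser browser, string frameName)
{
    var cefBrowser = browser.GetBrowser();

    // Walk every frame in the page and return the first one whose name matches.
    foreach (var id in cefBrowser.GetFrameIdentifiers())
    {
        IFrame frame = cefBrowser.GetFrame(id);

        // Guard against a frame that disappeared between the two calls.
        if (frame != null && frame.Name == frameName)
            return frame;
    }

    return null;
}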

If your form has a "using" directive for the namespace that contains this method, you can do something like:

// Must be called from an async method because of the await.
var frame = browser.GetFrame("nameofframe");
if (frame != null)
{
    string html = await frame.GetSourceAsync();
}

Of course you need to make sure the page load is complete before using this, but I plan to use it a lot. Hope it helps!
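If you don't know the frame name up front, the same idea can be extended to collect the source of every frame in one go. The following is just a sketch: FrameSourceExtensions and GetAllFrameSourcesAsync are names made up for illustration, and it only relies on the GetFrameIdentifiers, GetFrame, Name and GetSourceAsync calls already used above.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using CefSharp;

public static class FrameSourceExtensions
{
    // Hypothetical helper: returns the HTML of each frame, keyed by
    // "identifier:name" so that unnamed frames do not collide.
    public static async Task<Dictionary<string, string>> GetAllFrameSourcesAsync(this IBrowser browser)
    {
        var sources = new Dictionary<string, string>();

        foreach (var id in browser.GetFrameIdentifiers())
        {
            var frame = browser.GetFrame(id);

            // A frame may have been removed since the identifiers were listed.
            if (frame == null)
                continue;

            string key = string.Format("{0}:{1}", id, frame.Name);
            sources[key] = await frame.GetSourceAsync();
        }

        return sources;
    }
}
```

Usage would be something like `var sources = await wb.GetBrowser().GetAllFrameSourcesAsync();` once the page has finished loading.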

Jim

Author by

Scott

Updated on July 09, 2022

Comments

  • Scott
    Scott almost 2 years

    I am using a CefSharp.Wpf.ChromiumWebBrowser (Version 47.0.3.0) to load a web page. At some point after the page has loaded I want to get the source code.

    I have called:

    wb.GetBrowser().MainFrame.GetSourceAsync()
    

    however it does not appear to be returning all the source code (I believe this is because there are child frames).

    If I call:

    wb.GetBrowser().MainFrame.ViewSource() 
    

    I can see it lists all the source code (including the inner frames).

    I would like to get the same result as ViewSource(). Could someone point me in the right direction please?

    Update – Added Code example

    Note: The address the web browser is pointing to will only work up to and including 10/03/2016. After that it may display different data, which is not what I would be looking at.

    In the frmSelection.xaml file

    <cefSharp:ChromiumWebBrowser Name="wb" Grid.Column="1" Grid.Row="0" />
    

    In the frmSelection.xaml.cs file

    public partial class frmSelection : UserControl
    {
        private System.Windows.Threading.DispatcherTimer wbTimer = new System.Windows.Threading.DispatcherTimer();

        public frmSelection()
        {
            InitializeComponent();

            // This timer will start when a web page has been loaded.
            // It will wait 4 seconds and then call wbTimer_Tick which
            // will then see if data can be extracted from the web page.
            wbTimer.Interval = new TimeSpan(0, 0, 4);
            wbTimer.Tick += new EventHandler(wbTimer_Tick);

            wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";

            wb.FrameLoadEnd += new EventHandler<CefSharp.FrameLoadEndEventArgs>(wb_FrameLoadEnd);
        }

        void wb_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
        {
            if (wbTimer.IsEnabled)
                wbTimer.Stop();

            wbTimer.Start();
        }

        void wbTimer_Tick(object sender, EventArgs e)
        {
            wbTimer.Stop();
            string html = GetHTMLFromWebBrowser();
        }

        private string GetHTMLFromWebBrowser()
        {
            // Call the ViewSource method which will open up Notepad and display the html.
            // This is just so I can compare it to the html returned in GetSourceAsync().
            // This is displaying all the html code (including child frames).
            wb.GetBrowser().MainFrame.ViewSource();

            // Get the html source code from the main frame.
            // This is displaying only code in the main frame and not any child frames of it.
            Task<String> taskHtml = wb.GetBrowser().MainFrame.GetSourceAsync();

            string response = taskHtml.Result;
            return response;
        }
    }
    
    • Szabolcs Dézsi
      Szabolcs Dézsi about 8 years
      Can you share some more code? I can't reproduce your problem, I get the same text with GetSourceAsync as with ViewSource. Tried it with Address set to http://stackoverflow.com (it has two frames, one iframe and the main frame)
    • Scott
      Scott about 8 years
      Thanks for taking a look. I have added example source to the original post.
  • Scott
    Scott about 8 years
    Thank you for the feedback on my code; I have since updated it to reflect your example. I have run the code on another computer since posting the example and I get the same results as you (both return the full source code). I can only conclude there is something weird going on with my machine, and I will consider reformatting it.