Web scraping with VBA using XMLHTTP
I can confirm that I get the same HTML as you when I run your code (with or without the [URL] tags). I found a useful post here. I have modified your code using the method found there, and it now appears to download the correct information.
Sub test()
Call FuturesScrap1("http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures")
End Sub
I included the calling sub because the [URL] tags appeared to cause an error in the MSXML request.
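The [URL] and [/URL] wrappers look like forum markup rather than part of the address, so one option is to strip them before making the request. Below is a minimal sketch, assuming a hypothetical helper named StripUrlTags that is not part of the original code:

```vba
' Hypothetical helper (not in the original code): removes forum-style
' [URL] and [/URL] tags from an address before it is requested.
Function StripUrlTags(ByVal sUrl As String) As String
    Dim sClean As String
    ' vbTextCompare makes the match case-insensitive ([url] or [URL])
    sClean = Replace(sUrl, "[URL]", "", , , vbTextCompare)
    sClean = Replace(sClean, "[/URL]", "", , , vbTextCompare)
    StripUrlTags = Trim$(sClean)
End Function
```

It could then be used in the calling sub as Call FuturesScrap1(StripUrlTags("[URL]http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures[/URL]")).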
Sub FuturesScrap1(ByVal URL As String)
    ' Needs references to a "Microsoft XML" (MSXML2) library and the "Microsoft HTML Object Library"
    Dim HTMLDoc As New HTMLDocument
    Dim oHttp As MSXML2.XMLHTTP
    Dim sHTML As String
    Dim AnchorLinks As Object
    Dim TDelements As Object
    Dim TDelement As Object
    Dim AnchorLink As Object

    ' Create the request object, falling back to CreateObject if New fails
    On Error Resume Next
    Set oHttp = New MSXML2.XMLHTTP
    If Err.Number <> 0 Then
        MsgBox "Error " & Err.Number & " occurred while creating an MSXML2.XMLHTTP object"
        Err.Clear
        Set oHttp = CreateObject("MSXML2.XMLHTTP")
    End If
    On Error GoTo 0
    If oHttp Is Nothing Then
        MsgBox "For some reason I wasn't able to make a MSXML2.XMLHTTP object"
        Exit Sub
    End If

    ' Send a synchronous GET request for the URL
    oHttp.Open "GET", URL, False
    oHttp.send
    sHTML = oHttp.responseText
    Debug.Print sHTML

    ' Load the response into an HTML document so it can be queried
    HTMLDoc.body.innerHTML = sHTML
    With HTMLDoc.body
        Set AnchorLinks = .getElementsByTagName("a")
        Set TDelements = .getElementsByTagName("td")
        ' Print the text of every link, then of every table cell
        For Each AnchorLink In AnchorLinks
            Debug.Print AnchorLink.innerText
        Next AnchorLink
        For Each TDelement In TDelements
            Debug.Print TDelement.innerText
        Next TDelement
    End With
End Sub
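Because the request can come back with an error page instead of the market data (the question quotes a "Resource Not found" document), it may help to check the response before parsing it. Below is a rough sketch, assuming two hypothetical helpers, IsUsableResponse and FetchHtml, and assuming that a failed request shows up either as a status other than 200 or as a body containing the "Resource Not found" text:

```vba
' Hypothetical helper: True when the status is 200 and the body is not
' the "Resource Not found" error page quoted in the question.
Function IsUsableResponse(ByVal lStatus As Long, ByVal sBody As String) As Boolean
    IsUsableResponse = (lStatus = 200) And _
        (InStr(1, sBody, "Resource Not found", vbTextCompare) = 0)
End Function

' Hypothetical helper: returns the page HTML, or "" when the response is not usable.
Function FetchHtml(ByVal URL As String) As String
    Dim oHttp As Object
    Set oHttp = CreateObject("MSXML2.XMLHTTP")   ' late binding, no reference needed
    oHttp.Open "GET", URL, False                 ' False = synchronous request
    oHttp.send
    If IsUsableResponse(oHttp.Status, oHttp.responseText) Then
        FetchHtml = oHttp.responseText
    Else
        Debug.Print "Request failed: " & oHttp.Status & " " & oHttp.statusText
    End If
End Function
```

A caller could then skip the parsing step whenever FetchHtml returns an empty string. This only detects an error page; it does not make script-generated content appear in the response.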
Edit following comment:
I haven't been able to find the table elements using the MSXML2 object; the source code doesn't appear to contain them. In Firebug the td tags are present, so I think the table is generated by JavaScript. I don't know whether MSXML2 can run the JavaScript, so I've modified the sub to use Internet Explorer. It's not quick code, but it does find the td elements and does allow clicking the tabs. I have found that the td elements can take some time to become available (presumably because IE has to run the JavaScript), so I have put in a couple of steps where Excel waits before downloading the data.
I have also put in some code that writes the contents of the td elements into the active worksheet, so be careful if you run it in a workbook with useful data in it.
Sub FuturesScrap3(ByVal URL As String)
    ' Needs references to "Microsoft Internet Controls" and the "Microsoft HTML Object Library"
    Dim HTMLDoc As New HTMLDocument
    Dim AnchorLinks As Object
    Dim tdElements As Object
    Dim tdElement As Object
    Dim AnchorLink As Object
    Dim lRow As Long
    Dim oElement As Object
    Dim oIE As InternetExplorer

    ' Open the page in a visible Internet Explorer window
    Set oIE = New InternetExplorer
    oIE.navigate URL
    oIE.Visible = True
    Do Until (oIE.readyState = 4 And Not oIE.Busy)
        DoEvents
    Loop

    ' Wait for JavaScript to run (fixed one-minute pause)
    Application.Wait (Now + TimeValue("0:01:00"))

    ' Copy the rendered page into HTMLDoc and collect links and cells
    HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
    With HTMLDoc.body
        Set AnchorLinks = .getElementsByTagName("a")
        Set tdElements = .getElementsByTagName("td")
        For Each AnchorLink In AnchorLinks
            Debug.Print AnchorLink.innerText
        Next AnchorLink
    End With

    ' Write the td contents into column 1 of the active worksheet
    lRow = 1
    For Each tdElement In tdElements
        Debug.Print tdElement.innerText
        Cells(lRow, 1).Value = tdElement.innerText
        lRow = lRow + 1
    Next tdElement

    ' Clicking the Month tab
    For Each oElement In oIE.document.all
        If Trim(oElement.innerText) = "Month" Then
            oElement.Focus
            oElement.Click
        End If
    Next oElement
    Do Until (oIE.readyState = 4 And Not oIE.Busy)
        DoEvents
    Loop

    ' Wait for JavaScript to run (fixed one-minute pause)
    Application.Wait (Now + TimeValue("0:01:00"))

    ' Re-read the page after the click
    HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
    With HTMLDoc.body
        Set AnchorLinks = .getElementsByTagName("a")
        Set tdElements = .getElementsByTagName("td")
        For Each AnchorLink In AnchorLinks
            Debug.Print AnchorLink.innerText
        Next AnchorLink
    End With

    ' Write the Month td contents into column 2 of the active worksheet
    lRow = 1
    For Each tdElement In tdElements
        Debug.Print tdElement.innerText
        Cells(lRow, 2).Value = tdElement.innerText
        lRow = lRow + 1
    Next tdElement
End Sub
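The fixed one-minute pauses above could probably be shortened by polling until the td elements actually appear. Below is a minimal sketch, assuming a hypothetical helper named WaitForCells and an arbitrary timeout chosen by the caller; it has not been tried against the EEX page:

```vba
' Hypothetical helper: polls the IE window until at least one td element
' exists or the timeout expires. Returns True if cells were found.
Function WaitForCells(ByVal oIE As Object, ByVal lTimeoutSeconds As Long) As Boolean
    Dim dDeadline As Date
    dDeadline = DateAdd("s", lTimeoutSeconds, Now)
    Do
        DoEvents   ' keep Excel responsive while waiting
        ' Only touch the document once the browser reports it is loaded
        If oIE.readyState = 4 And Not oIE.Busy Then
            If oIE.document.getElementsByTagName("td").Length > 0 Then
                WaitForCells = True
                Exit Function
            End If
        End If
    Loop While Now < dDeadline
End Function
```

Each Application.Wait line could then be swapped for something like If Not WaitForCells(oIE, 60) Then Exit Sub. After the tab click the old cells may still be present, so a check for some content specific to the new table might be needed there instead.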
Figlio
Updated on July 09, 2022
Comments
-
Figlio almost 2 years
I would like to get some data from the web page http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures.
If I use the old InternetExplorer object (code below), I can walk through the HTML document. But I would like to use the XMLHTTP object (second code).
Sub IEZagon()
    'we define the essential variables
    Dim ie As Object
    Dim TDelement, TDelements
    Dim AnhorLink, AnhorLinks
    'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate ("[URL]http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures[/URL]")
        While ie.ReadyState <> 4
            DoEvents
        Wend
        Set AnhorLinks = .document.getElementsbytagname("a")
        Set TDelements = .document.getElementsbytagname("td")
        For Each AnhorLink In AnhorLinks
            Debug.Print AnhorLink.innertext
        Next
        For Each TDelement In TDelements
            Debug.Print TDelement.innertext
        Next
    End With
    Set ie = Nothing
End Sub
Using code with XMLHTTP object:
Sub FuturesScrap(ByVal URL As String)
    Dim XMLHttpRequest As XMLHTTP
    Dim HTMLDoc As New HTMLDocument
    Set XMLHttpRequest = New MSXML2.XMLHTTP
    XMLHttpRequest.Open "GET", URL, False
    XMLHttpRequest.send
    While XMLHttpRequest.readyState <> 4
        DoEvents
    Wend
    Debug.Print XMLHttpRequest.responseText
    HTMLDoc.body.innerHTML = XMLHttpRequest.responseText
    With HTMLDoc.body
        Set AnchorLinks = .getElementsByTagName("a")
        Set TDelements = .getElementsByTagName("td")
        For Each AnchorLink In AnchorLinks
            Debug.Print AnhorLink.innerText
        Next
        For Each TDelement In TDelements
            Debug.Print TDelement.innerText
        Next
    End With
End Sub
I get only basic HTML:
<html>
<head>
<title>Resource Not found</title>
<link rel= 'stylesheet' type='text/css' href='/blueprint/css/errorpage.css'/>
</head>
<body>
<table class="header">
<tr>
<td class="CMTitle CMHFill"><span class="large">Resource Not found</span></td>
</tr>
</table>
<div class="body">
<p style="font-weight:bold;">The requested resource does Not exist.</p>
</div>
<table class="footer">
<tr>
<td class="CMHFill"> </td>
</tr>
</table>
</body>
</html>
I would like to walk through the tables and the corresponding data. Finally, I would like to select a different time interval, from Year to Month.
I'd really appreciate any help! Thank you!
-
Figlio about 10 years
I wrote the same code last Saturday, but I still have a problem with this web page. With your code and mine I can't list the 6 buttons (anchors) named Year through Day. If I want to walk through the different tables based on the time window (year, quarter, etc.), I need to click one of these buttons. But that is not the only problem: in our code we can't list the table data with this code:
For Each TDelement In TDelements
Debug.Print TDelement.innerText
Next
-
Graham Anderson about 10 years
@Figlio I have modified the answer to get the TD elements and to allow for changing the table. It uses Internet Explorer rather than MSXML2, though; this may be necessary because of the JavaScript.
-
Figlio about 10 years
Thanks. It works with the IE object. I know, I wrote the same code as you did, and I have the same problem of needing the Application.Wait method. If that is the case and it can't be done with XMLHTTP, I will stay with IE. Thanks again!