How to get the raw HTML of a node
Solution 1
Use node.to_s, or just node:
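require 'nokogiri'

doc = Nokogiri::HTML(html) # html holds the markup from the question
nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  puts node.to_s # serializes the node, including its own <tr> tag
  puts '-' * 40
end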
With additional sanity-check HTML (yours, doubled, with a tr of a different class in the middle), I get:
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------
Solution 2
You can call children.to_html on each node to get only the HTML inside it:
doc = Nokogiri::HTML(html)
nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  puts node.children.to_html # same markup as node.inner_html
end
Solution 3
The method to start from is .children. It returns a NodeSet containing everything inside the selected element; call to_html on that NodeSet to get the HTML as a string.
So given this HTML:
<tr class="container">
<td>value</td>
</tr>
And this code:
data = Nokogiri::HTML(html)
data.css("tr.container").children.to_html
You get this HTML back (plus any surrounding whitespace):
<td>value</td>
I guess my answer is late, but that's the exact code you need.
Kyaw Siesein
Software Engineer, rails, ruby, django,web2py,puppet, python,linux
Updated on September 15, 2022
Comments
-
Kyaw Siesein over 1 year
I am using Nokogiri to analyze some HTML, but I don't know how to get the raw HTML inside a node.
For example, given:
<tr class="tableX">
  <td align="center">
    <font size="2"><a href="javascript:open('9746')">9746</a></font>
  </td>
  <td align="center">
    <font size="2">2012-06-26</font>
  </td>
</tr>
When I use this XPath selector:
doc = Nokogiri::HTML(html)
nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  node.text # or node.content
end
The results from node.text and node.content are:
9746 2012-06-26
I want to get all the raw HTML inside the tr block, which, in this case, is:
<td align="center">
  <font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center">
  <font size="2">2012-06-26</font>
</td>
What's the proper way to do that?
-
PJP almost 4 years
Node's to_html will give you the original HTML.
-
PJP almost 4 years
Children doesn't return raw HTML; it only returns the NodeSet containing the children of the parent node. The OP wants the raw HTML. Node#to_html or its aliases do that.