Get a div nested in a div element using Nokogiri


I'm not familiar with XPath; I prefer CSS selectors because they make more sense to me. This tutorial might be useful for you.

require 'nokogiri'
require 'pp'

# Simple value object holding one scraped event.
Event = Struct.new(:name, :link, :date)

# DATA reads everything after __END__ in this file.
doc = Nokogiri::HTML(DATA)

# Each event lives in its own <div class="nof clearfix">.
events = doc.css("div.nof.clearfix").map do |eventnode|
  anchor = eventnode.at_css("h2 a")
  name = anchor.text.strip
  link = anchor['href']
  date = eventnode.at_css("div.pl.intro").text.strip
  Event.new(name, link, date)
end

pp events


__END__
<div class="nof clearfix">        
         <h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2">    </span></h2>
           <div class="pl intro">
             Date: 25th,11,2010<br/>
           </div>
</div>
<div class="nof clearfix">        
         <h2><a href="http://www.douban.com/event/12761581/">folk concert </a> <span class="pl2">    </span></h2>
           <div class="pl intro">
             Date: 10th,11,2010<br/>
           </div>
</div>

This outputs:

[#<struct Event
  name="folk concert 2",
  link="http://www.douban.com/event/12761580/",
  date="Date: 25th,11,2010">,
 #<struct Event
  name="folk concert",
  link="http://www.douban.com/event/12761581/",
  date="Date: 10th,11,2010">]
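Note that the output above still carries the `Date:` label, while the question asks for the bare date. Below is a minimal sketch of one way to strip it, using only the Ruby standard library. The helper names `clean_date` and `parse_event_date` are hypothetical, and the day,month,year ordering is an assumption read off the sample values rather than something the page states.

```ruby
require 'date'

# Hypothetical helper: remove surrounding whitespace and a leading "Date:" label.
def clean_date(raw)
  raw.strip.sub(/\ADate:\s*/, '')
end

# Hypothetical helper: convert "25th,11,2010" into a Date.
# Assumes day,month,year order; String#to_i ignores the "th" suffix.
def parse_event_date(raw)
  day, month, year = clean_date(raw).split(',').map(&:to_i)
  Date.new(year, month, day)
end

puts clean_date("Date: 25th,11,2010")  # prints 25th,11,2010
```

In the loop above, this would be applied to the text of `div.pl.intro` before building the `Event`.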
Author by

pierrotlefou

Software Developer

Updated on June 18, 2022

Comments

  • pierrotlefou
    pierrotlefou about 2 years

    Given the following HTML, I want to parse it with Nokogiri and get the following result.

    event_name = "folk concert 2"
    event_link = "http://www.douban.com/event/12761580/"
    event_date = "25th,11,2010"
    

    I know doc.xpath('//div[@class="nof clearfix"]') returns each div element, but how should I proceed to extract each field, such as event_name, and especially the date?

    HTML

     <div class="nof clearfix">        
              <h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2">    </span></h2>
                <div class="pl intro">
                  Date:25th,11,2010<br/>
                </div>
     </div>
     <div class="nof clearfix">        
              <h2><a href="http://www.douban.com/event/12761581/">folk concert </a> <span class="pl2">    </span></h2>
                <div class="pl intro">
                  Date:10th,11,2010<br/>
                </div>
     </div>