Ruby Mechanize get elements with specified text

17,590

Solution 1

page.search(".details").at("span:contains('title 3')").parent.text

Explanation: at accepts either a CSS or an XPath selector. To keep the answer readable and close to your approach, it uses a CSS selector. Standard CSS cannot select elements by their text, but Nokogiri's CSS support includes the non-standard :contains pseudo-class (familiar from jQuery), so span:contains('title 3') is allowed.

The selector returns the span element, so to get the enclosing li element, call parent on it and then text.

Solution 2

Since you want to do this with Mechanize (and I see one of the comments recommends using Nokogiri instead), be aware that Mechanize is built on Nokogiri, so all of Nokogiri's functionality is available through Mechanize.

The docs at http://mechanize.rubyforge.org/Mechanize.html show the parser being set to a Nokogiri class:

Mechanize.html_parser = Nokogiri::XML

So you can accomplish this using XPath and the Mechanize page.search method.

page.search("//div[@class='details']/ul/li[span='title 3']").text

This should be able to give you the text for the li element that you're looking for. (unverified with .text, but the XPath does work)

You can test the XPath here: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2

Solution 3

A cleaner CSS approach:

page.at('.details li:has(span[text()="title 3"])')
Author by

George Karanikas

Full stack developer, mostly working with Java and Angular. Always open to new challenges!

Updated on July 28, 2022

Comments

  • George Karanikas
    George Karanikas almost 2 years

    I am trying to parse the contents of a website using mechanize and I am stuck at a point. The content that I want to parse is inside a li tag and is not always in the same order.

    Let's suppose that we have the following, where the order of the li tags is not always the same and sometimes some may not be there at all.

    <div class="details">
      <ul>
        <li><span>title 1</span> ": here are the details"</li>
        <li><span>title 2</span> ": here are the details"</li>
        <li><span>title 3</span> ": here are the details"</li>
        <li><span>title 4</span> ": here are the details"</li>
      </ul>
    </div>
    

    What I want is to get only the li details where the span text is, for example, title 3. What I have done so far is the following, which gives me the details from the first li:

    puts page.at('.details').at('span', :text => "title 3").at("+ *").text
    

    Is there a way to do what I want using mechanize or should I also use other means?

    • Arup Rakshit
      Arup Rakshit over 10 years
      If you want, you can do this using nokogiri. I can try it for you if you like.
    • George Karanikas
      George Karanikas over 10 years
      Mechanize, as I have read and seen, works with nokogiri. The page that you see can be treated as a nokogiri object in which you can search. Instead of, for example, at_css, you just write at.
    • Arup Rakshit
      Arup Rakshit over 10 years
      Hmm, I just found that. What's the problem with your code? If I get to know that, I can probably come up with my own. :)
    • George Karanikas
      George Karanikas over 10 years
      As I said, the problem is that it gives me the first li, whereas I want the one whose <span> text is title 3.
  • George Karanikas
    George Karanikas over 10 years
    It worked like that. The only thing that I changed is that I used next instead of parent to get the details part. Thanks!
  • pguardiario
    pguardiario over 10 years
    It's not a jQuery selector, it's a css selector.