Ruby Mechanize get elements with specified text

css ruby mechanize

17,590

Solution 1

```
page.search(".details").at("span:contains('title 3')").parent.text
```

Explanation: With at you can use a CSS or an XPath selector. To keep it readable and close to your approach, this answer uses a CSS selector. Standard CSS cannot select on text content, but Nokogiri accepts the jQuery-style :contains pseudo-class, so it can be used here.

The selector returns the span element, so to get the enclosing li element, call parent on it and then read its text.

Solution 2

Since you're looking to do this using Mechanize (and I see one of the comments recommends using Nokogiri instead), you should be aware that Mechanize is built on Nokogiri, so you can use any Nokogiri functionality through Mechanize.

The docs at http://mechanize.rubyforge.org/Mechanize.html show this:

```
Mechanize.html_parser = Nokogiri::XML
```

So you can accomplish this using XPath and the Mechanize page.search method.

```
page.search("//div[@class='details']/ul/li[span='title 3']").text
```

This should give you the text of the li element that you're looking for. (Unverified with .text, but the XPath does work.)

You can test the XPath here: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2

Solution 3

A cleaner CSS approach:

```
page.at('.details li:has(span[text()="title 3"])')
```

Author by

George Karanikas

Updated on July 28, 2022

Comments

George Karanikas almost 2 years
I am trying to parse the contents of a website using Mechanize and I am stuck at one point. The content that I want to parse is inside an li tag, and the tags are not always in the same order.

Let's suppose that we have the following, where the order of the li tags is not always the same and sometimes some of them may not be there at all.
```
<div class="details">
  <ul>
    <li><span>title 1</span> ": here are the details"</li>
    <li><span>title 2</span> ": here are the details"</li>
    <li><span>title 3</span> ": here are the details"</li>
    <li><span>title 4</span> ": here are the details"</li>
  </ul>
</div>
```
What I want is to get only the li details where the span text is, for example, title 3. What I have done so far is the following, which gives me the details from the first li:
```
puts page.at('.details').at('span', :text => "title 3").at("+ *").text
```
Is there a way to do what I want using Mechanize, or should I also use other means?
- Arup Rakshit over 10 years
  
  If you want, you can do this using Nokogiri. If you like, I can try it for you.
- George Karanikas over 10 years
  
  Mechanize, from what I have read and seen, works with Nokogiri. The page that you get can be treated as a Nokogiri object that you can search. Instead of, for example, css_at, you just write at.
- Arup Rakshit over 10 years
  
  Hmm, I just found that. What's the problem with your code? If I know that, I can probably come up with my own. :)
- George Karanikas over 10 years
  
  As I said, the problem is that it gives me the first li, whereas I want the one whose <span> text is title 3.
George Karanikas over 10 years

It worked like that. The only thing that I changed is that I used next instead of parent to get the details part. Thanks!
pguardiario over 10 years

It's not a jQuery selector, it's a CSS selector.