Ruby Mechanize get elements with specified text
Solution 1
page.search(".details").at("span:contains('title 3')").parent.text
Explanation: at accepts either a CSS or an XPath selector. To keep things readable and close to your approach, this answer uses a CSS selector. Standard CSS cannot select elements by their text, but Nokogiri supports the jQuery-style :contains pseudo-class, so the selector above is allowed.
The selector returns the span element, so to get its parent li element you can call parent and then read the text.
Solution 2
Since you're looking to do this using Mechanize (and I see one of the comments recommends using Nokogiri instead), you should be aware that Mechanize is built on Nokogiri, so you can use any Nokogiri functionality through Mechanize.
To show you, from the docs at http://mechanize.rubyforge.org/Mechanize.html:
Mechanize.html_parser = Nokogiri::XML
So you can accomplish this using XPath and the Mechanize page.search method.
page.search("//div[@class='details']/ul/li[span='title 3']").text
This should give you the text of the li element that you're looking for. (Unverified with .text, but the XPath does work.)
You can test the XPath here: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2
Solution 3
A cleaner CSS approach:
page.at('.details li:has(span[text()="title 3"])')
George Karanikas
Updated on July 28, 2022
Comments
-
George Karanikas almost 2 years
I am trying to parse the contents of a website using Mechanize and I am stuck at one point. The content that I want to parse is inside a li tag and is not always in the same order.
Let's suppose that we have the following, where the order of the li tags is not always the same and sometimes some may not even be there at all.
<div class="details">
  <ul>
    <li><span>title 1</span> ": here are the details"</li>
    <li><span>title 2</span> ": here are the details"</li>
    <li><span>title 3</span> ": here are the details"</li>
    <li><span>title 4</span> ": here are the details"</li>
  </ul>
</div>
What I want is to get only the li details where the span text is, for example, title 3. What I have done is the following, which gives me the details from the first li:
puts page.at('.details').at('span', :text => "title 3").at("+ *").text
Is there a way to do what I want using Mechanize, or should I also use other means?
-
Arup Rakshit over 10 years
If you want, you can do this using Nokogiri. I can try it for you if you like.
-
George Karanikas over 10 years
Mechanize, as I have read and seen, works with Nokogiri. The page that you see can be treated as a Nokogiri object in which you can search. Instead of, for example, at_css you just write at.
-
Arup Rakshit over 10 years
Hmm, I just found that. What's the problem with your code? If I know that, I can probably come up with my own version. :)
-
George Karanikas over 10 years
As I said, the problem is that it gives me the first li, whereas I want the one whose <span> text is title 3.
-
George Karanikas over 10 years
It worked like that. The only thing that I changed is that I used next instead of parent to get the details part. Thanks!
-
pguardiario over 10 years
It's not a jQuery selector, it's a CSS selector.