XPath to Parse "SRC" from IMG tag?
Solution 1
You are so close to answering this yourself that I am somewhat reluctant to answer it for you. However, the following XPath should provide what you want (provided the source is XHTML, of course).
//img[@class='photo-large']/@src
For further tips, check out W3Schools. They have excellent XPath tutorials and a good reference too.
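To show the attribute-selecting XPath in action, here is a minimal sketch using REXML, the XML library that ships with Ruby (a bundled gem in Ruby 3.x), instead of an HTML-specific parser. It assumes a well-formed, namespace-free XHTML snippet; the second image and its URL are made up for illustration. The trailing `/@src` makes the XPath return the attribute node itself, so you only need to read its `value`.

```ruby
require 'rexml/document'

# Hypothetical, namespace-free XHTML input (must be well-formed XML for REXML)
xhtml = <<~XHTML
  <html>
    <body>
      <img src="http://example.com/img.jpg" class="photo-large" />
      <img src="http://example.com/thumb.jpg" class="thumb" />
      <img src="http://example.com/img2.jpg" class="photo-large" />
    </body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)

# '/@src' selects the attribute node, not the <img> element
first_src = REXML::XPath.first(doc, "//img[@class='photo-large']/@src").value

# All matching src attributes, in document order, as plain strings
all_srcs = REXML::XPath.match(doc, "//img[@class='photo-large']/@src").map(&:value)

puts first_src
p all_srcs
```

If the real page declares the XHTML namespace (`xmlns="http://www.w3.org/1999/xhtml"`), an XPath 1.0 engine may not match the unprefixed `img`, and you would need to bind a namespace prefix in the query.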
Solution 2
Using Hpricot, this works:
doc.at('//img[@class="photo-large"]')['src']
In case you have more than one image, the following gives an array:
doc.search('//img[@class="photo-large"]').map { |e| e['src'] }
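The Hpricot approach selects the `<img>` element first and then reads its `src` attribute. A sketch of the same two-step pattern, written against Ruby's REXML instead of Hpricot so it runs with only Ruby's bundled libraries (the sample markup is hypothetical):

```ruby
require 'rexml/document'

# Hypothetical well-formed snippet with two matching images
xhtml = <<~XHTML
  <div>
    <img src="http://example.com/img.jpg" class="photo-large" />
    <img src="http://example.com/img2.jpg" class="photo-large" />
  </div>
XHTML

doc = REXML::Document.new(xhtml)

# Step 1: select the element; step 2: look up the attribute value by name
first_img = REXML::XPath.first(doc, "//img[@class='photo-large']")
first_src = first_img.attributes['src']

# More than one image: map each matched element to its src value
srcs = REXML::XPath.match(doc, "//img[@class='photo-large']").map { |e| e.attributes['src'] }

p srcs
```

Selecting the element keeps it around if you later need other attributes (e.g. `alt`); if you only ever want `src`, putting `/@src` in the XPath is shorter.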
However, Nokogiri is many times faster and it “can be used as a drop in replacement” for Hpricot.
Here is the version for Nokogiri, which supports selecting the attribute directly in the XPath:
doc.at('//img[@class="photo-large"]/@src').to_s
or for many images:
doc.search('//img[@class="photo-large"]/@src').map(&:value)
kush
Updated on July 24, 2020
Comments
kush (almost 4 years ago):
Right now I successfully grabbed the full element from an HTML page with this:
//img[@class='photo-large']
For example, it returns:
<img src="http://example.com/img.jpg" class='photo-large' />
But I only need the src URL (http://example.com/img.jpg). Any help?