Select deeply nested element


Assuming indentation denotes containment in your example, the following XPath will select the span element for you:

//div[@id='...']/div[3]/div[2]/div/div/span

Of course, if there are no other span elements beneath the id'ed div, you could jump right to it:

//div[@id='...']//span

Or if there are no other span elements in the entire document:

//span
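
As a rough check of the three expressions above, here is a minimal sketch using Python's standard-library ElementTree. It assumes a well-formed stand-in for the question's markup with the containment implied by the indentation. The `<html><body>` wrapper and the `target` text are illustrative additions, and `...` is kept as the literal id value. ElementTree only accepts relative paths, so the leading `//` is written as `.//` here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the markup in the question.
# The id is kept as the literal string "..." because the real value is unknown.
MARKUP = """
<html><body>
<div id="...">
  <div/>
  <div/>
  <div>
    <div/>
    <div>
      <div>
        <div>
          <span>target</span>
        </div>
      </div>
    </div>
  </div>
</div>
</body></html>
"""

root = ET.fromstring(MARKUP.strip())

# Full positional path: third child div, then its second child div, and so on.
by_position = root.findall(".//div[@id='...']/div[3]/div[2]/div/div/span")

# Any span anywhere beneath the id'ed div.
by_descendant = root.findall(".//div[@id='...']//span")

# Any span in the whole document.
anywhere = root.findall(".//span")

for matches in (by_position, by_descendant, anywhere):
    print([span.text for span in matches])
```

With this stand-in document, each of the three lookups finds the same single span. In Scrapy the original `//...` forms would normally be passed to `response.xpath(...)` unchanged.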
Author by

grigy

Jack of all techs, master of some.

Updated on June 18, 2022

Comments

  • grigy
    grigy almost 2 years

    I'm reading Scrapy/XPath tutorials but this does not seem trivial and I can't find an example that would explain it.

    Given markup like this, how would you select the <span> element?

    <div id="...">
    	<div>
    	<div>
    	<div>
    		<div>
    		<div>
    			<div>
    				<div>
    					<span>

    If we generalize the problem it would be:

    • skip n divs in the div with id="..."
    • skip m divs in the div
    • ...
    • select the span element in the div
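
The generalized form above maps onto XPath positional predicates: skipping n sibling divs means selecting `div[n+1]`. A minimal sketch, assuming a hypothetical helper named `build_xpath` (not part of Scrapy or any library) that takes the id and the list of skip counts:

```python
def build_xpath(div_id, skips, target="span"):
    """Build an XPath that skips skips[i] sibling divs at nesting level i.

    Hypothetical helper for illustration only. XPath positions are 1-based,
    so skipping n divs selects the (n + 1)-th div at that level.
    """
    path = f"//div[@id='{div_id}']"
    for n in skips:
        path += f"/div[{n + 1}]"
    return f"{path}/{target}"


# The markup in the question: skip 2 divs, then 1, then none at the last two levels.
xpath = build_xpath("...", [2, 1, 0, 0])
print(xpath)
```

For the markup in the question this produces a path equivalent to `//div[@id='...']/div[3]/div[2]/div/div/span`, with the `[1]` predicates written out explicitly at the last two levels.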
  • grigy
    grigy almost 9 years
    Very definitive answer. Thanks!
  • grigy
    grigy almost 9 years
    It helped but for some reason I can't extract the selector. If I log the content it prints a "square" symbol.
  • kjhughes
    kjhughes almost 9 years
    Hard to tell without seeing a complete example, but perhaps it's not showing the string value of the selected element as you're expecting. Does it help to explicitly select the text nodes of the span (by appending /text() to your XPath)?
  • grigy
    grigy almost 9 years
    No. It returns an empty list. It returns a non-empty value only for //div[@id='...']; for all other nodes under it, the selector returns an empty list.
  • kjhughes
    kjhughes almost 9 years
    If you provide a Minimal, Complete, and Verifiable Example (MCVE) that exhibits the problem, it should be easy to see what's going on and help. Thanks.
  • kjhughes
    kjhughes almost 9 years
    From what I'm seeing, the referenced page does not have any children under <div id="developer_blog_index" data-referrer="developer_blog_index"></div>, so of course you won't be able to select anything beneath there.
  • grigy
    grigy almost 9 years
    Strange, I can see the children (actually the whole content) under the div in Chrome's element inspector. Maybe the page loads it dynamically...
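
The empty results in this thread are what you would expect if the served HTML contains only the empty container and the children are added later by JavaScript, as the last two comments suggest. A minimal sketch of that situation, assuming the raw response looks like the snippet quoted above (the `<html><body>` wrapper is an illustrative addition), parsed with Python's standard-library ElementTree:

```python
import xml.etree.ElementTree as ET

# Assumed shape of the HTML as served, before any JavaScript runs.
SERVED_HTML = """
<html><body>
<div id="developer_blog_index" data-referrer="developer_blog_index"></div>
</body></html>
"""

root = ET.fromstring(SERVED_HTML.strip())

# The container itself is found...
container = root.find(".//div[@id='developer_blog_index']")

# ...but it has no child elements, so anything beneath it matches nothing.
children = list(container)
spans = container.findall(".//span")

print(container is not None, children, spans)
```

Comparing the raw page source (rather than the element inspector, which shows the DOM after scripts have run) against the XPath is a quick way to confirm whether this is what is happening.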