Jsoup find element with specific text

29,208

Solution 1

When I run your code it selects the outer div, while I presume what you're looking for is the inner div. The documentation says that :contains selects the "elements that contain the specified text". In this simple HTML:

<div><div><b>Pantry/Catering</b></div></div>

The selector div:contains(Pantry/Catering) matches two elements, because both divs contain the text 'Pantry/Catering':

<!-- First Match -->
<div><div><b>Pantry/Catering</b></div></div>

<!-- Second Match -->
<div><b>Pantry/Catering</b></div>

The matches always come back in that order because jsoup returns elements in document order (the order in which their opening tags appear), so the outer div comes first. Therefore .first() always returns the outer div. To extract the inner div you could use .get(1).

Extracting the inner div in full:

doc.select("div:contains(Pantry/Catering)").get(1)
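To see the double match for yourself, here is a minimal sketch that runs the same selector against the simplified HTML above. The class name ContainsOrderDemo and the helper matchingDivs are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ContainsOrderDemo {
    // The simplified HTML from this answer.
    static final String HTML = "<div><div><b>Pantry/Catering</b></div></div>";

    // Hypothetical helper: every div whose text (descendants included) contains the phrase.
    static Elements matchingDivs(String html) {
        return Jsoup.parse(html).select("div:contains(Pantry/Catering)");
    }

    public static void main(String[] args) {
        Elements matches = matchingDivs(HTML);
        System.out.println("matches: " + matches.size());

        // Results are in document order: the outer div first, then the inner one.
        for (Element match : matches) {
            System.out.println(match.tagName() + " -> first child: " + match.child(0).tagName());
        }

        // Index 1 is the inner div, whose first child is the <b> element.
        Element inner = matches.get(1);
        System.out.println(inner.outerHtml());
    }
}
```

Note that a hard-coded index like get(1) only works when you know how deeply the target is nested, which is why matching on something more specific is usually more robust.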

Solution 2

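This should also work. Note that in your HTML the text sits directly inside the b element rather than the div, so containsOwn has to be applied to b, and parent() then steps up to the div:

doc.selectFirst("b:containsOwn(Pantry/Catering)").parent().text();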

Explanation:

selectFirst(selector) - Returns the first matching element, or null if nothing matches. It avoids having to write select(selector).first().

containsOwn(text) - A pseudo selector that returns elements that directly contain the specified text. The text must appear in the found element itself, not in any of its descendants, in contrast with contains(text).

Source : https://jsoup.org/apidocs/org/jsoup/select/Selector.html#selectFirst-java.lang.String-org.jsoup.nodes.Element-
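The difference between the two pseudo selectors is easiest to see side by side. The sketch below is only an illustration: the HTML snippet, the ids outer and inner, and the count helper are invented for the example, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;

public class ContainsOwnDemo {
    // Hypothetical markup: the phrase is the inner div's own text, "Lunch" belongs to the span.
    static final String HTML =
        "<div id=\"outer\"><div id=\"inner\">Pantry/Catering <span>Lunch</span></div></div>";

    // Hypothetical helper: number of elements the selector matches in HTML.
    static int count(String selector) {
        return Jsoup.parse(HTML).select(selector).size();
    }

    public static void main(String[] args) {
        // :contains also looks at descendant text, so both divs match.
        System.out.println(count("div:contains(Pantry/Catering)"));

        // :containsOwn only looks at the element's own text, so only the inner div matches.
        System.out.println(count("div:containsOwn(Pantry/Catering)"));

        // "Lunch" is owned by the span, so no div matches it with :containsOwn.
        System.out.println(count("div:containsOwn(Lunch)"));
        System.out.println(count("span:containsOwn(Lunch)"));
    }
}
```

Since selectFirst returns null when nothing matches, it is worth checking the result before calling text() on it.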

Solution 3

Ok. Figured it out. Had to do something like

doc.select("b:contains(Pantry/Catering)").first().parent().children().get(1).text();

Thanks for the help!
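For anyone reusing this, here is a rough null-safe sketch of the same navigation: find the b label, step up to its parent, and read the second child. The class name, the helper textAfterLabel, and the trimmed-down HTML are assumptions for illustration, not the original page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LabelSiblingDemo {
    // Trimmed-down, hypothetical version of the markup in the question.
    static final String HTML =
        "<div><b>Pantry/Catering</b>"
        + "<div><div>Pantry Car Avbl</div></div>"
        + "<div><div>Dinner</div></div></div>";

    // Hypothetical helper: text of the element that follows the label, or null if absent.
    static String textAfterLabel(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.selectFirst("b:contains(Pantry/Catering)");
        if (label == null) {
            return null; // no label found
        }
        Element parent = label.parent();
        // children() index 0 is the <b> itself, index 1 is the first nested div.
        if (parent == null || parent.children().size() < 2) {
            return null;
        }
        return parent.children().get(1).text();
    }

    public static void main(String[] args) {
        System.out.println(textAfterLabel(HTML));
    }
}
```

In this layout label.nextElementSibling() reaches the same element without relying on the index.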

Author: tbag

Updated on July 09, 2022

Comments

  • tbag
    tbag almost 2 years

    I want to select an element with specific text from the HTML using JSoup. The html is

    <td style="vertical-align:bottom;text-align:center;width:15%">
    <div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span>
        <div><b>Pantry/Catering</b>
            <div>
                <div style="color:#00700B;">&#10003;&nbsp;Pantry Car Avbl
                    <br />&#10003;&nbsp;Catering Avbl</div>
            </div>
            <div>
                <div><span>Dinner is served after departure from NZM on 1st day.;</span>...
                    <br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a>
                </div>
                <div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &amp;Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee &amp; lime juice</div>
            </div>
        </div><span class="corners-bottom-subtab"><span></span></span>
    </div>
    

    I want to find the div element containing the text "Pantry/Catering". I tried

    doc.select("div:contains(Pantry/Catering)").first();
    

    But this doesn't seem to work. How can I get this element using jsoup?

  • Kick Buttowski
    Kick Buttowski over 9 years
    I am trying to learn this, but what happens if the div is not the first inner div?
  • Spectre
    Spectre over 9 years
    @KickButtowski The order of the elements in the result is the same order in which their opening tags appear in the text. If you don't know the element's position ahead of time, you could iterate through the select results to find it, or match on something more specific (e.g. b:contains(Pantry/Catering)) and work back using .parent().
  • Kick Buttowski
    Kick Buttowski over 9 years
    Thank you. Why doesn't this code give me what I want? doc = Jsoup.parse(input, null); Elements el = doc.select("div"); if (el.contains("Pantry/Catering")) { System.out.println(el.text()); }
  • Spectre
    Spectre over 9 years
    contains in Elements comes from the Collection interface and tests whether the collection holds a specific Element, not whether any element's text contains a string. Also make sure you understand the difference between Element and Elements.
  • tbag
    tbag over 9 years
    @Spectre - I didn't quite get it. I tried doc.select("div:contains(Pantry/Catering)").get(1) and it still didn't work for me. By the way, there is only one "Pantry/Catering" in the code I posted. So how is it matching twice for you?
  • Spectre
    Spectre over 9 years
    @tbag The :contains() pseudo selector matches elements that contain the specified text directly or in any of their descendants. In the simplest case of <a><b>text</b></a>, b contains 'text' directly. a also contains 'text' because it contains b, which contains 'text'.
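The points made in these comments can be sketched in a few lines using the <a><b>text</b></a> case: Elements.contains never matches a String, text() includes descendant text, and ownText() does not. The class and helper names below are invented for illustration, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ElementsTextFilterDemo {
    static final String HTML = "<a><b>text</b></a>";

    // Collection.contains compares against the Element objects in the list,
    // so passing a String never matches, whatever the elements' text is.
    static boolean collectionContains(String needle) {
        Elements all = Jsoup.parse(HTML).select("a, b");
        return all.contains(needle);
    }

    // Tag names of elements whose text, descendants included, contains the needle.
    static List<String> tagsWithText(String needle) {
        List<String> tags = new ArrayList<>();
        for (Element e : Jsoup.parse(HTML).select("a, b")) {
            if (e.text().contains(needle)) {
                tags.add(e.tagName());
            }
        }
        return tags;
    }

    // Tag names of elements that hold the needle directly in their own text.
    static List<String> tagsWithOwnText(String needle) {
        List<String> tags = new ArrayList<>();
        for (Element e : Jsoup.parse(HTML).select("a, b")) {
            if (e.ownText().contains(needle)) {
                tags.add(e.tagName());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(collectionContains("text"));
        System.out.println(tagsWithText("text"));
        System.out.println(tagsWithOwnText("text"));
    }
}
```

Iterating like this is the manual equivalent of :contains and :containsOwn, and it is handy when the position of the element is not known ahead of time.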