How to extract text between <p> tags

This should do the job:

Elements e = doc.select("p");

The jsoup selector syntax documentation lists all the selectors you can use.
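Since the question also asks about li tags, here is a minimal sketch of selecting both element types at once with a comma-separated selector group. The class name, method name, and sample markup are made up for illustration, and it assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphAndListText {

    // Returns the text of every <p> and <li> element, in document order.
    static List<String> extractText(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // "p, li" is a selector group: it matches elements that are either <p> or <li>.
        for (Element element : doc.select("p, li")) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample markup made up for this illustration.
        String html = "<p>some <strong>bold</strong> text</p>"
                + "<ul><li>first item</li><li>second item</li></ul>";
        System.out.println(extractText(html));
        // prints [some bold text, first item, second item]
    }
}
```

Each string in the returned list can then be passed to whatever tokenizer you use.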

Suppose you have this HTML:

String html = "<p>some <strong>bold</strong> text</p>";

To get "some bold text" as the result, use:

Document doc = Jsoup.parse(html);
Element p = doc.select("p").first();
String text = doc.body().text(); // some bold text

or, using only the first p element:

String text = p.text(); // some bold text

Now suppose you have the following, more complex HTML:

String html = "<div id=someid><p>some text</p><span>some other text</span><p> another p tag</p></div>";

To get the text of the two p tags, you can do something like this:

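Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p = content.getElementsByTag("p");

// Append the text of each p element in document order.
StringBuilder pConcatenated = new StringBuilder();
for (Element x : p) {
  pConcatenated.append(x.text());
}

// text() trims each paragraph, so nothing separates the two pieces.
System.out.println(pConcatenated); // some textanother p tag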
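Because text() trims surrounding whitespace, appending the paragraphs directly runs the last word of one into the first word of the next. As a sketch of one way around that, the version below joins the paragraphs with a space. It assumes jsoup 1.10.2 or later (for Elements.eachText()); the class and method names are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JoinParagraphs {

    // Joins the text of every <p> inside the element with id "someid", separated by a space.
    static String joinParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        // "#someid p" matches <p> elements that are descendants of the element with that id.
        // eachText() returns one string per matched element and skips elements without text.
        List<String> texts = doc.select("#someid p").eachText();
        return String.join(" ", texts);
    }

    public static void main(String[] args) {
        String html = "<div id=someid><p>some text</p><span>some other text</span>"
                + "<p> another p tag</p></div>";
        System.out.println(joinParagraphs(html)); // prints: some text another p tag
    }
}
```

Keeping a separator between paragraphs matters if the result is tokenized afterwards, since otherwise "text" and "another" would be seen as the single token "textanother".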

You can also find more information in the jsoup cookbook.

Hope this helps.

Author by

rena-c

Updated on April 09, 2020

Comments

  • rena-c
    rena-c about 4 years

    I want to extract the text inside the p and li tags of HTML pages, so that I can tokenize each page and build an inverted index per page for answering search queries.

    How can I get the p tags using jsoup?

    Elements e = doc.select("");

    What string should be passed as that parameter?

  • rena-c
    rena-c almost 11 years
    Yeah, I know from the cookbook that it gets the p tags, but with complex structures like <p><br> or <p class... etc. it doesn't work. It must produce the same result however the HTML is written. How can I do that?
  • QuangDT
    QuangDT over 7 years
    Note: when I used select("p").first(), it returned the second element if the first element was empty; e.g. for <p> </p><p>test</p>, it returned "test" rather than " ". I had to use getElementsByTag to work around it.