Why does the getChild() method of JDOM return null?

java html xml jdom

13,744

Solution 1

I've found some problems in your code: 1) If you want to build a remote XML document over the network, you should use another build method, one that takes a URL as input. As written, you're parsing a file named "www......com" as XML.

Document jdomDocument = builder.build( new URL("http://www........com"));

2) If you want to parse an HTML page as XML, you have to check that it is a well-formed XHTML document; otherwise you can't parse it as XML.

3) As I already told you in another answer, root.getChild("body") returns the child of root whose name is "body" and which is in no namespace. You should check the namespace of the element you're looking for; if it is in a namespace, you have to pass that namespace like this:

root.getChild("body", Namespace.getNamespace("your_namespace_uri"));

An easy way to find out which namespace your element is in is to print all of root's children using the getChildren method:

for (Object element : doc.getRootElement().getChildren()) {
    System.out.println(element.toString());
}

If you're trying to parse XHTML, the namespace URI is probably http://www.w3.org/1999/xhtml, so you should do this:

root.getChild("body", Namespace.getNamespace("http://www.w3.org/1999/xhtml"));
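The namespace rule from point 3 can be tried out without touching the network. The following is a minimal sketch, not the asker's code: it uses the JDK's built-in DOM parser instead of JDOM so that it runs with no extra jars, and the class name, the SAMPLE string, and the helper methods are all made up for illustration. A lookup with no namespace should find no body element, while a lookup with the XHTML namespace URI should find it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceLookupDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample page; it stands in for the remote document.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>t</title></head><body><p>hi</p></body></html>";

    // Parses the markup with a namespace-aware JDK parser and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace awareness is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Namespace URI of the root element, or null if it has none.
    static String rootNamespace(String xml) {
        return parseRoot(xml).getNamespaceURI();
    }

    // Counts "body" elements in the given namespace; null means "no namespace".
    static int countBody(String xml, String namespaceUri) {
        return parseRoot(xml).getElementsByTagNameNS(namespaceUri, "body").getLength();
    }

    public static void main(String[] args) {
        System.out.println("root namespace:       " + rootNamespace(SAMPLE));
        System.out.println("no-namespace matches: " + countBody(SAMPLE, null));
        System.out.println("XHTML matches:        " + countBody(SAMPLE, XHTML_NS));
    }
}
```

In JDOM the corresponding calls would be root.getChild("body") for the no-namespace lookup and root.getChild("body", root.getNamespace()) for the namespaced one. The second form assumes body is in the same namespace as the root element, which is the usual case for XHTML.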

Solution 2

What makes you feel like you require org.ccil.cowan.tagsoup.Parser? What does it provide you that the parser built into the JDK does not?

I'd try it using another constructor for SAXBuilder. Use the parser built into the JDK and see if that helps.

Start by printing out the entire tree using XMLOutputter.

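public static void getBody() throws Exception  // handle or narrow the exception as appropriate
{
    // No parser class name is given, so the JDK's default SAX parser is used;
    // the boolean argument turns on validation.
    SAXBuilder builder = new SAXBuilder(true);
    Document document = builder.build("http://www......com");
    // Dump the whole tree to see which elements and namespaces were actually parsed.
    XMLOutputter outputter = new XMLOutputter();
    outputter.output(document, System.out);
}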
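One thing to keep in mind when trying the JDK parser: unlike TagSoup, it rejects markup that is not well-formed, so a typical HTML page may fail to parse at all. A quick check along these lines tells you whether a given page can be fed to a plain XML parser. This is only a sketch: WellFormedCheck and isWellFormed are hypothetical names, and the two sample strings are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the JDK's XML parser accepts the markup, false if it
    // reports a parse error (for example, an unclosed tag).
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented samples: the first leaves <br> unclosed, as HTML allows.
        String html = "<html><body><p>one<br>two</p></body></html>";
        String xhtml = "<html><body><p>one<br/>two</p></body></html>";
        System.out.println("HTML-style markup well-formed:  " + isWellFormed(html));
        System.out.println("XHTML-style markup well-formed: " + isWellFormed(xhtml));
    }
}
```

If the check fails for your page, a lenient parser such as TagSoup (or cleaning the page up to XHTML first) is still needed before JDOM can build a document from it.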

Solution 3

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Namespace;
import org.jdom.input.SAXBuilder;

public static void getBody() throws Exception {
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // Returns null unless the URI passed here matches the namespace of the body element exactly
    System.out.println(root.getChild("body", Namespace.getNamespace("my_name_space")));
}

Author by

Arun

Updated on June 30, 2022

Comments

Arun almost 2 years
I'm working on a project that manipulates HTML documents. I want to take the body content of an existing HTML document and modify it to produce a new HTML document. I'm using JDOM and need the body element, so I call getChild("body") in my code. But it returns null, even though my HTML document has a body element. Could anybody help me understand this problem? I'm a student.

I would appreciate any pointers.

Code:
```
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public static void getBody() throws Exception {
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // It returns null
    System.out.println(root.getChild("body"));
}
```
Please refer to the following as well. This is my HTML's root and its children, as printed to the console:
```
root.getName():html

SIZE:2

[Element: <head [Namespace: http://www.w3.org/1999/xhtml]/>]

[Element: <body [Namespace: http://www.w3.org/1999/xhtml]/>]
```