Android SAX parser not getting full text from between tags


Solution 1

As you can see, it's cutting everything off the url from the ampersand escape code and after.

From the documentation of the characters() method:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

When I write SAX parsers, I use a StringBuilder to append everything passed to characters():

public void characters(char[] ch, int start, int length) {
    if (buf != null) {
        // Append this chunk; the parser may deliver the text in several calls
        buf.append(ch, start, length);
    }
}

Then in endElement(), I take the contents of the StringBuilder and do something with it. That way, if the parser calls characters() several times, I don't miss anything.
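
As a minimal, self-contained sketch of the buffering approach described above (the class names LinkHandler and LinkDemo, the "link" element name, and the example.com URL are assumptions for illustration, not taken from the original code):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <link> element.
class LinkHandler extends DefaultHandler {
    final List<String> links = new ArrayList<>();
    private StringBuilder buf; // non-null only while inside a <link> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("link".equals(qName)) {
            buf = new StringBuilder(); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) {
            buf.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("link".equals(qName)) {
            links.add(buf.toString().trim()); // the text is complete only here
            buf = null;
        }
    }
}

class LinkDemo {
    // Parses the given XML string and returns the text of each <link> element.
    static List<String> parseLinks(String xml) throws Exception {
        LinkHandler handler = new LinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><item><link>http://example.com/story?id=1&amp;source=rss</link></item></rss>";
        System.out.println(parseLinks(xml)); // prints [http://example.com/story?id=1&source=rss]
    }
}
```

Whether the parser delivers the text in one chunk or splits it around the &amp; entity, the value read in endElement() is the same.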

Solution 2

public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    // Start a fresh buffer for each element
    sb = new StringBuilder();
}

public void characters(char[] ch, int start, int length) {
    if (sb != null && iconflag) {
        // Accumulate every chunk instead of overwriting the previous one
        sb.append(ch, start, length);
    }
}

public void endElement(String uri, String localName, String qName)
        throws SAXException {
    // sb.toString() now holds the element's complete text; use it here
}

So I figured it out, the code above is the solution.


Updated on June 11, 2022


  • brockoli
    brockoli about 2 years

    I've created my own DefaultHandler to parse RSS feeds, and for most feeds it works fine. For ESPN, however, it cuts off part of the article URL because of the way ESPN formats its URLs. An example of a full article url from ESPN..;campaign=rss&amp;source=ESPNHeadlines

    The problem is for some reason the DefaultHandler characters method is only getting this from the tag that contains the above url.

    As you can see, it's cutting everything off the URL from the ampersand escape code onward. How can I get the SAX parser to not cut my string off at this escape code? For reference, here is my characters method:

     public void characters(char ch[], int start, int length) {
         String chars = (new String(ch).substring(start, start + length));
         try {
             // If not in item, then title/link refers to feed
             if (!inItem) {
                 if (inTitle)
                     currentFeed.title = chars;
             } else {
                 if (inLink)
                     currentArticle.url = new URL(chars);
                 if (inTitle)
                     currentArticle.title = chars;
                 if (inDescription)
                     currentArticle.description = chars;
                 if (inPubDate)
                     currentArticle.pubDate = chars;
                 if (inEnclosure) {
                     // ...
                 }
             }
         } catch (MalformedURLException e) {
             Log.e("RSSReader", e.toString());
         }
     }


  • brockoli
    brockoli about 14 years
    OK, I hadn't really taken the time to fully understand how the parser works. After reading your answer I went back and researched further to get a better understanding. What you described was the problem, of course, and I've since updated my code to handle the char data properly. TY
  • Ankit
    Ankit almost 11 years
    @CommonsWare: does it miss some characters? I am facing that in my case.
  • Ankit
    Ankit almost 11 years
    I have <image>image1:title</image> in my XML; sometimes I get the full value and sometimes I get only "itle" or "Title". I have tried printing the values, but "image1:" is never printed for the partial values.
  • CommonsWare
    CommonsWare almost 11 years
    @Ankit: Please open a fresh StackOverflow question, show your input, your parsing code, and your results.
  • Ankit
    Ankit almost 11 years
    Your solution resolved my problem; even so, I will post it as a question for future readers.
  • Nemanja
    Nemanja over 10 years
    Thank you, your answers are always short, descriptive, provide actual reasoning behind the answer and of course on the spot!
  • KK_07k11A0585
    KK_07k11A0585 about 9 years
    @CommonsWare I am using a SAX parser on a tag that contains the following text: <book id="1">Hi this book is selected for <ref id="23">IIFA</ref> award.</book> When I parse it and get the text from the book tag, I get 'Hi this book is selected for IIFA award.' But I want 'Hi this book is selected for <ref id="23">IIFA</ref> award.' Why is the <ref> missing from the text, and how can I get it while parsing? Please let me know.
  • CommonsWare
    CommonsWare about 9 years
    @KK_07k11A0585: That is a separate XML element. You are already getting it while parsing, in your startElement() and endElement() methods.
  • KK_07k11A0585
    KK_07k11A0585 about 9 years
    @CommonsWare Thanks, I have parsed that by adding that tag name in startElement and endElement(). But is there any other way to get the complete text inside the tag as plain text ?? In the above example, how can I get this text 'Hi this book is selected for <ref id="23">IIFA</ref>' as is from the tag book ??
  • CommonsWare
    CommonsWare about 9 years
    @KK_07k11A0585: You would have to reassemble that yourself, using string concatenation. This has nothing to do with Android specifically. If you have further questions in this area, ask a fresh Stack Overflow question, tagged java, where you explain your input and what you are trying to achieve.
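
The reassembly by string concatenation mentioned in the last comment could look roughly like the sketch below. The class names InnerXmlHandler and InnerXmlDemo and the depth counter are assumptions for illustration. Note that the sketch does not re-escape characters such as & or < in text or attribute values, so it reproduces the original markup only for simple input like the example in the comments.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the inner markup of <book> as a plain string.
class InnerXmlHandler extends DefaultHandler {
    final StringBuilder inner = new StringBuilder();
    private int depth = 0; // 0 = outside <book>, 1 = directly inside it, >1 = inside a child

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (depth > 0) {
            // A child element: write its start tag and attributes back out
            inner.append('<').append(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                inner.append(' ').append(attrs.getQName(i))
                     .append("=\"").append(attrs.getValue(i)).append('"');
            }
            inner.append('>');
            depth++;
        } else if ("book".equals(qName)) {
            depth = 1;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            inner.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            // Closing a child element: write its end tag back out
            inner.append("</").append(qName).append('>');
            depth--;
        } else if (depth == 1) {
            depth = 0; // closing <book> itself
        }
    }
}

class InnerXmlDemo {
    // Returns the reassembled content found between <book> and </book>.
    static String innerXml(String xml) throws Exception {
        InnerXmlHandler handler = new InnerXmlHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.inner.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book id=\"1\">Hi this book is selected for <ref id=\"23\">IIFA</ref> award.</book>";
        System.out.println(innerXml(xml));
    }
}
```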