Replace HTML entities (e.g. &#8217;) with character equivalents when parsing an XML feed

Solution 1

There are many libraries you can include in Titanium (Underscore.string, string.js) that will do this, but if you only want the HTML-unescaping function, just try this code, adapted from those libraries:

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

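function unescapeHTML(str) { // modified from underscore.string and string.js
    // Match anything of the form &...; and decide per entity how to decode it.
    return str.replace(/&([^;]+);/g, function(entity, entityCode) {
        var match;

        if (Object.prototype.hasOwnProperty.call(escapeChars, entityCode)) {
            // Named entity from the lookup table, e.g. &amp;
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            // Hexadecimal character reference, e.g. &#x2019;
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            // Decimal character reference, e.g. &#8217;
            return String.fromCharCode(parseInt(match[1], 10));
        } else {
            // Unknown entity: leave it untouched.
            return entity;
        }
    });
}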

This replaces the entities with their readable character equivalents and returns the modified string. Just put it somewhere in your code and you're good to go. I have used this myself in Titanium, and it's quite handy.
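As a quick check, here is a minimal sketch that runs the function on the text from the question's console output. The function is essentially the one above, repeated only so the snippet runs on its own; the variable names `raw` and `decoded` are illustrative.

```javascript
// Lookup table and function repeated so this snippet is self-contained.
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (Object.prototype.hasOwnProperty.call(escapeChars, entityCode)) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(parseInt(match[1], 10));
        }
        return entity; // unknown entities are left as they are
    });
}

// Sample text taken from the question's console output.
var raw = 'monitoring the situation &#8211; she says the problem seems to be getting worse&#8230;&#8230;.';
var decoded = unescapeHTML(raw);
// decoded now holds an en dash (U+2013) and two ellipsis characters (U+2026)
// in place of the numeric references.
```

Named entities that are not in `escapeChars` (for example `&nbsp;`) come back unchanged, so the function never drops text it does not recognise.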

Solution 2

I have encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are handled.

this.unescapeHTML = function(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    if (typeof(str) !== 'string') {
        return str;
    } else {
        return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
            var match;
            if ( entityCode in escapeChars) {
                return escapeChars[entityCode];
            } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
                return String.fromCharCode(parseInt(match[1], 16));
            } else if ( match = entityCode.match(/^#(\d+)$/)) {
                return String.fromCharCode(~~match[1]);
            } else {
                return entity;
            }
        });
    }
};
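To illustrate what the string check buys you, here is a small sketch. The standalone name `safeUnescapeHTML` and the plain `win` object are hypothetical stand-ins (the real `win` would be the Titanium window); the misspelt property mirrors the `win.datatoPass` slip from the question.

```javascript
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Hypothetical standalone version of the guarded function above.
function safeUnescapeHTML(str) {
    if (typeof str !== 'string') {
        return str; // non-strings (including undefined) pass through untouched
    }
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (Object.prototype.hasOwnProperty.call(escapeChars, entityCode)) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(parseInt(match[1], 10));
        }
        return entity;
    });
}

// Plain object standing in for the Titanium window (assumption for this sketch).
var win = { dataToPass: 'St Eunan&#8217;s College' };

// Property names are case-sensitive: datatoPass does not exist, so this is undefined.
var wrongCase = safeUnescapeHTML(win.datatoPass);

// The correctly spelt property is decoded as expected.
var rightCase = safeUnescapeHTML(win.dataToPass);
```

With the guard, a misspelt property no longer throws "cannot call method 'replace' of undefined"; it quietly yields `undefined`, so it is still worth logging the input while debugging.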

Author: user2363025

Updated on June 04, 2022

Comments

  • user2363025
    user2363025 almost 2 years

    When parsing an XML feed, I am getting text from the content tag, like this:

    The Government has awarded funding for a major refurbishment project to go ahead at St Eunan&#8217;s College. This is in addition to last month&#8217;s announcement that grant for its prefabs to be replaced with permanent accomodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes &#8211; the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school&#8217;s board of management

    Is there any way to easily replace these HTML entities (for apostrophes, dashes, etc.) with their character equivalents?

    EDIT:

    Ti.API.info("is this real------------"+win.dataToPass)
    


    returns: (line breaks added for clarity)

    [INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
    warning home owners and car owners in the town to be vigilant following a recent
    spate of break-ins. There has been a number of thefts from gardens and vehicles
    in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
    said that residents have reported seeing a dark haired male in and around the
    area in the early hours of the morning. Local Cllr Karina Carlin has been
    monitoring the situation &#8211; she says the problem seems to be getting
    worse&#8230;&#8230;.
    


    My external.js file, which merely displays the text above, is below:

    var win= Titanium.UI.currentWindow;
    
    Ti.API.info("Is this real------------------"+ win.dataToPass);
    
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    
    function unescapeHTML(str) {//modified from underscore.string and string.js
        return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
            var match;
    
            if ( entityCode in escapeChars) {
                return escapeChars[entityCode];
            } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
                return String.fromCharCode(parseInt(match[1], 16));
            } else if ( match = entityCode.match(/^#(\d+)$/)) {
                return String.fromCharCode(~~match[1]);
            } else {
                return entity;
            }
        });
    }
    
    var newText= unescapeHTML(win.datatoPass);
    
    
    var label= Titanium.UI.createLabel({
        color: "black",
        //text: win.dataToPass,//this works!
        text:newText,//this is causing an error
        font: "Helvetica",
        fontSize: 50,
        width: "auto",
        height: "auto",
        textAlign: "center"
    })
    
    win.add(label);
    
  • Joshua Briefman
    Joshua Briefman almost 11 years
    @user2363025 The programming language you're using may support a library containing a search-and-replace routine that performs the function I described; this will depend on the language you're using and which libraries are available on the machine.
  • user2363025
    user2363025 almost 11 years
    I'm using Titanium. Any idea if they have a library like this?
  • Joshua Briefman
    Joshua Briefman almost 11 years
    Two options exist. 1) Build a custom function that searches for and replaces each of the codes. 2) It looks like Titanium is a derivative of Java; if that is true, then the following should work (although you may have to reference the Java standard library; visit the Titanium page for a how-to on creating that reference): docs.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
  • user2363025
    user2363025 almost 11 years
    Thanks very much for your help. I've copied this code in. In order to actually run it, I have tried: var newText= unescape(win.datatoPass); where win.datatoPass is a string. The variable newText was what I then set as the text property of a label. But it is displayed in my app as undefined.
  • user2363025
    user2363025 almost 11 years
    I've realised it should have been unescapeHTML(win.datatoPass). I have now tried: var newText= unescapeHTML(win.datatoPass); where win.datatoPass is a string. The variable newText was what I then set as the text property of a label. But my app is saying it cannot call method 'replace' of undefined; the source is 'return str.replace(/\&([^;]+);/g, function(entity, entityCode)'. Do I have to slot in values for entity and entityCode?
  • Josiah Hester
    Josiah Hester almost 11 years
    That means that win.datatoPass is undefined. Check win.datatoPass; there isn't a problem with the function itself.
  • user2363025
    user2363025 almost 11 years
    win.datToPass is definitely not undefined. It was working before I introduced this function, and it works now if I remove it. As I wrote above, the first issue was because I wrote unescape(win.dataToPass) and NOT the correct function name, i.e. unescapeHTML(win.dataToPass). It's definitely related to this line: 'return str.replace(/\&([^;]+);/g, function(entity, entityCode)'. Do I have to replace entity and entityCode with actual values? Your help would be really appreciated.
  • Josiah Hester
    Josiah Hester almost 11 years
    NO. Copy the function including the escapeChars and it will work As Is... IF you are passing it a String. The replace function takes a RegularExpression and a function as input, so don't mess with the function itself. In fact, this function was written by the Underscore guys and has been vetted a ton. Tell me what Ti.API.info('Is this real: '+win.datatoPass); prints out if you place it right before where you would call the unescapeHTML function.
  • user2363025
    user2363025 almost 11 years
    I'll edit my question above to show you how I used the function because I think I may be doing that incorrectly.
  • user2363025
    user2363025 almost 11 years
    Also note I've included the console you requested
  • Josiah Hester
    Josiah Hester almost 11 years
    You forgot to capitalize the t in dataToPass: check your code... var newText= unescapeHTML(win.datatoPass); It should be var newText= unescapeHTML(win.dataToPass);
  • user2363025
    user2363025 almost 11 years
    Wow what a stupid mistake not to spot! Thanks a lot for that. Is it easy to just add to the escapeChars array? That couldn't be them all or could it?
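On that last question: the five entries in escapeChars are only a small subset of the named entities HTML defines, but numeric references such as &#8217; are already handled by the two regex branches, whatever the table contains. Since escapeChars is a plain object, adding names is just a matter of adding keys. A minimal sketch, assuming the feed also uses a few common typographic names (the extra entries are illustrative picks, not a complete list):

```javascript
// Original five entries plus a handful of common typographic entities.
var escapeChars = {
    lt: '<', gt: '>', quot: '"', apos: "'", amp: '&',
    nbsp: '\u00a0',   // non-breaking space
    ndash: '\u2013',  // en dash
    mdash: '\u2014',  // em dash
    lsquo: '\u2018',  // left single quotation mark
    rsquo: '\u2019',  // right single quotation mark
    ldquo: '\u201c',  // left double quotation mark
    rdquo: '\u201d',  // right double quotation mark
    hellip: '\u2026'  // horizontal ellipsis
};

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (Object.prototype.hasOwnProperty.call(escapeChars, entityCode)) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(parseInt(match[1], 10));
        }
        return entity; // anything not listed stays as written
    });
}

var sample = unescapeHTML('last month&rsquo;s grant &ndash; more&hellip;');
```

If the feed turns out to use many different named entities, pulling in one of the libraries mentioned in Solution 1 is probably less work than growing this table by hand.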