Need Pure/jQuery Javascript Solution For Cleaning Word HTML From Text Area

10,029

Solution 1

David Archer's answer pretty much covers it. I have used a similar solution in the past:

$("textarea").change( function() {
    // Convert any opening and closing angle brackets to their HTML-encoded equivalents.
    var strClean = $(this).val().replace(/</gi, '&lt;').replace(/>/gi, '&gt;');

    // Remove any double and single quotation marks.
    strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');

    // put the data back in.
    $(this).val(strClean);
});
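
One gap in the snippet above is that `&` is left untouched, so text that already contains something like `&lt;` becomes indistinguishable from an encoded bracket. Here is a minimal sketch of the same idea as a plain function, assuming the ampersand should be encoded first. The name `encodeAndStripQuotes` is made up for illustration and is not part of jQuery:

```javascript
// Hypothetical helper, not part of jQuery or of the answer above.
function encodeAndStripQuotes(text) {
    return text
        // Encode ampersands first so the entities added below are not double-encoded.
        .replace(/&/g, '&amp;')
        // Encode angle brackets so no tag survives as markup.
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        // Remove straight double and single quotation marks.
        .replace(/["']/g, '');
}
```

Inside the handler it could be used as `$(this).val(encodeAndStripQuotes($(this).val()));`.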

If you are looking for a way to completely REMOVE HTML tags:

$("textarea").change( function() {
    // Completely strips tags.  Taken from Prototype library.
    var strClean = $(this).val().replace(/<\/?[^>]+>/gi, '');

    // Remove any double and single quotation marks.
    strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');

    // put the data back in.
    $(this).val(strClean);
});
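
Word markup tends to carry more than tags: conditional comments and `<style>` blocks whose contents would survive a plain tag strip, plus typographic (curly) quotation marks that the straight-quote patterns above do not match. Here is a rough sketch that handles those cases first, assuming that is the kind of input being pasted. The function name `cleanWordText` and the exact list of patterns are illustrative, not taken from any library:

```javascript
// Hypothetical helper; the name and the set of rules are assumptions for illustration.
function cleanWordText(input) {
    return input
        // Drop HTML comments, including Word's conditional comments and what they wrap.
        .replace(/<!--[\s\S]*?-->/g, '')
        // Drop <style>, <script> and <xml> blocks together with their contents.
        .replace(/<(style|script|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
        // Strip any remaining tags (same pattern as the snippet above).
        .replace(/<\/?[^>]+>/g, '')
        // Remove straight and typographic single/double quotation marks.
        .replace(/["'\u2018\u2019\u201C\u201D]/g, '');
}
```

Inside the handler it could replace the two `replace` chains: `$(this).val(cleanWordText($(this).val()));`.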

Solution 2

You could check out Word HTML Cleaner by Connor McKay. It is a fairly aggressive cleaner: it removes a lot of markup that you might want to keep, but if that's not a problem it looks pretty decent.

Solution 3

It might be useful to use the blur event, which would be triggered less often:

$("textarea").blur(function() {
    // check input ($(this).val()) for validity here
});

Solution 4

What about something like this:

function cleanHTML(pastedString) {
    var cleanString = "";
    var insideTag = false;
    for (var i = 0, len = pastedString.length; i < len; i++) {
        var ch = pastedString.charAt(i);
        if (ch == "<") {
            // A tag starts here; skip everything until it closes.
            insideTag = true;
        } else if (ch == ">" && insideTag) {
            // The tag ends; drop the ">" itself.
            insideTag = false;
        } else if (!insideTag) {
            // Plain text outside any tag is kept.
            cleanString += ch;
        }
    }
    return cleanString;
}

Then just use the event listener to call this function and pass in the pasted string.
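
Here is a minimal sketch of that wiring, assuming a browser that exposes `event.clipboardData` on the `paste` event. The names `handlePaste` and `stripTags` are hypothetical; `stripTags` is only a stand-in for whichever cleaning function is chosen:

```javascript
// Stand-in cleaner (assumption): any function that takes and returns a string would do.
function stripTags(text) {
    return text.replace(/<\/?[^>]+>/g, '');
}

// Hypothetical handler: cleans the pasted text before it reaches the textarea.
function handlePaste(event, cleaner) {
    var textarea = event.target;
    // Read the clipboard as plain text and stop the default paste.
    var pasted = event.clipboardData.getData('text/plain');
    event.preventDefault();

    var cleaned = cleaner(pasted);
    var start = textarea.selectionStart;
    var end = textarea.selectionEnd;

    // Replace the current selection (or insert at the caret) with the cleaned text.
    textarea.value = textarea.value.slice(0, start) + cleaned + textarea.value.slice(end);

    // Put the caret right after the inserted text.
    var caret = start + cleaned.length;
    textarea.setSelectionRange(caret, caret);
}

// Wire it up only when running in a page (skipped outside a browser).
if (typeof document !== 'undefined') {
    var field = document.querySelector('textarea');
    if (field) {
        field.addEventListener('paste', function (event) {
            handlePaste(event, stripTags);
        });
    }
}
```

When binding through jQuery instead, the native event should be available as `e.originalEvent`, so the call would be `handlePaste(e.originalEvent, stripTags)`.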

Author: Alex Racho

Just your average UI designer with a lust for jQuery and various other code ventures.

Updated on June 05, 2022

Comments

  • Alex Racho almost 2 years

    I know this issue has been touched on here, but I have not found a viable solution for my situation yet, so I'd like to put the brain trust back to work and see what can be done.

    I have a textarea in a form that needs to detect when something is pasted into it, and clean out any hidden HTML & quotation marks. The content of this form is getting emailed to a 3rd party system which is particularly bitchy, so sometimes even encoding it to the html entity characters isn't going to be a safe bet.

    I unfortunately cannot use something like FCKEditor, TinyMCE, etc, it's gotta stay a regular textarea in this instance. I have attempted to dissect FCKEditor's paste from word function but have not had luck tracking it down.

    I am however able to use the jQuery library if need be, but haven't found a jQuery plugin for this just yet.

    I am specifically looking for information geared towards cleaning the information pasted in, not how to monitor the element for change of content.

    Any constructive help would be greatly appreciated.