Convert ASCII to plaintext in PHP

You can use html_entity_decode:

echo html_entity_decode('...', ENT_QUOTES, 'UTF-8');

A few notes:

  • It looks like what you actually want is to convert an HTML-encoded string (one containing character entities) to plain text, rather than to convert "ASCII" to plaintext.

  • This example decodes to UTF-8, which is ASCII-compatible: every ASCII character (char codes below 128) is encoded identically in both. If you really want plain ASCII (thus losing all accented characters and characters from other scripts), you have to strip the non-ASCII characters in a separate step.

  • The last argument ('UTF-8') is needed to get the same behaviour across PHP versions, because the default value of this parameter changed in PHP 5.4.0.
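
To illustrate the second note, here is a minimal sketch of the separate stripping step. The helper name to_plain_ascii and the sample string are made up for this example, and simply dropping the characters is only one option (you may prefer to transliterate them instead):

```php
<?php
// Hypothetical helper (not part of PHP): decode entities, then drop
// everything outside 7-bit ASCII.
function to_plain_ascii(string $html): string
{
    // Decode named and numeric entities, including both quote styles.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // No /u modifier, so the pattern works on bytes. In UTF-8 every byte
    // of a non-ASCII character is >= 0x80, so such characters are removed
    // whole and ASCII bytes are left untouched.
    return preg_replace('/[^\x00-\x7F]/', '', $text);
}

// Assumed sample input, not the asker's text.
echo to_plain_ascii('caf&eacute; &amp; cr&egrave;me'), "\n"; // caf & crme
```

Note that the accented letters disappear completely rather than becoming "e", which is what "losing" them means here.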

Update: Example with your text in ideone.

Update2: Changed ENT_COMPAT to ENT_QUOTES by @Daan's suggestion.
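
For reference, a small sketch of why that flag matters. The input string is an assumed example (I am using &#039; as the single-quote entity, which may not be exactly what your scraped text contains):

```php
<?php
// Assumed sample input: &#039; is a numeric entity for a single quote,
// &quot; is the named entity for a double quote.
$encoded = 'I&#039;ve seen &quot;it&quot;';

// ENT_COMPAT decodes double quotes but leaves single quotes alone,
// so the &#039; entity survives in the output.
$compat = html_entity_decode($encoded, ENT_COMPAT, 'UTF-8');
echo $compat, "\n"; // I&#039;ve seen "it"

// ENT_QUOTES decodes both double and single quotes.
$quotes = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo $quotes, "\n"; // I've seen "it"
```

Passing the flag explicitly also keeps the result independent of the PHP version's default for that parameter.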

Author: e_r

Updated on February 10, 2020

Comments

  • e_r
    e_r about 4 years

    I am scraping some sites, and have ASCII text that I want to convert to plain text for storing in a DB. For example I want

    I have got to tell anyone who will listen that this is
    one of THE best adventure movies I've ever seen.
    It's almost impossible to convey how pumped I am
    now that I've seen it.
    

    converted to

    I have got to tell anyone who will listen that this is
    one of THE best adventure movies I've ever seen. It's
    almost impossible to convey how pumped I am now that
    I've seen it.
    

    I have googled my fingers bloody, any help?

  • e_r
    e_r almost 12 years
    Thanks for the input. I have actually tried using html_entity_decode, but my output still contains the encoded form of a quotation mark (the entity for '). Is this HTML-encoded? I do actually want to go from HTML-encoded strings to ASCII plaintext, as I am doing some sentiment analysis on the results.
  • Daan
    Daan almost 12 years
    The example provided works for me with your input; are you sure you're passing the correct parameters to html_entity_decode?
  • e_r
    e_r almost 12 years
    @Daan yep, it works in the browser, but when I run the same code in CLI the problem persists.
  • Daan
    Daan almost 12 years
    Ahh, of course. You'll want to use ENT_QUOTES instead of ENT_COMPAT then. Not sure why this works correctly in ideone.
  • e_r
    e_r almost 12 years
    That did it! Throw up an answer and I will mark it. Thanks for the help.
  • Daan
    Daan almost 12 years
    You're welcome! And feel free to accept this answer; all I did was look at the code for a few seconds wondering why @ash108's code was not working, and then I realised that the entity in your output is a single quote, which is excluded by ENT_COMPAT. @ash108 did all the hard work, including putting up an example :)