Converting MS Word "curly" quotes and apostrophes

24,062

Solution 1

Building on Thomas's answer, the Java code is:

return text.replaceAll("[\\u2018\\u2019]", "'")
           .replaceAll("[\\u201C\\u201D]", "\"");
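A minimal runnable sketch of the two calls above, wrapped in a class so they can be tried directly. The class name `QuoteFixer` and the method name `straightenQuotes` are hypothetical; `List.of` assumes Java 9 or later. The sample strings are the ones from the question, written as Unicode escapes so the source file does not depend on the compiler's default encoding.

```java
import java.util.List;

// Hypothetical wrapper class around the two replaceAll calls from the answer.
class QuoteFixer {

    // Replace curly single quotes (U+2018, U+2019) with a straight apostrophe
    // and curly double quotes (U+201C, U+201D) with a straight double quote.
    static String straightenQuotes(String text) {
        return text.replaceAll("[\\u2018\\u2019]", "'")
                   .replaceAll("[\\u201C\\u201D]", "\"");
    }

    public static void main(String[] args) {
        // Sample strings from the question, built with escapes to keep the source ASCII.
        List<String> samples = List.of(
                "\u201Chow are you doing?\u201D",
                "\u2018howdy\u2019");
        for (String s : samples) {
            System.out.println(straightenQuotes(s));
        }
    }
}
```

Because `replaceAll` compiles its regular expression on every call, code that processes a lot of text might prefer to precompile a `java.util.regex.Pattern` or chain `String.replace(char, char)` calls instead.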

Solution 2

Here's a very useful link for everyone dealing with Unicode: Unicode codepoint lookup/search tool.

Searching for "quotation mark" gives

‘ (U+2018) LEFT SINGLE QUOTATION MARK
’ (U+2019) RIGHT SINGLE QUOTATION MARK
“ (U+201C) LEFT DOUBLE QUOTATION MARK
” (U+201D) RIGHT DOUBLE QUOTATION MARK

There are several other quote-like symbols that you might consider replacing, such as ‚ (U+201A), ‛ (U+201B), „ (U+201E), ′ (U+2032) and ″ (U+2033).
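As an offline alternative to the lookup tool, Java can report a character's code point and official Unicode name itself through `Character.getName` (available since Java 7). A small sketch; the class `CodePointInspector` and the method `describe` are hypothetical names.

```java
// Hypothetical helper that prints "U+XXXX NAME" for every character in a string.
class CodePointInspector {

    // Format one code point as "U+XXXX NAME".
    static String describe(int codePoint) {
        String name = Character.getName(codePoint); // null for unassigned code points
        return String.format("U+%04X %s", codePoint, name == null ? "<unassigned>" : name);
    }

    public static void main(String[] args) {
        // The four quotation marks listed above, written as escapes.
        String sample = "\u2018\u2019\u201C\u201D";
        // codePoints() walks the string by code point, so surrogate pairs stay intact.
        sample.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

For this sample it prints the same four names as the list above, which makes it handy for identifying an unknown character pasted from a Word document.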

Solution 3

Thanks to Nick van Esch's answer to "C# How to replace Microsoft's Smart Quotes with straight quotation marks?"

Here is the C# code ('\u2019' is the ’ that MS Word inserts). It is useful because it also covers other problematic Word characters, such as dashes and the ellipsis.

// Replace leaves the string unchanged when the character is absent,
// so no IndexOf guard is needed before each call.
buffer = buffer
    .Replace('\u2013', '-')    // en dash
    .Replace('\u2014', '-')    // em dash
    .Replace('\u2015', '-')    // horizontal bar
    .Replace('\u2017', '_')    // double low line
    .Replace('\u2018', '\'')   // left single quotation mark
    .Replace('\u2019', '\'')   // right single quotation mark
    .Replace('\u201a', ',')    // single low-9 quotation mark
    .Replace('\u201b', '\'')   // single high-reversed-9 quotation mark
    .Replace('\u201c', '"')    // left double quotation mark
    .Replace('\u201d', '"')    // right double quotation mark
    .Replace('\u201e', '"')    // double low-9 quotation mark
    .Replace("\u2026", "...")  // horizontal ellipsis
    .Replace('\u2032', '\'')   // prime
    .Replace('\u2033', '"');   // double prime
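Since the question asks about Java, here is a sketch of the same replacement table ported to Java. Instead of one replace call per character, it makes a single pass over the input with a `Map` lookup. `WordCharNormalizer` and `normalize` are hypothetical names, and `Map.ofEntries` assumes Java 9 or later.

```java
import java.util.Map;

// Hypothetical Java port of the C# replacement table above, applied in one pass.
class WordCharNormalizer {

    // Problematic Word characters and their plain ASCII replacements.
    private static final Map<Character, String> REPLACEMENTS = Map.ofEntries(
            Map.entry('\u2013', "-"),    // en dash
            Map.entry('\u2014', "-"),    // em dash
            Map.entry('\u2015', "-"),    // horizontal bar
            Map.entry('\u2017', "_"),    // double low line
            Map.entry('\u2018', "'"),    // left single quotation mark
            Map.entry('\u2019', "'"),    // right single quotation mark
            Map.entry('\u201A', ","),    // single low-9 quotation mark
            Map.entry('\u201B', "'"),    // single high-reversed-9 quotation mark
            Map.entry('\u201C', "\""),   // left double quotation mark
            Map.entry('\u201D', "\""),   // right double quotation mark
            Map.entry('\u201E', "\""),   // double low-9 quotation mark
            Map.entry('\u2026', "..."),  // horizontal ellipsis
            Map.entry('\u2032', "'"),    // prime
            Map.entry('\u2033', "\""));  // double prime

    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            // Characters not in the table are copied through unchanged.
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u201CWait\u2026\u201D she said \u2013 \u2018ok\u2019"));
    }
}
```

All keys are in the Basic Multilingual Plane, so iterating by `char` is safe here; surrogate pairs simply never match a key and pass through unchanged.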
Author: user340188

Updated on July 09, 2022

Comments

  • user340188, almost 2 years ago

    How do I convert MS Word quotes and apostrophes to regular quote and apostrophe characters in Java? What are the Unicode code points for these characters?

    “how are you doing?”

    ‘howdy’

    Since Stack Overflow auto-corrects them, here's how they appear in an editor:

    [Image: Curly Quotes]

    I want to convert them to:

    "how are you doing?"

    'howdy'

  • Anish Mittal, almost 7 years ago
    The answer above lists specific MS Word quotes one by one. Is there simpler code that replaces all MS Word quotes with straight quotation marks? In other words, how can we get a list of all the MS Word quotes?
  • 123iamking, over 6 years ago
    @Anish Mittal: As far as I know, this is the simplest way.
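One way to approach "list all the quotes" is to ask Java for every code point in the Unicode general categories Pi (initial quote punctuation) and Pf (final quote punctuation). This is only a sketch, not a complete answer: it lists Unicode quote characters rather than specifically what MS Word inserts, the category does not say whether a mark is single or double, and some characters from the C# table fall outside these two categories (the low-9 marks U+201A and U+201E are open punctuation, Ps, and the primes U+2032 and U+2033 are other punctuation, Po). `QuoteCharacterLister` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that enumerates Unicode quote punctuation (categories Pi and Pf).
// It does not cover every quote-like character: low-9 marks and primes use other categories.
class QuoteCharacterLister {

    static List<Integer> quoteCodePoints() {
        List<Integer> result = new ArrayList<>();
        // Scan the whole Unicode range; about 1.1 million iterations, fast enough for a one-off.
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            int type = Character.getType(cp);
            if (type == Character.INITIAL_QUOTE_PUNCTUATION
                    || type == Character.FINAL_QUOTE_PUNCTUATION) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : quoteCodePoints()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

The printed list can serve as a starting point for a replacement table; deciding which entries map to ' and which to " still has to be done by hand (or by inspecting the names).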