Converting MS Word "curly" quotes and apostrophes
Solution 1
Going off Thomas's answer, the code is:
return text.replaceAll("[\\u2018\\u2019]", "'")
.replaceAll("[\\u201C\\u201D]", "\"");
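For anyone who wants to try this directly, here is a minimal sketch that wraps the two calls above in a method. The class name `QuoteNormalizer` and the method name `straighten` are made up for illustration; only the two `replaceAll` calls come from the answer.

```java
class QuoteNormalizer {
    // Hypothetical helper name; the two replaceAll calls are the ones shown above.
    static String straighten(String text) {
        return text
                .replaceAll("[\\u2018\\u2019]", "'")    // left/right single quotation marks
                .replaceAll("[\\u201C\\u201D]", "\"");  // left/right double quotation marks
    }

    public static void main(String[] args) {
        String input = "\u201Chow are you doing?\u201D \u2018howdy\u2019";
        System.out.println(straighten(input));
    }
}
```

Note that this only covers the four characters listed in the pattern; other quote-like characters pass through unchanged.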
Solution 2
Here's a very useful link for everyone dealing with Unicode: Unicode codepoint lookup/search tool.
Searching for "quotation mark" gives:
‘ (U+2018) LEFT SINGLE QUOTATION MARK
’ (U+2019) RIGHT SINGLE QUOTATION MARK
“ (U+201C) LEFT DOUBLE QUOTATION MARK
” (U+201D) RIGHT DOUBLE QUOTATION MARK
There are several other quote-like symbols that you might consider replacing.
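If you are not sure which of these characters your text actually contains, you can ask Java for the code points instead of looking them up by hand. The following is a rough sketch; the class and method names are hypothetical, while `String.codePoints()` and `Character.getName(int)` are standard JDK methods (Java 8+ and Java 7+ respectively).

```java
class CodePointInspector {
    // Hypothetical helper: one line per non-ASCII code point, e.g. "U+2018 LEFT SINGLE QUOTATION MARK".
    static String describeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> cp > 0x7F)  // skip plain ASCII
            .forEach(cp -> out.append(
                    String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample text pasted from a word processor (written with escapes here).
        System.out.print(describeNonAscii("\u201Chow are you doing?\u201D \u2018howdy\u2019"));
    }
}
```

Running this over a sample of your real input is a quick way to decide which characters are worth adding to the replacement list.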
Solution 3
Thanks to Nick van Esch's answer at "C# How to replace Microsoft's Smart Quotes with straight quotation marks?"
Here is the C# code ('\u2019' is ’ in MS Word). It's useful because it covers other problematic Word characters as well, such as dashes and the ellipsis.
if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
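Since the question is about Java, here is a possible Java port of the same mapping. This is a sketch, not a drop-in from the linked answer: the class name `WordCharCleaner`, the method name `clean`, and the choice of a lookup table with a single pass over the string are my own assumptions. The character-to-replacement pairs are the ones from the C# code above.

```java
import java.util.HashMap;
import java.util.Map;

class WordCharCleaner {
    // Same pairs as the C# snippet above; the table-driven layout is an assumption.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2013', "-");    // en dash
        REPLACEMENTS.put('\u2014', "-");    // em dash
        REPLACEMENTS.put('\u2015', "-");    // horizontal bar
        REPLACEMENTS.put('\u2017', "_");    // double low line
        REPLACEMENTS.put('\u2018', "'");    // left single quotation mark
        REPLACEMENTS.put('\u2019', "'");    // right single quotation mark
        REPLACEMENTS.put('\u201A', ",");    // single low-9 quotation mark
        REPLACEMENTS.put('\u201B', "'");    // single high-reversed-9 quotation mark
        REPLACEMENTS.put('\u201C', "\"");   // left double quotation mark
        REPLACEMENTS.put('\u201D', "\"");   // right double quotation mark
        REPLACEMENTS.put('\u201E', "\"");   // double low-9 quotation mark
        REPLACEMENTS.put('\u2026', "...");  // horizontal ellipsis
        REPLACEMENTS.put('\u2032', "'");    // prime
        REPLACEMENTS.put('\u2033', "\"");   // double prime
    }

    // Walks the text once and swaps every mapped character for its ASCII stand-in.
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("\u201CWait\u2026\u201D \u2013 it\u2019s fine"));
    }
}
```

All of the mapped characters are in the Basic Multilingual Plane, so iterating by `char` is safe here. Adding another character is a one-line change to the table.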
Author: user340188
Updated on July 09, 2022
Comments
-
user340188 almost 2 years
How do I convert MS Word quotes and apostrophes to regular quote and apostrophe characters in Java? What are the Unicode code points for these characters? I want to convert
“how are you doing?”
‘howdy’
to
"how are you doing?"
'howdy'
-
Anish Mittal almost 7 years
The answer above lists the MS Word quotes one by one. Isn't there some simpler code that replaces all MS Word quotes with straight quotation marks? In other words, how can we list all the MS Word quotes?
-
123iamking over 6 years
@Anish Mittal: As far as I know, this is the simplest way.
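On the question of listing the candidates: one rough way to see what exists is to scan the Unicode names that the JDK knows about. The sketch below collects every code point in the Basic Multilingual Plane whose name contains "QUOTATION MARK". The class and method names are hypothetical, and the result reflects Unicode naming in your JDK version, not a list of what MS Word actually emits, so treat it as a starting point for a hand-picked replacement table.

```java
import java.util.ArrayList;
import java.util.List;

class QuotationMarkLister {
    // Hypothetical helper: BMP code points whose Unicode name mentions "QUOTATION MARK".
    static List<Integer> quotationMarks() {
        List<Integer> found = new ArrayList<>();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            String name = Character.getName(cp);  // null for unassigned code points
            if (name != null && name.contains("QUOTATION MARK")) {
                found.add(cp);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int cp : quotationMarks()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

The list also includes the plain ASCII `"` (U+0022, named QUOTATION MARK) and misses characters such as the prime (U+2032), whose names do not contain that phrase, so an explicit table like the one in Solution 3 is still needed for the actual replacement.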