Convert Unicode string to NSString

20,662

Solution 1

I assume that:

  • You are reading this RTF data from a file or other external source.
  • You are parsing it yourself (not using, say, AppKit's built-in RTF parser).
  • You have a reason why you're parsing it yourself, and that reason isn't “wait, AppKit has this built in?”.
  • You have come upon \u… in the input you're parsing and need to convert that to a character for further handling and/or inclusion in the output text.
  • You have ruled out \uc, which is a different thing (it specifies the number of non-Unicode bytes that follow the \u… sequence, if I understood the RTF spec correctly).

\u is followed by a decimal number (unlike the \u escape in C and Objective-C string literals, which takes hexadecimal digits). The RTF spec defines the argument as a signed 16-bit value, so values above 32767 are written as negative numbers. You need to parse that number; it is the UTF-16 code unit (for most characters, the same as the Unicode code point) of the character the sequence represents. You then need to create an NSString containing that character.

If you're using NSScanner to parse the input, then (assuming you have already scanned past the \u itself) you can simply ask the scanner to scanInt:. Pass a pointer to an int variable.

If you're not using NSScanner, do whatever makes sense for however you're parsing it. For example, if you've converted the RTF data to a C string and are reading through it yourself, you'll want to use strtol to parse the number. It'll interpret the number in whatever base you specify (in this case, 10) and then put the pointer to the next character wherever you want it.

Your int or long variable will then contain the value for the specified character; if it is negative, add 65536 to it. In the example from your question, that will be 10003, which is 0x2713, or U+2713 (✓).

Now, for most characters, including this one, you can simply assign that over to a unichar variable and create an NSString from that. That won't work for a code point higher than 0xFFFF that you have obtained some other way: unichars only go up to 0xFFFF, and such a code point is (in technical terms) outside the Basic Multilingual Plane.

Fortunately, *CF*String has a function to help you:

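unsigned int codePoint = /*…*/;

unichar characters[2];
NSUInteger numCharacters = 0;
if (CFStringGetSurrogatePairForLongCharacter(codePoint, characters)) {
    // Outside the BMP: characters now holds a high and a low surrogate.
    numCharacters = 2;
} else {
    // Within the BMP: a single 16-bit unit is enough.
    characters[0] = (unichar)codePoint;
    numCharacters = 1;
}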

You can then use stringWithCharacters:length: to create an NSString from this array of 16-bit characters.
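
As a rough illustration of the parsing and conversion steps in plain C (which also compiles as Objective-C), here is a minimal sketch. It assumes the number after \u is decimal and signed, as the RTF specification defines it. The helper names parse_rtf_unicode_escape and encode_utf16 are hypothetical, not part of any library; encode_utf16 spells out the standard UTF-16 surrogate arithmetic, which is my understanding of what CFStringGetSurrogatePairForLongCharacter does for you.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses one "\uN" escape at the start of `input`.
 * Assumes N is a signed decimal number, per the RTF specification.
 * On success stores the UTF-16 unit in *unit, points *rest just past the
 * number, and returns 1. Returns 0 for anything else (for example "\uc0").
 * Skipping the delimiter and fallback bytes after N is left to the caller. */
static int parse_rtf_unicode_escape(const char *input, uint16_t *unit,
                                    const char **rest)
{
    if (strncmp(input, "\\u", 2) != 0) {
        return 0;
    }
    const char *digits = input + 2;
    if (digits[0] != '-' && !isdigit((unsigned char)digits[0])) {
        return 0;
    }
    char *end = NULL;
    long value = strtol(digits, &end, 10);
    if (end == digits || value < -32768 || value > 65535) {
        return 0;
    }
    /* Negative values wrap modulo 65536, e.g. -10179 becomes 0xD83D. */
    *unit = (uint16_t)value;
    *rest = end;
    return 1;
}

/* Hypothetical helper: encodes a Unicode code point as UTF-16.
 * Returns the number of units written to `out` (1 or 2), or 0 if the
 * value is not a valid code point for this purpose. */
static size_t encode_utf16(uint32_t codePoint, uint16_t out[2])
{
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
        return 0;
    }
    if (codePoint <= 0xFFFF) {
        out[0] = (uint16_t)codePoint;
        return 1;
    }
    uint32_t offset = codePoint - 0x10000;
    out[0] = (uint16_t)(0xD800 + (offset >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (offset & 0x3FF)); /* low surrogate */
    return 2;
}
```

The resulting 16-bit values can be copied into a unichar array and passed to stringWithCharacters:length:.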

Solution 2

I had the same problem, and the following code solved it.

To encode:

NSData *dataenc = [yourtext dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodevalue = [[NSString alloc]initWithData:dataenc encoding:NSUTF8StringEncoding];

To decode:

NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
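
To see what the decode step does with the \uXXXX form, here is a minimal plain C sketch (it also compiles as Objective-C). It assumes, for illustration only, that each escape is \u followed by exactly four hexadecimal digits naming one UTF-16 unit. decode_u4_escapes is a hypothetical helper, not Foundation's implementation, and it ignores any other escapes the real encoding may use.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: turns "\uXXXX" (exactly four hex digits) into one
 * UTF-16 unit and copies every other byte through unchanged.
 * Writes at most `cap` units to `out` and returns how many were written. */
static size_t decode_u4_escapes(const char *in, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (*in != '\0' && n < cap) {
        if (in[0] == '\\' && in[1] == 'u'
            && isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3])
            && isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            uint16_t unit = 0;
            for (int i = 2; i < 6; i++) {
                unsigned char c = (unsigned char)in[i];
                int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
                unit = (uint16_t)(unit * 16 + digit);
            }
            out[n++] = unit;
            in += 6; /* skip the backslash, the 'u', and four digits */
        } else {
            out[n++] = (unsigned char)*in++;
        }
    }
    return n;
}
```

Under that assumption, \u2713 decodes to a single unit, while \u10003 decodes to U+1000 followed by the character 3.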

Thanks

Solution 3

I have used the code below to convert a Unicode string to an NSString. This should work fine.

    NSData *unicodedStringData =
    [unicodedString dataUsingEncoding:NSUTF8StringEncoding];
    NSString *emojiStringValue =
    [[NSString alloc] initWithData:unicodedStringData encoding:NSNonLossyASCIIStringEncoding];

In Swift 4:

let emoji = "😃"
let unicodedData = emoji.data(using: String.Encoding.utf8, allowLossyConversion: true)
let emojiString = String(data: unicodedData!, encoding: String.Encoding.utf8)


Solution 4

Use this:

NSString *myUnicodeString = @"\u10003"; 

Thanks to modern Objective-C.
Let me know if it's not what you want.

Author by

boom

XML, C, C++, Cocoa, Gtk, Gtkmm, Gnome, zlib, libxml, Berkelium, OpenCV

Updated on July 09, 2022

Comments

  • boom
    boom almost 2 years

    I have a Unicode string as follows:

    {\rtf1\ansi\ansicpg1252\cocoartf1265
    {\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 LucidaGrande;}
    {\colortbl;\red255\green255\blue255;}
    {\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{check\}}{\leveltext\leveltemplateid1\'01\uc0\u10003 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
    {\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
    \paperw11900\paperh16840\margl1440\margr1440\vieww22880\viewh16200\viewkind0
    \pard\li720\fi-720\pardirnatural
    \ls1\ilvl0
    \f0\fs24 \cf0 {\listtext    
    \f1 \uc0\u10003 
    \f0     }One\
    {\listtext  
    \f1 \uc0\u10003 
    \f0     }Two\
    }
    

    Here I have Unicode data \u10003, which is equivalent to the "✓" character. I used [NSString stringWithCharacters:"\u10003" length:NSUTF16StringEncoding], which throws a compilation error. Please let me know how to convert these Unicode characters to "✓".

    Regards, Boom

  • boom
    boom over 10 years
    it is not 1003, it is 10003
  • Nico
    Nico over 10 years
    That won't work. \u requires a four-digit number. You'd need \U, which takes an eight-digit number. (You would, of course, have to pad it out with zeroes.) Moreover, the question sounds to me like the questioner is processing input, not (well, hopefully not) embedding a fixed RTF string in their source code.
  • Nico
    Nico over 10 years
    Given the entire RTF data, this returns nil. Given the \u10003 sequence alone, this returns two characters (U+1000 followed by a '3'), not one. gist.github.com/boredzo/8305377
  • Nico
    Nico over 10 years
    Given the entire RTF data, this returns nil. Given the \u10003 sequence alone, this returns two characters (U+1000 followed by a '3'), not one. stackoverflow.com/questions/20943928/…
  • Pawan Sharma
    Pawan Sharma over 10 years
    Can you please share how you are encoding your RTF string? I used this to encode an NSString that contained iOS emoji characters to Unicode for transmission over the network, and to get the original NSString back when displaying it inside my app. This trick was working fine for me.
  • Fa.Shapouri
    Fa.Shapouri over 9 years
    I had a problem with a Unicode string, and your solution helped me find it. Thank you.
  • Admin
    Admin almost 9 years
    @zohar, this code simply converts 8-bit Unicode characters into a string value. Here I have used the '\u2714' Unicode escape, which represents a check mark; I am simply converting that escape into a string value to show the actual check mark sign in my code.