Swift UTF-8 encoding and non-UTF-8 characters

I've found a solution.

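The accented characters are valid in UTF-8 (each one simply takes two bytes), so the encoding step itself was not losing them. What fixed the output was handing NSAttributedString UTF-16 data instead of UTF-8 data, apparently because the HTML importer does not assume UTF-8 when no character encoding is specified. The solution is simply to change my function to: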

func stringToUTF16String (stringaDaConvertire stringa: String) -> String {
    let encodedData = stringa.dataUsingEncoding(NSUTF16StringEncoding)!
    let attributedOptions = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
    let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
    //println(attributedString.string)
    return attributedString.string
}
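
As a quick check of the encoding step on its own, the following Foundation-only sketch encodes the accented characters as UTF-8, decodes them back, and then deliberately misreads the same bytes as Latin-1. The variable names are illustrative and not part of the original function, and the Latin-1 misreading is only an assumed stand-in for what happens when a decoder guesses a single-byte encoding.

```swift
import Foundation

// The five accented characters from the question, written as escapes so the
// source file's own encoding cannot affect the result.
let text = "\u{E0}\u{E8}\u{EC}\u{F2}\u{F9}"   // àèìòù

// Encoding as UTF-8 is lossless: each of these characters takes two bytes.
let utf8 = text.data(using: .utf8)!
print(utf8.count)   // 10

// Decoding the same bytes as UTF-8 gives back the original string.
let roundTrip = String(data: utf8, encoding: .utf8)
print(roundTrip == text)

// Misreading the UTF-8 bytes as Latin-1 turns every character into two,
// which is the kind of garbling seen when a decoder assumes the wrong encoding.
let misread = String(data: utf8, encoding: .isoLatin1)!
print(misread.unicodeScalars.count)
```

If the round trip succeeds, the data produced by the UTF-8 step is fine and the garbling must come from whatever decodes it afterwards.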
Author by

luca carboni

Updated on June 04, 2022

Comments

  • luca carboni
    luca carboni almost 2 years

    I have some text from a JSON file. I've applied UTF-8 encoding to this text, but the result doesn't handle the accented characters àèìòù and their capital forms correctly. Is there a way to clean up my string?

    My function:

    func stringToUTF8String (stringaDaConvertire stringa: String) -> String {
        let encodedData = stringa.dataUsingEncoding(NSUTF8StringEncoding)!
        let attributedOptions = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
        let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
        //println(attributedString.string)
        return attributedString.string
    }
    
  • Yuming Cao
    Yuming Cao about 9 years
    Yes, this works, but I still don't know why dataUsingEncoding is not able to identify the character using NSUTF8StringEncoding. In my case, I verified that my file is stored as UTF-8, so encodedData should contain the right content. My guess is that NSAttributedString assumes UTF-16 encoding; after all, that is how NSString represents its contents. The documentation is not clear about this, though.
  • samwize
    samwize almost 8 years
    I was having the same problem and worked out that it must be due to NSAttributedString. The documentation never specifies what encoding the data parameter should have, but I think we have verified that it MUST be NSUTF16StringEncoding. Internally they probably decode with that.
  • ctietze
    ctietze over 6 years
    The foundational NSString is represented using UTF-16, so that default would make sense. That being said, you can specify options: [.characterEncoding: String.Encoding.utf8.rawValue] (NSCharacterEncodingDocumentAttribute in older SDKs) to match the incoming data.
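
The suggestion in the last comment can be sketched in current Swift as follows. This is a minimal sketch rather than the original poster's code: the function name plainText(fromHTML:) is hypothetical, and it assumes an Apple platform, since the HTML importer lives in UIKit/AppKit.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper: converts an HTML fragment to plain text while telling
// the importer explicitly that the data is UTF-8.
func plainText(fromHTML html: String) -> String? {
    guard let data = html.data(using: .utf8) else { return nil }

    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        // Must match the encoding used to build `data` above.
        .characterEncoding: String.Encoding.utf8.rawValue
    ]

    // The initializer throws on malformed input; this sketch maps that to nil.
    return try? NSAttributedString(data: data,
                                   options: options,
                                   documentAttributes: nil).string
}
```

A call such as plainText(fromHTML: "<p>àèìòù</p>") should then keep the accented characters without switching to UTF-16. Apple's documentation says the HTML importer should not be called from a background thread, so this is best run on the main thread.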