base64 encoding for utf-8 strings


Solution 1

Like @RRUZ said, EncodeString() expects you to specify a byte encoding that the input String will be converted to, and then those octets will be encoded to base64.

You are passing a UTF8String to EncodeString(), which takes a UnicodeString as input in XE5, so the RTL will convert the UTF8String data back to UTF-16, undoing your UTF8Encode() (which is deprecated, BTW). Since you are not specifying a byte encoding, Indy uses its default encoding, which is set to ASCII by default (configurable via the GIdDefaultTextEncoding variable in the IdGlobal unit).

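That is why orange works (it is pure ASCII, so no data is lost) but سلام fails: each non-ASCII character is replaced with ?, and Pz8/Pw== is simply the base64 encoding of ????.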
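
To make the mechanism concrete outside of Indy, here is a minimal sketch in standard C++. It does not use Indy at all: base64_encode, to_ascii_lossy and to_utf8 are hypothetical helpers written only for this illustration, and the "replace with ?" behaviour is an assumption inferred from the Pz8/Pw== output shown in the question, not taken from Indy's source.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base64 (RFC 4648 alphabet, '=' padding) over raw bytes.
std::string base64_encode(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const std::size_t remaining = bytes.size() - i;
        // Pack up to three input bytes into a 24-bit group.
        std::uint32_t chunk = static_cast<unsigned char>(bytes[i]) << 16;
        if (remaining > 1) chunk |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        if (remaining > 2) chunk |= static_cast<unsigned char>(bytes[i + 2]);
        // Emit four 6-bit symbols, padding with '=' where input ran out.
        out += kAlphabet[(chunk >> 18) & 0x3F];
        out += kAlphabet[(chunk >> 12) & 0x3F];
        out += remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=';
        out += remaining > 2 ? kAlphabet[chunk & 0x3F] : '=';
    }
    return out;
}

// Assumed model of a lossy ASCII conversion: anything above 0x7F becomes '?'.
std::string to_ascii_lossy(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += cp <= 0x7F ? static_cast<char>(cp) : '?';
    }
    return out;
}

// Plain UTF-8 encoder for Unicode code points (no surrogate validation).
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0x7F) {
            out += static_cast<char>(cp);
        } else if (cp <= 0x7FF) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0xFFFF) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With these helpers, base64_encode(to_ascii_lossy(U"\u0633\u0644\u0627\u0645")) (that is, سلام) gives Pz8/Pw==, and so does any other four-letter non-ASCII word, while base64_encode(to_utf8(U"\u0633\u0644\u0627\u0645")) gives 2LPZhNin2YU=. Pure ASCII input such as orange produces b3Jhbmdl either way, which is why the problem only shows up with non-ASCII text.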

You need to get rid of your UTF8String altogether, and let Indy handle the UTF-8 for you:

procedure TForm5.Button2Click(Sender: TObject);
begin
  m2.Text := TIdEncoderMIME.EncodeString(m1.Text, IndyTextEncoding_UTF8);
end;

DecodeString() has a similar parameter for specifying the byte encoding of the octets that were base64 encoded. The input is first decoded to bytes, and then the bytes are converted to a UnicodeString using the specified byte encoding, e.g.:

procedure TForm5.Button3Click(Sender: TObject);
begin
  m1.Text := TIdDecoderMIME.DecodeString(m2.Text, IndyTextEncoding_UTF8);
end;

Solution 2

You must call the EncodeString() method and pass it a proper byte encoding.

Try this:

m2.Text := TIdEncoderMIME.EncodeString(UTF8, IndyUTF8Encoding);

(IndyUTF8Encoding is defined in the IdGlobal unit)

Solution 3

For C++ in RAD Studio 10:

#include <IdGlobal.hpp>

String my_str = L"Շնորհակալություն";
String str = IdEncoderMIME1->EncodeString(my_str, IndyTextEncoding_UTF8());
my_str = IdDecoderMIME1->DecodeString(str, IndyTextEncoding_UTF8());

Author by

peiman F.

phper

Updated on June 20, 2022

Comments

  • peiman F. almost 2 years

    I have RAD Studio XE5, and I used Indy's EncodeString() to encode the input string...

    My code looks like this:

    procedure TForm5.Button2Click(Sender: TObject);
    var
      UTF8: UTF8String;
    begin
      UTF8 := UTF8Encode(m1.Text);
      m2.Text := ind.EncodeString(UTF8);
    end;
    

    But the output is wrong for UTF-8 (non-ASCII) input:

    orange --> b3Jhbmdl  [correct]
    book   --> Ym9vaw==  [correct]
    سلام   --> Pz8/Pw==  [wrong]
    کتاب   --> Pz8/Pw==  [wrong]
    دلفی   --> Pz8/Pw==  [wrong]
    

    It returns the same output for every UTF-8 input! What is wrong with my code, and how can I get correct base64 encoding of UTF-8 strings?

  • Remy Lebeau about 10 years
    DON'T assign the source String to a UTF8String first. Pass it as-is to EncodeString() and let Indy convert the String to UTF-8 internally before it encodes the octets to base64: m2.Text := TIdEncoderMIME.EncodeString(m1.Text, IndyUTF8Encoding); If you are using D2007 or earlier, EncodeString() takes an additional parameter that lets you specify the Ansi encoding of the String so that EncodeString() can convert it to UTF-8 correctly. Also, FYI, IndyUTF8Encoding has been replaced with IndyTextEncoding_UTF8 in Indy 10.6.