Pandas: convert a DataFrame to UTF-8

It depends on how you're outputting the data. If you're writing CSV files and then importing them into KDB, you can specify the encoding when you write the file:

df.to_csv('df_output.csv', encoding='utf-8')

Alternatively, you can set the encoding when you first read the data into pandas, by passing the same encoding argument to the reader function (for example, pd.read_csv).
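As a minimal sketch of both directions, the snippet below writes a small frame to CSV as UTF-8 and reads it back with the same encoding. The one-row frame, its column name, and the temporary file location are made up for illustration; they are not the asker's data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame containing a non-ASCII character ('\xd3', i.e. Ó).
df = pd.DataFrame({"Description": ["FHLB \xd3"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df_output.csv")

    # Write the file as UTF-8 bytes.
    df.to_csv(path, index=False, encoding="utf-8")

    # Read it back, telling pandas that the file is UTF-8.
    df_back = pd.read_csv(path, encoding="utf-8")

    # Keep the raw bytes so the on-disk encoding can be inspected.
    with open(path, "rb") as fh:
        raw = fh.read()

print(df_back)
```

If the file really is UTF-8, the non-ASCII character should survive the round trip unchanged.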

If you're connecting directly to KDB using SQLAlchemy or something similar, try specifying the encoding in the connection itself. See this question: Another UnicodeEncodeError when using pandas method to_sql with MySQL
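Whichever route you take, the underlying failure is the same: somewhere a string is being encoded with the 'ascii' codec, which cannot represent '\xd3'. The sketch below reproduces that with a plain Python string and shows what an explicit UTF-8 encode produces instead. It only illustrates the error and does not involve KDB or any database driver.

```python
text = "\xd3"  # the character named in the error message (Ó)

try:
    text.encode("ascii")  # ASCII only covers code points 0-127
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent the character; it takes two bytes.
utf8_bytes = text.encode("utf-8")

print(ascii_failed, utf8_bytes)  # True b'\xc3\x93'
```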

Author: Chris Johnson

Updated on July 09, 2022

Comments

  • Chris Johnson 11 months

    I have a df that consists of 100 rows and 24 columns. All columns are of string type. I get the following error when I try to append the DataFrame to KDB:

    UnicodeEncodeError: 'ascii' codec can't encode character '\xd3' in position 9: ordinal not in range(128)
    

    Here is an example of the first row in my df.

                            AnnouncementDate AuctionDate    BBT  \
    _id
    00000067   2012-12-11T00:00:00.000+00:00         NaN   FHLB
               CouponDividendRate DaysToSettle  \
    _id
    00000067                 0.61            1
                                            Description  \
    _id
    00000067                         FHLB 0.61 12/28/16
                         FirstSettlementDate           ISN IsAgency IsWhenIssued  \
    _id
    00000067   2012-12-28T00:00:00.000+00:00  US313381K796     True        False
               ...  OnTheRunTreasury OperationalIndicator  \
    _id        ...
    00000067   ...               NaN                False
              OriginalAmountOfPrincipal OriginalMaturityDate  \
    _id
    00000067                 13000000.0                  NaN
              PrincipalAmountOutstanding       SCSP       SMCP  \
    _id
    00000067                         0.0  313381K79   76000000
               SecurityTypeLevel1 SecurityTypeLevel2   TCK
    _id
    00000067          US-DOMESTIC                NaN   NaN
    

    My question is, is there an easy way to convert my df to utf-8 format?

    Possibly something like df = df.encode('utf-8')

    Thanks