Pandas: convert DataFrame to UTF-8

python pandas utf-8

21,669

It depends on how you're outputting the data. If you're simply writing CSV files that you then import into KDB, you can specify the encoding when you write the file:

df.to_csv('df_output.csv', encoding='utf-8')
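To check that the file written this way really is UTF-8, a minimal sketch along these lines may help. The one-row frame, its values, and the temporary output path are made up for illustration; only the `to_csv(..., encoding='utf-8')` call comes from the answer above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame containing the non-ASCII character '\xd3' ('Ó'),
# the same character named in the UnicodeEncodeError.
df = pd.DataFrame({"Description": ["FHLB \xd3 0.61"], "BBT": ["FHLB"]})

# Write to a temporary location (assumed path, for illustration only).
out_path = os.path.join(tempfile.mkdtemp(), "df_output.csv")
df.to_csv(out_path, index=False, encoding="utf-8")

# Read the raw bytes back to confirm what was written to disk.
with open(out_path, "rb") as fh:
    raw = fh.read()

# Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")
print(text)
```

If the decode step succeeds and the special character survives, the file should be safe to hand to a loader that expects UTF-8.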

Alternatively, you can set the encoding when you first read the data into pandas, by passing the same encoding argument to the reader function (for example, pd.read_csv).
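As a rough illustration of setting the encoding on import, the sketch below assumes the source file was saved as Latin-1 rather than UTF-8. The file name and contents are invented for the example. The point is that the `encoding` argument should name the encoding the file actually uses.

```python
import os
import tempfile

import pandas as pd

# Create a small Latin-1 encoded CSV to stand in for the real source file
# (hypothetical name and contents).
src_path = os.path.join(tempfile.mkdtemp(), "source_latin1.csv")
with open(src_path, "w", encoding="latin-1", newline="") as fh:
    fh.write("Description,BBT\nFHLB \xd3 0.61,FHLB\n")

# Tell pandas which encoding the file uses so the bytes are decoded correctly.
df = pd.read_csv(src_path, encoding="latin-1")

# Once loaded, the values are ordinary Python (Unicode) strings.
value = df.loc[0, "Description"]
print(repr(value))
```

Once the data is loaded this way, the strings in the frame are Unicode, and the output encoding is chosen separately when the data is written out.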

If you're connecting directly to KDB using SQLAlchemy or something similar, you should try specifying the encoding in the connection itself. See this question: Another UnicodeEncodeError when using pandas method to_sql with MySQL
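For the connection-level approach, here is a hedged sketch of what the setting can look like with SQLAlchemy. It assumes a MySQL backend with the PyMySQL driver, as in the linked question. The user, password, host, database, and table names are placeholders, and a KDB driver may expose the option differently or not at all. The example only parses the URL and does not open a connection.

```python
from sqlalchemy.engine import make_url

# Placeholder credentials and database name; the charset query parameter
# is passed through to the MySQL driver when the engine connects.
url = make_url("mysql+pymysql://user:password@localhost/mydb?charset=utf8mb4")

# Inspect the parsed URL to confirm the charset option is present.
charset = url.query["charset"]
print(url.drivername, charset)

# With the driver installed and a reachable server, the engine would be
# created from this URL and handed to pandas, for example:
#   engine = sqlalchemy.create_engine(url)
#   df.to_sql("my_table", engine, if_exists="append")
```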

Author by

Chris Johnson

Updated on July 09, 2022

Comments

Chris Johnson 11 months

I have a df that consists of 100 rows and 24 columns. All of the columns are of type string. I get the following error when I try to append the DataFrame to KDB:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd3' in position 9: ordinal not in range(128)

Here is an example of the first row in my df.

                        AnnouncementDate AuctionDate    BBT  \
_id
00000067   2012-12-11T00:00:00.000+00:00         NaN   FHLB
           CouponDividendRate DaysToSettle  \
_id
00000067                 0.61            1
                                        Description  \
_id
00000067                         FHLB 0.61 12/28/16
                     FirstSettlementDate           ISN IsAgency IsWhenIssued  \
_id
00000067   2012-12-28T00:00:00.000+00:00  US313381K796     True        False
           ...  OnTheRunTreasury OperationalIndicator  \
_id        ...
00000067   ...               NaN                False
          OriginalAmountOfPrincipal OriginalMaturityDate  \
_id
00000067                 13000000.0                  NaN
          PrincipalAmountOutstanding       SCSP       SMCP  \
_id
00000067                         0.0  313381K79   76000000
           SecurityTypeLevel1 SecurityTypeLevel2   TCK
_id
00000067          US-DOMESTIC                NaN   NaN

My question is, is there an easy way to convert my df to utf-8 format?

Possibly something like df = df.encode('utf-8')

Thanks