Pandas read_sql DataTypes


How about

excel_df = pd.read_excel(...)
sql_df = pd.read_sql(...)

# attempt to cast all columns of excel_df to the types of sql_df
excel_df.astype(sql_df.dtypes.to_dict()).equals(sql_df)
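A minimal, self-contained sketch of the idea, using small hand-built DataFrames in place of the real Excel and SQL sources (column names are illustrative):

```python
import pandas as pd

# Stand-ins for the two sources: same values, different dtypes
excel_df = pd.DataFrame({"COLUMN ID": [1, 2], "ANOTHER Id": [1.0, 2.0]})   # int64, float64
sql_df = pd.DataFrame({"COLUMN ID": [1.0, 2.0], "ANOTHER Id": [1.0, 2.0]})  # float64, float64

# Raw comparison fails purely because of the int64 vs float64 mismatch
print(excel_df.equals(sql_df))  # False

# Cast every column of excel_df to the corresponding dtype in sql_df, then compare
aligned = excel_df.astype(sql_df.dtypes.to_dict())
print(aligned.equals(sql_df))  # True
```

`sql_df.dtypes` is a Series mapping column name to dtype, so `.to_dict()` produces exactly the per-column mapping that `astype` accepts, which avoids writing out each column by hand.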
MattR

Updated on June 04, 2022

Comments

  • MattR, almost 2 years ago

    I have to compare two data sources to check whether each record is identical across all rows. One data source comes from an Excel file; the other comes from a SQL table. I tried using DataFrame.equals() like I have in the past.

    However, the comparison fails due to pesky datatype issues. Even though the data looks the same, the datatypes make excel_df.loc[excel_df['ID'] == 1].equals(sql_df.loc[sql_df['ID'] == 1]) return False. Here is an example of the datatypes from pd.read_excel():

    COLUMN ID                         int64
    ANOTHER Id                      float64
    SOME Date                datetime64[ns]
    Another Date             datetime64[ns] 
    

    The same columns from pd.read_sql():

    COLUMN ID                        float64
    ANOTHER Id                       float64
    SOME Date                         object
    Another Date                      object
    

    I could try using the converters argument of pd.read_excel() to match SQL, or cast each column with df['Column_Name'] = df['Column_Name'].astype(dtype_here). But I am dealing with a lot of columns. Is there an easier way to check the values across all columns?

    Checking pd.read_sql(), there is nothing like converters, but I'm looking for something like:

    df = pd.read_sql("Select * From Foo", con, dtypes = ({Column_name: str,
                                                          Column_name2:int}))
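For what it's worth, newer pandas versions offer something very close to this: pd.read_sql accepts a dtype argument (added in pandas 2.0; pd.read_sql_query gained it earlier, in 1.3). A sketch against an in-memory SQLite table, where the table and column names are illustrative:

```python
import sqlite3

import pandas as pd

# Build a throwaway SQLite table to query against
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Foo (Column_name TEXT, Column_name2 INTEGER)")
con.execute("INSERT INTO Foo VALUES ('a', 1)")
con.commit()

# dtype keyword on read_sql requires pandas >= 2.0
df = pd.read_sql(
    "SELECT * FROM Foo",
    con,
    dtype={"Column_name": str, "Column_name2": "int64"},
)
print(df.dtypes)
```

On older pandas versions, the astype(sql_df.dtypes.to_dict()) approach from the answer above achieves the same alignment after the fact.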