Pandas read_sql DataTypes
How about casting all columns of excel_df to the dtypes of sql_df before comparing?

```python
excel_df = pd.read_excel(...)
sql_df = pd.read_sql(...)

# Attempt to cast all columns of excel_df to the dtypes of sql_df,
# then compare the two frames.
excel_df.astype(sql_df.dtypes.to_dict()).equals(sql_df)
```
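A minimal, self-contained sketch of why this works, using two small hypothetical frames standing in for the Excel and SQL sources (same values, different dtypes):

```python
import pandas as pd

# Hypothetical stand-ins: the same data, but "read" with different dtypes,
# mimicking read_excel (int64 IDs) vs read_sql (float64 IDs).
excel_df = pd.DataFrame({"ID": [1, 2], "Amount": [10.0, 20.0]})
sql_df = pd.DataFrame({"ID": [1.0, 2.0], "Amount": [10.0, 20.0]})

# equals() requires matching dtypes, so this is False despite equal values.
print(excel_df.equals(sql_df))  # False

# Cast excel_df to sql_df's dtypes in one shot, then compare again.
cast_df = excel_df.astype(sql_df.dtypes.to_dict())
print(cast_df.equals(sql_df))   # True
```

`DataFrame.equals` is strict about dtypes, which is exactly the trap described in the question; the one-line `astype(...to_dict())` cast sidesteps it without touching each column by hand.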
MattR
I am a Data Analyst who uses Python to solve various data-centered problems. I also create powerful programs using Python. I use SQL daily alongside SSIS and Tableau.
Updated on June 04, 2022
MattR, almost 2 years ago:
I have to compare two data sources to see whether the same record matches across all rows. One data source comes from an Excel file, the other from a SQL table. I tried using DataFrame.equals() like I have in the past. However, pesky datatype issues get in the way: even though the data looks the same, the differing datatypes make

```python
excel_df.loc[excel_df['ID'] == 1].equals(sql_df.loc[sql_df['ID'] == 1])
```

return False. Here is an example of the datatypes from pd.read_excel():

```
COLUMN
ID              int64
ANOTHER Id      float64
SOME Date       datetime64[ns]
Another Date    datetime64[ns]
```
The same columns from pd.read_sql:

```
COLUMN
ID              float64
ANOTHER Id      float64
SOME Date       object
Another Date    object
```
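Before converting anything, it can help to see exactly which columns disagree. A short sketch using Series.compare (available in pandas 1.1+), with small hypothetical frames reproducing the mismatch above:

```python
import pandas as pd

# Hypothetical frames mimicking the dtype mismatch described above:
# read_excel yields int64 IDs and real datetimes, read_sql yields
# float64 IDs and dates as plain object strings.
excel_df = pd.DataFrame({"ID": [1],
                         "SOME Date": pd.to_datetime(["2022-06-04"])})
sql_df = pd.DataFrame({"ID": [1.0],
                       "SOME Date": ["2022-06-04"]})

# Side-by-side view of only the columns whose dtypes differ.
diff = excel_df.dtypes.compare(sql_df.dtypes)
print(diff)
```

Each row of `diff` is a column whose dtype differs, with the Excel dtype under `self` and the SQL dtype under `other`, so you can target conversions instead of guessing.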
I could try using the converters argument of pd.read_excel() to match SQL, or casting columns one by one with

```python
df['Column_Name'] = df['Column_Name'].astype(dtype_here)
```

but I am dealing with a lot of columns. Is there an easier way to check for values across all columns?

Checking pd.read_sql(), there is nothing like converters, but I'm looking for something along the lines of:

```python
df = pd.read_sql("Select * From Foo", con, dtypes={'Column_name': str, 'Column_name2': int})
```
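Newer pandas versions (2.0+) did add a dtype argument to read_sql, which is close to what is asked for here. On older versions, a cast immediately after reading achieves the same effect. A sketch with hypothetical column names and a frame standing in for read_sql output:

```python
import pandas as pd

# Hypothetical stand-in for read_sql output: numeric IDs come back
# as float64 and dates as plain object strings.
sql_df = pd.DataFrame({"ID": [1.0, 2.0],
                       "SOME Date": ["2022-06-01", "2022-06-02"]})

# Apply the desired schema right after reading: astype for the
# numeric columns, to_datetime for the date columns.
sql_df = sql_df.astype({"ID": "int64"})
sql_df["SOME Date"] = pd.to_datetime(sql_df["SOME Date"])

print(sql_df.dtypes)
```

Because the target dtypes live in one place, this scales to many columns the same way the wished-for dtypes= argument would.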