Write pandas table to impala
Solution 1
You're going to love Ibis! It has the HDFS functions (put
, namely) and wraps the Impala DML and DDL you'll need to make this easy.
The general approach I've used for something similar is to save your pandas table to a CSV, HDFS.put
that on to the cluster, and then create a new table using that CSV as the data source.
You don't need Ibis for this, but it should make it a little bit easier and may be a nice tool for you if you're already familiar with pandas (Ibis was also created by Wes, who wrote pandas).
Solution 2
I am trying to do same thing and I figured out a way to do this with an example provided with impyla
:
df = pd.DataFrame(np.reshape(range(16), (4, 4)), columns=['a', 'b', 'c', 'd'])
df.to_sql(name=”test_df”, con=conn, flavor=”mysql”)
This works fine and table in impala (backend mysql) works fine.
However, I got stuck on getting text values in as impala tries to do analysis on columns and I get cast errors. (It would be really nice if possible to implicitly cast from string to [var]char(N) in impyla.)
SummerEla
Bioinformatic Scientist (data science for science science) Nerd, in progress Enjoys long walks to the pub on moonlit nights... on any night, really
Updated on July 20, 2022Comments
-
SummerEla almost 2 years
Using the impyla module, I've downloaded the results of an impala query into a pandas dataframe, done analysis, and would now like to write the results back to a table on impala, or at least to an hdfs file.
However, I cannot find any information on how to do this, or even how to ssh into the impala shell and write the table from there.
What I'd like to do:
from impala.dbapi import connect from impala.util import as_pandas # connect to my host and port conn=connect(host='myhost', port=111) # create query to save table as pandas df create_query = """ SELECT * FROM {} """.format(my_table_name) # run query on impala cur = conn.cursor() cur.execute(create_query) # store results as pandas data frame pandas_df = as_pandas(cur) cur.close()
Once I've done whatever I need to do with pandas_df, save those results back to impala as a table.
# create query to save new_df back to impala save_query = """ CREATE TABLE new_table AS SELECT * FROM pandas_df """ # run query on impala cur = conn.cursor() cur.execute(save_query) cur.close()
The above scenario would be ideal, but I'd be happy if I could figure out how to ssh into impala-shell and do this from python, or even just save the table to hdfs. I'm writing this as a script for other users, so it's essential to have this all done within the script. Thanks so much!
-
SummerEla over 8 yearsI was actually just reading up on Ibis the other day. Sounds like I'll have to check it out.
-
elvainch about 6 yearsHi im getting an error in Ibis in the db.create_table('mytablename',mydataframe). The error is File "C:\Anaconda3\lib\json\decoder.py", line 357, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). Any idea what could be? @Isaac