Hive queries via Python client

Solution 1

If you build Hive from source, the Python modules are located here (relative to the hive-trunk directory):

./build/dist/lib/py

You should be able to import the modules if you either include that path in your PYTHONPATH environment variable or append it to sys.path at the top of your script.

Also note that there is no longer a module named 'hive'. In the example code you linked, 'hive' should be replaced with 'hive_service'.
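
As a rough sketch of the steps above: the helper below puts the build directory on sys.path, and run_query follows the Thrift client pattern shown on the Hive wiki for the original HiveServer. The function names add_hive_py_path and run_query are made up for illustration, and the code assumes the thrift package is importable (it is usually shipped in the same lib/py directory) and that a HiveServer is listening on the host and port you pass in.

```python
import os
import sys


def add_hive_py_path(hive_trunk):
    """Put <hive_trunk>/build/dist/lib/py on sys.path (hypothetical helper)."""
    py_dir = os.path.join(hive_trunk, "build", "dist", "lib", "py")
    if py_dir not in sys.path:
        sys.path.insert(0, py_dir)
    return py_dir


def run_query(host, port, query):
    """Run one query over the HiveServer Thrift interface and return all rows."""
    # Imported lazily so that add_hive_py_path() can be called first.
    # Note the module name: hive_service, not hive.
    from hive_service import ThriftHive
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    try:
        client.execute(query)
        return client.fetchAll()
    finally:
        # Close the connection even if the query fails.
        transport.close()
```

You would call add_hive_py_path with the location of your hive-trunk checkout, then run_query with your server's host and port (10000 is the port used in the Hive wiki example; yours may differ).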

Solution 2

The hive_utils package looks like it has what you're looking for. According to its PyPI page, you can run queries in the following way:

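import hive_utils

# config is assumed to be a dict holding your Hive server host, port, and database name.
query = """
    SELECT country, count(1) AS cnt
    FROM User
    GROUP BY country
"""
hive_client = hive_utils.HiveClient(
    server=config['HOST'],
    port=config['PORT'],
    db=config['NAME'],
)
# Each result row can be indexed by column name.
for row in hive_client.execute(query):
    print('%s: %s' % (row['country'], row['cnt']))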

Installing hive_utils should also install the Thrift packages it needs.
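
Since the goal is to run QA logic on the results in Python, here is a minimal sketch of what that post-processing might look like. It assumes only what the example above shows, namely that each row can be indexed by column name. The helper names rows_to_counts and find_missing are hypothetical, and the int() conversion is there because it is not clear whether the client returns counts as strings or as numbers.

```python
from collections import OrderedDict


def rows_to_counts(rows, key="country", value="cnt"):
    """Collect key -> int(value) pairs from dict-like result rows."""
    counts = OrderedDict()
    for row in rows:
        # int() handles counts that arrive either as strings or as numbers.
        counts[row[key]] = int(row[value])
    return counts


def find_missing(counts, expected_keys):
    """Return the expected keys that have no positive count (a simple QA check)."""
    return [k for k in expected_keys if counts.get(k, 0) <= 0]
```

With a client like the one above, this could be used as counts = rows_to_counts(hive_client.execute(query)), followed by whatever checks your QA needs.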

Author by

Justin

Updated on June 04, 2022

Comments

  • Justin, almost 2 years

    I have Hive 0.8 installed on a Hadoop cluster running on AWS EMR.

    I am trying to do some data QA, which involves running a Hive query and fetching the results into Python, where the rest of the logic lives.

    Currently, I do this by submitting a Hive query as a jobflow step, dumping the results to local storage on the master node, SCP-ing them to my local machine, and then loading and parsing the file with Python. All in all, not a very fun process.

    Ideally, I would be able to do this in a fashion similar to:

    conn = hive.connect(ip, port, user, pw)
    cursor = conn.cursor()
    cursor.execute(query)
    rs = cursor.fetchall()
    

    It seems that this is supposedly possible. Hive says that it supports it here. There is also another SO question that looks like it's doing what I'd like to do.

    However, I'm having trouble finding documentation. In particular, I haven't been able to figure out where to obtain the packages used in these examples. It would be immensely helpful if anyone could provide detailed instructions for getting the Python client working; failing that, it would help just to know where to obtain these packages.