how to add a permanent function in hive?

19,445

Solution 1

As of 0.13.0 (HIVE-6047) you can make UDFs permanent through:

CREATE FUNCTION [db_name.]function_name AS class_name [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]

An example for your UDF:

CREATE FUNCTION getContentValue AS 'com.my.udf.GetContentValue' USING JAR '/home/taobao/oplog/hivescript/my_udf.jar';

For more info check Hive Data Definition Language, quoting it:

This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if ADD JAR/FILE had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location.

The function will be added to the database specified, or to the current database at the time that the function was created. The function can be referenced by fully qualifying the function name (db_name.funciton_name), or can be referenced without qualification if the function is in the current database.

Solution 2

If you only use hive from the hive-cli tool, then adding those two lines in .hiverc should do:

The CLI when invoked without the -i option will attempt to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.

If you use JDBC/ODBC to connect to hive, then you must do it in app code, eg. have a connection factory that first issues the two commands in the connection before giving out the connection for app to use.

Making a temporary UDF permanent (ie. store it in metastore, along with the JAR) is not possible at the moment. It is technically possible, but such a feature is just not there.

Solution 3

Write a udf and make its jar. Having a separate user for such udfs is good. Upload the jar to this user's hdfs location

hdfs dfs -put <hive-udf>.jar /user/udf/
hive> use udf;
hive> CREATE FUNCTION RemoveNewLine AS 'com.mycompany.hive.udf.RemoveNewLine' USING JAR 'hdfs://<defaultFS>/user/udf/<hive-udf>.jar';

Uses:

hive> select date, udf.RemoveNewLine(url) from abc.testTable limit 1;

Solution 4

1) Store the JAR file in hdfs location instead of local. If we have already created local function we might have to use a new function name enter image description here

2) hive> CREATE FUNCTION MASK AS 'hiveudf.PI' using JAR 'hdfs://localhost:8020/user/cloudera/Mask.jar'; in the hive shell enter image description here Note: When the function is created, it moves the JAR file from HDFS to local system

For more information you can visit http://www.prathapkudupublog.com/2018/01/permanent-user-defined-function-udf-in.html

Share:
19,445
Judking
Author by

Judking

Updated on June 07, 2022

Comments

  • Judking
    Judking almost 2 years

    Here is the problem: If I declare a temporary function in hive like this:

    add jar /home/taobao/oplog/hivescript/my_udf.jar;
    create temporary function getContentValue as 'com.my.udf.GetContentValue';
    

    It'll works fine with function getContentValue in this hive session. But what I want is not having to add jar as well as create temporary function every time I start a hive session. That is to say, make the function permanent.

    Is there any solutions to this problem?