HIVE UDF (User Defined Functions) – HIVE Standard, Aggregate Function

HIVE UDF (User Defined Functions) allow the user to extend HIVE Query Language. Once the UDF is added in the HIVE script, it works like a normal built-in function. To check which all UDFs are loaded in current hive session, we use SHOW command.

To check which all HIVE UDF are loaded in the current HIVE session, we use SHOW command.

hive> SHOW FUNCTIONS;

array
abs
acos
and

The documentation about a function is shown by DESCRIBE FUNCTION <“function name”> command and their detailed documentation is accessed by DESCRIBE FUNCTION EXTENDED <“function name”> command.

hive> DESCRIBE FUNCTION concat;

concat(str1,str, ... ,strN) - returns the concatenation of strings.

Different HIVE UDF used in HIVE Query Language

Standard FUNCTION

These kinds of functions take a row argument or one or more columns from a row and return a single value.

Example: round(),floor(), concat(), reverse(),ucase(),abs() etc.

It is important to note that these functions can also return a complex object, such as an array, map, or struct.

Aggregate FUNCTION

It takes one or more columns from zero to many rows and returns a single value.

Example: sum(), avg(),min(),max() etc.

Table Generating FUNCTION

Table generation functions take zero or more inputs and produce multiple columns or rows as output.

hive> SELECT array(1,2,3,4) FROM dual;

1
2
3
4

Explode() is another table generation function which takes an array of input and iterates through the list and returns each element from the list in a separate row.

The lateral view is used in conjunction with user-defined table generating functions such as explode().

An example of explode() function is given below :

Column NameColumn Type
DeptbranchnoArray<int>
DnameSTRING

An example of data in the above tables with three rows:

DnameDeptbranchno
Zoology[1,2,3]
Mathematics[5,6]

A lateral view with explode() can be used to convert  “Deptbranchno”  into separate rows using the query:

Hive> SELECT Dname, Deptbranchno

FROM dept

LATERAL VIEW explode(Deptbranchno) as DeptbranchID;

The output of above HIVE Query will be:

Dname(string)DeptbranchID(int)
Zoology1
Zoology2
Zoology3
Mathematics5
Mathematics6

 

Custom Hive using UDF’s – Prerequisites, Step by Step example

We can extend and have custom Hive using the User Defined Functions (UDFs). To demonstrate the process let us look at the below example.

Prerequisites for Custom Hive

  • We have to extend “our class” to the UDF abstract class.
  • “Our Class” must have at leas one evaluate () method. evaluate () method is not the method of UDF abstract class. This evaluate() method should have at least one parameter.
  • Compile the above java file and create the JAR file for keeping the .class file.
  • Add JAR file to hive classpath.
  • Create temporary function.

Problem Statement

Find the maximum marks obtained out of four subjects by a student.

Step 1:

Create a table STUDENTS_RECORDS with below sample record.

SIDNAMECLASSMATHPHYSICSENGLISHCSCTOT_MARKS
1MAK1085958692358
2TANUL10858510095377
3JHON1092988290362

Step 2:

Create a new project say “hiveudf” in package explorer of the Eclipse IDE.

Add the required JAR files by going in Libraries tab of the eclipse.

Choose “Add External JARs…”. The main JAR file which we need to add is “hadoop-core.jar”.

Also, add hive related JAR files. All the hive related JAR files will be there in /hive/lib folder.

Step 3:

Create a class for the “hiveudf” by right clicking on the project.

Package Name: “com.hadoop.hive”

Class Name: “GetMaxMarks”.

Step 4:

Write code as below:

package com.hadoop.hive; 
import org.apache.hadoop.hive.ql.exec.UDF; 
public class GetMaxMarks extends UDF{ 

            public double evaluate (double math,double eng,double physics,double csc)

            {
                        double maxMarks=math;
                        if(eng>maxMarks)
                        {
                                    maxMarks = eng;
                        }

                        if(physics>maxMarks)
                        {
                                    maxMarks=physics;
                        }
                        if(csc>maxMarks)
                        {
                                    maxMarks=csc;
                        }                      
                        return maxMarks;
            }
}

Step 5:

Create the JAR file for the above class. Right-click on the project -> Export -> JAR -> Next -> Put the JAR file name as “hive-maxmarks.jar”

Step 6: 

Add JAR file to hive classpath.

hive> add jar /home/training/workspace/hive-maxmarks.jar

Step 7:

In order to apply business logic on top of hive column using our UDF, we need to create a temporary function for the exported jar file.

Hive> CREATE temporary function func_name as com.hadoop.hive.GetMaxMarks --'absolute_class_path_name'.

Step 8:

Apply this UDF/function on your table.

Hive> SELECT sid, name,GetMaxMarks(math, eng, physics,csc) from STUDENTS_RECORDS;