Custom Hive Using UDFs – Prerequisites and a Step-by-Step Example

Hive can be extended with custom logic through User Defined Functions (UDFs). The example below walks through the process.

Prerequisites for Custom Hive

  • Our class must extend Hive's UDF class (org.apache.hadoop.hive.ql.exec.UDF).
  • Our class must have at least one evaluate() method. evaluate() is not declared in the UDF class itself; Hive looks it up by name when the function is called. The method should take at least one parameter.
  • Compile the Java file and package the resulting .class file in a JAR file.
  • Add the JAR file to the Hive classpath.
  • Create a temporary function.
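
The point that evaluate() belongs to our class rather than to the base class can be illustrated in plain Java. The sketch below is only an illustration of a name-based lookup through reflection, not Hive's actual resolver; BaseFunction, MaxOfTwo and EvaluateLookupSketch are hypothetical names that do not exist in Hive.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the UDF base class: it declares no evaluate().
class BaseFunction {
}

// Hypothetical function class: evaluate() exists only here, in the subclass.
class MaxOfTwo extends BaseFunction {
    public double evaluate(double a, double b) {
        return a > b ? a : b;
    }
}

class EvaluateLookupSketch {
    // Finds a public method named "evaluate" with the given parameter types
    // and invokes it on a fresh instance of the function class.
    static Object callEvaluate(Class<? extends BaseFunction> fnClass,
                               Class<?>[] paramTypes,
                               Object... args) {
        try {
            Method evaluate = fnClass.getMethod("evaluate", paramTypes);
            BaseFunction fn = fnClass.getDeclaredConstructor().newInstance();
            return evaluate.invoke(fn, args);
        } catch (ReflectiveOperationException e) {
            // No matching evaluate() method, or the call itself failed.
            throw new IllegalStateException("evaluate() lookup failed", e);
        }
    }

    public static void main(String[] args) {
        Object result = callEvaluate(MaxOfTwo.class,
                new Class<?>[] {double.class, double.class}, 85.0, 95.0);
        System.out.println(result); // prints 95.0
    }
}
```

Because the lookup is by name and parameter types, a class without a matching evaluate() method fails only when the function is called, which is why the prerequisite has to be checked by hand.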

Problem Statement

Find the maximum marks obtained out of four subjects by a student.

Step 1:

Create a table STUDENTS_RECORDS with the sample records below.

SID  NAME   CLASS  MATH  PHYSICS  ENGLISH  CSC  TOT_MARKS
1    MAK    10     85    95       86       92   358
2    TANUL  10     85    85       100      95   377
3    JHON   10     92    98       82       90   362

Step 2:

Create a new project, say “hiveudf”, in the Package Explorer of the Eclipse IDE.

Add the required JAR files through the Libraries tab in Eclipse.

Choose “Add External JARs…”. The main JAR file we need to add is “hadoop-core.jar”.

Also add the Hive-related JAR files, all of which are in the /hive/lib folder.

Step 3:

Create a class in the “hiveudf” project by right-clicking on the project.

Package Name: “com.hadoop.hive”

Class Name: “GetMaxMarks”.

Step 4:

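Write the code as below:

package com.hadoop.hive;

import org.apache.hadoop.hive.ql.exec.UDF;

// Hive UDF that returns the highest of a student's four subject marks.
public class GetMaxMarks extends UDF {

    // Hive calls evaluate() once per row with the four column values.
    public double evaluate(double math, double eng, double physics, double csc) {
        // Start with the first mark and replace it whenever a higher one is found.
        double maxMarks = math;
        if (eng > maxMarks) {
            maxMarks = eng;
        }
        if (physics > maxMarks) {
            maxMarks = physics;
        }
        if (csc > maxMarks) {
            maxMarks = csc;
        }
        return maxMarks;
    }
}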

Step 5:

Create the JAR file for the above class: right-click on the project -> Export -> JAR -> Next, and enter “hive-maxmarks.jar” as the JAR file name.

Step 6: 

Add the JAR file to the Hive classpath.

hive> add jar /home/training/workspace/hive-maxmarks.jar;

Step 7:

To apply business logic to Hive columns using our UDF, we need to create a temporary function that points to the class in the exported JAR file. The class is given by its fully qualified name, in quotes.

hive> CREATE TEMPORARY FUNCTION GetMaxMarks AS 'com.hadoop.hive.GetMaxMarks';

Step 8:

Apply the function to your table, using the name it was registered under in Step 7 and the column names of the table.

hive> SELECT sid, name, GetMaxMarks(math, english, physics, csc) FROM STUDENTS_RECORDS;
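
To see what this query should return for the sample records in Step 1, the same comparison can be run in plain Java without Hive. This is a minimal sketch under stated assumptions: MaxMarksQuerySketch, Student, maxMarks, sampleRows and runQuery are hypothetical names introduced for illustration, the rows are hard-coded from the Step 1 table, and the tab-separated output only approximates how the Hive CLI prints results.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for running the Step 8 query; no Hive involved.
class MaxMarksQuerySketch {

    // One table row, reduced to the columns the query reads.
    static final class Student {
        final int sid;
        final String name;
        final double math, physics, english, csc;

        Student(int sid, String name, double math, double physics,
                double english, double csc) {
            this.sid = sid;
            this.name = name;
            this.math = math;
            this.physics = physics;
            this.english = english;
            this.csc = csc;
        }
    }

    // Mirrors the comparison logic of GetMaxMarks.evaluate().
    static double maxMarks(double math, double eng, double physics, double csc) {
        double max = math;
        if (eng > max) max = eng;
        if (physics > max) max = physics;
        if (csc > max) max = csc;
        return max;
    }

    // The three sample records from Step 1 (MATH, PHYSICS, ENGLISH, CSC).
    static List<Student> sampleRows() {
        List<Student> rows = new ArrayList<>();
        rows.add(new Student(1, "MAK", 85, 95, 86, 92));
        rows.add(new Student(2, "TANUL", 85, 85, 100, 95));
        rows.add(new Student(3, "JHON", 92, 98, 82, 90));
        return rows;
    }

    // Rough equivalent of:
    // SELECT sid, name, GetMaxMarks(math, english, physics, csc) FROM STUDENTS_RECORDS
    static List<String> runQuery(List<Student> rows) {
        List<String> out = new ArrayList<>();
        for (Student s : rows) {
            double max = maxMarks(s.math, s.english, s.physics, s.csc);
            out.add(s.sid + "\t" + s.name + "\t" + max);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : runQuery(sampleRows())) {
            System.out.println(line);
        }
    }
}
```

With the sample data, the highest marks are 95.0 for MAK, 100.0 for TANUL and 98.0 for JHON, so a result that differs from these values points to a problem in the UDF or in the column order passed to it.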

 

Big Data Applications in Business – Using Big Data to Improve Business

Efficient data analysis enables companies to optimize everything in the value chain – from sales to order delivery, to optimal store hours.

Big data helps organizations define key marketing strategies and is used in almost every industry sector. The table below shows the areas in which various industries apply big data to improve their business models.

Domain                             Applications

Retail / ecommerce                 01. Market basket analysis
                                   02. Campaign & customer loyalty mgmt program
                                   03. Supply chain management & analytics
                                   04. Behavior tracking
                                   05. Market and consumer segmentation
                                   06. Recommendation engines (to increase order size through complementary products)
                                   07. Cross-channel analytics
                                   08. Individual targeting with right offer at right time

Financial Services                 01. Real-time customer insights
                                   02. Risk analysis and management
                                   03. Fraud detection
                                   04. Customer loyalty management
                                   05. Credit risk modeling/analysis
                                   06. Trade surveillance, detecting abnormal activities

IT Operations                      01. Log analysis for pattern identification/process analysis
                                   02. Massive storage and parallel processing
                                   03. Data mashup to extract intelligence from data

Health & Life Sciences             01. Health-insurance fraud detection
                                   02. Campaign management
                                   03. Brand & reputation management
                                   04. Patient care and service quality management
                                   05. Gene mapping and analytics
                                   06. Drug discovery

Communication, Media & Technology  01. Real-time calls analysis
                                   02. Network performance management
                                   03. Social graph analysis
                                   04. Mobile user usage analysis

Governance                         01. Compliance and regulatory analysis
                                   02. Threat detection, crime prediction
                                   03. Smart cities and e-governance
                                   04. Energy management