HIVE Bucketing – Advantages of HIVE Bucketing | RCVAcademy

Unlike partitioning, HIVE bucketing is another way to decompose data into more manageable sets.

Consider our requirement is to create the partition based on date and then on user ID’s. In this case, there will be more numbers of small partition which will get created and it is difficult to manage.

Also, HIVE limits the maximum number of partitions that can be created.

Therefore, the below commands might fail to execute on hive shell:

Instead, if we bucket “uid” column and partition “dte” column then the value of “uid” column will be hashed depending upon the number of buckets user has defined.

HIVE Bucketing Advantages

HIVE Bucketing has several advantages. HIVE Bucketing improves the join performance if the bucket key and join keys are common.

HIVE Bucketing also provides efficient sampling in Bucketing table than the non-bucketed tables.

It also reduces the I/O scans during the join process if the process is happening on the same keys (columns).

To leverage the bucketing in the join operation we should SET hive.optimize.bucketmapjoin=true.

This setting hints to HIVE to do bucket level join during the map stage join.

It also reduces the scan cycles to find a particular key because bucketing ensures that the key is present in a certain bucket.

Team RCV Academy

About Team RCV Academy

RCV Academy Team is a group of professionals working in various industries and contributing to tutorials on the website and other channels.

View all posts by Team RCV Academy