Hive Query – HiveQL, Hive Select, Join, Order, Sort, Indexes

Hive Query is similar to SQL. Hive queries are written in HiveQL and can be used to query the data without knowledge of Java or MapReduce.

Hive Query – HiveQL

SELECT … FROM Clause

SELECT … FROM clause is almost same as SQL. Please see few examples below.

Hive SELECT all query

From above Hive query, it will select X records from aliens. Array elements in a record will be within brackets

Hive SELECT query with array indexing

From the above Hive query, the first element of the array is selected.

Hive SELECT query with STRUCT column.

From above Hive query output will show the struct column in JSON format and the first element of the array is selected.

Hive Manipulating Column values

We can manipulate column values using function calls and arithmetic expressions.

Hive Column Alias and Nested Queries

Select query with column alias

Hive Column aliases for nested queries

Hive Selecting data from partitions

Hive scans only partitions relevant to the query, thus improving performance.

Assuming partitioned on cities and there are 4 partitions with equal volume of data, query will partition only 1/4th of the data

Hive JOIN Statements

Hive supports ANSI joins

Hive query also supports Inner joins, left outer joins, Right outer, Full outer, left semi-joins

Hive Order By 

  • Order By clause similar to SQL
  • Performs ordering using single reducer
  • Will take a long time for large datasets

Hive Sort By

  • SORT BY Clause
  • Data is ordered in each reducer – each reducer’s output is sorted
  • If multiple reducers – final results may not be as expected

Hive views

Hive views are similar to SQL views which reduce the Hive query complexity by encapsulating the complex queries from end users.

A user can drop the view in the same way as the table.

Hive Indexes

Hive can build indexes on columns to speed up querying operations. The index data is stored in another table.

This is a very useful concept and a good alternative to partitioning when the logical partitions are small and numerous.

Let us consider the below table:

If we need to create an index on country partition only, then below is the syntax:

The AS clause specifies the index handler, a java inbuilt class that implements indexing.

Hive BITMAP Indexes

Showing an Index

Dropping an Index

Team RCV Academy

About Team RCV Academy

RCV Academy Team is a group of professionals working in various industries and contributing to tutorials on the website and other channels.

View all posts by Team RCV Academy