The key Hive components covered in this post are the UI, Driver, Compiler, Metastore, and Execution Engine. Let us go through them one by one.
Apache Hive components
Hive User Interfaces (UI)
The user interface lets users submit queries and other operations to the system. Hive mainly provides three ways to communicate with the Hive driver:
- CLI (Command Line Interface)
This is the most common way of interacting with Hive: we use a Linux terminal to issue queries directly to the Hive driver.
- HWI (Hive Web Interface)
It is an alternative to the CLI in which we use a web browser to interact with Hive.
- JDBC/ODBC/Thrift Server
This allows remote clients to submit requests to Hive and retrieve the results. The HIVE_PORT environment variable must be set to an available port number for the server to listen on.
It is important to note that the CLI is a fat client: it requires a local copy of all the Hive components as well as the Hadoop client and its configuration.
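As a rough illustration of the HIVE_PORT convention, the sketch below shows how a launcher script might read the port from the environment and build a connection URL for a remote client. The fallback of 10000 matches Hive's usual Thrift server port, and the `jdbc:hive2://host:port/database` form is the URL style used by HiveServer2 clients; the helper names `resolve_hive_port` and `hive_jdbc_url` are hypothetical, not part of Hive.

```python
import os

# Hive's Thrift server conventionally listens on 10000 when no port is set.
DEFAULT_HIVE_PORT = 10000


def resolve_hive_port(env=None):
    """Return the port from HIVE_PORT, or the default if it is unset.

    Hypothetical helper for illustration; Hive's own scripts read HIVE_PORT
    themselves.
    """
    env = os.environ if env is None else env
    raw = env.get("HIVE_PORT")
    if raw is None or raw.strip() == "":
        return DEFAULT_HIVE_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"HIVE_PORT out of range: {port}")
    return port


def hive_jdbc_url(host, port, database="default"):
    """Build a HiveServer2-style JDBC URL for a remote client."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    port = resolve_hive_port({"HIVE_PORT": "10001"})
    # prints jdbc:hive2://hive-server.example.com:10001/default
    print(hive_jdbc_url("hive-server.example.com", port))
```

Passing the environment as a parameter keeps the lookup testable without touching the real process environment.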
Hive Driver
The driver receives queries from the user interfaces (UI) and provides execute and fetch APIs modeled on JDBC/ODBC interfaces.
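To make the execute/fetch split concrete, here is a minimal Python sketch. `ToyDriver` and its `run_query` callback are hypothetical: the callback stands in for Hive's compile-and-run pipeline so that only the session pattern is shown. A query is submitted once, and the results are then pulled back in batches, much like iterating over a JDBC `ResultSet`.

```python
from collections import deque


class ToyDriver:
    """Hypothetical stand-in for the driver's execute/fetch pattern.

    `run_query` is any callable that turns a query string into rows;
    real Hive would compile the query and run the resulting plan.
    """

    def __init__(self, run_query):
        self._run_query = run_query
        self._pending = deque()

    def execute(self, query):
        # Hand the query to the next stage and buffer the result rows.
        self._pending = deque(self._run_query(query))

    def fetch(self, max_rows):
        # Return up to max_rows buffered rows; an empty list means "done".
        batch = []
        while self._pending and len(batch) < max_rows:
            batch.append(self._pending.popleft())
        return batch


if __name__ == "__main__":
    def fake_engine(query):
        return [("row", i) for i in range(5)]

    driver = ToyDriver(fake_engine)
    driver.execute("SELECT * FROM t")
    print(driver.fetch(2))   # [('row', 0), ('row', 1)]
    print(driver.fetch(10))  # [('row', 2), ('row', 3), ('row', 4)]
```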
Hive Compiler
The compiler parses the query, performs semantic analysis on the different query blocks, and finally generates the execution plan.
To do this, it looks up the table and partition metadata it needs in the Metastore.
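The parse, analyze, and plan sequence can be sketched with a toy compiler. Everything here is an assumption for illustration: the in-memory `METASTORE` dict stands in for real Metastore lookups, only `SELECT cols FROM table` is supported, and the stage names are loosely borrowed from the operators Hive shows in `EXPLAIN` output.

```python
import re

# Hypothetical stand-in for the Metastore: table name -> column names.
METASTORE = {
    "employees": ["id", "name", "dept", "salary"],
}

SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)\s*$", re.IGNORECASE
)


def compile_query(query, metastore=METASTORE):
    # 1. Parse: pull the column list and table name out of the query text.
    match = SELECT_RE.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    table = match.group("table").lower()
    cols = [c.strip().lower() for c in match.group("cols").split(",")]

    # 2. Semantic analysis: check the names against the stored metadata.
    if table not in metastore:
        raise LookupError(f"table not found: {table}")
    known = metastore[table]
    if cols == ["*"]:
        cols = list(known)
    unknown = [c for c in cols if c not in known]
    if unknown:
        raise LookupError(f"unknown columns in {table}: {unknown}")

    # 3. Plan generation: an ordered list of operators for the engine.
    return [("TableScan", table), ("Select", cols), ("FileSink", "result")]


if __name__ == "__main__":
    # [('TableScan', 'employees'), ('Select', ['name', 'dept']), ('FileSink', 'result')]
    print(compile_query("SELECT name, dept FROM employees"))
```

The key point is step 2: without the Metastore's metadata the compiler cannot tell whether `dept` is a real column, so lookups happen before any plan exists.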
Hive Metastore
The Metastore stores the system catalog: metadata about tables, columns, partitions, and so on.
For example, the definition from a CREATE TABLE statement is stored here. The Metastore keeps this metadata in a relational database.
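Because the Metastore is ultimately rows in a relational database, the idea can be shown with Python's built-in `sqlite3`. The two-table schema below (`tbls`, `tbl_columns`) is a simplified, hypothetical catalog, not Hive's actual metastore schema; it records a CREATE TABLE definition and then looks it up the way the compiler would.

```python
import sqlite3


def create_catalog(conn):
    # Simplified, hypothetical schema; Hive's real metastore schema differs.
    conn.executescript(
        """
        CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, tbl_name TEXT UNIQUE);
        CREATE TABLE tbl_columns (
            tbl_id INTEGER REFERENCES tbls(tbl_id),
            col_name TEXT,
            col_type TEXT,
            col_pos INTEGER
        );
        """
    )


def register_table(conn, name, columns):
    """Record a CREATE TABLE definition as catalog rows."""
    cur = conn.execute("INSERT INTO tbls (tbl_name) VALUES (?)", (name,))
    tbl_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tbl_columns VALUES (?, ?, ?, ?)",
        [(tbl_id, col, typ, pos) for pos, (col, typ) in enumerate(columns)],
    )
    conn.commit()


def describe(conn, name):
    """Look up a table's columns in declaration order."""
    return conn.execute(
        """
        SELECT c.col_name, c.col_type
        FROM tbl_columns c JOIN tbls t ON t.tbl_id = c.tbl_id
        WHERE t.tbl_name = ?
        ORDER BY c.col_pos
        """,
        (name,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_catalog(conn)
    # Metadata for: CREATE TABLE employees (id INT, name STRING)
    register_table(conn, "employees", [("id", "INT"), ("name", "STRING")])
    print(describe(conn, "employees"))  # [('id', 'INT'), ('name', 'STRING')]
```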
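By default, Apache Hive uses an embedded Derby database. However, embedded Derby supports only one active session at a time, which makes it unsuitable for multi-user access.
A JDBC-compliant database such as MySQL or Oracle can be used for the Metastore instead. The key connection properties to configure for the Hive Metastore are javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionDriverName, javax.jdo.option.ConnectionUserName, and javax.jdo.option.ConnectionPassword.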
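These connection properties normally live in `hive-site.xml` as `<property>` entries inside a `<configuration>` element. The sketch below generates such entries for a MySQL-backed metastore. The host, database name, user name, and password are placeholder values, and the helper `to_hive_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Example values for a MySQL-backed metastore; host, database name,
# user, and password are placeholders to replace with real settings.
METASTORE_PROPS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true",
    # Older Connector/J class name; Connector/J 8.x uses com.mysql.cj.jdbc.Driver.
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "changeme",
}


def to_hive_site_xml(props):
    """Render properties in the <configuration>/<property> layout of hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hive_site_xml(METASTORE_PROPS))
```

Using `ElementTree` rather than string formatting means characters such as `&` in a JDBC URL are escaped correctly.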
Hive Execution Engine
This component is responsible for executing the execution plan created by the compiler.
The Hive Execution Engine is the bridge between the HiveQL processing engine and MapReduce. It processes the query and produces the same results that the equivalent MapReduce jobs would; in effect, it runs the plan as MapReduce jobs.
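The MapReduce style of execution can be illustrated with a toy group-by count in plain Python. This is a conceptual sketch, not Hive code: the mapper emits `(dept, 1)` pairs, a shuffle step groups them by key, and the reducer sums each group. That is roughly how a `SELECT dept, COUNT(*) FROM employees GROUP BY dept` query maps onto a single MapReduce job. The table and its rows are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Toy rows of a hypothetical `employees` table: (name, dept).
ROWS = [("ana", "eng"), ("bo", "ops"), ("cy", "eng"), ("di", "eng")]


def map_phase(split):
    # Mapper: emit (group key, 1) for each row, as for COUNT(*) ... GROUP BY dept.
    for _name, dept in split:
        yield dept, 1


def shuffle(pairs):
    # Shuffle/sort: collect all values per key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    # Reducer: aggregate each key's values into the final count.
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows, num_splits=2):
    # Divide the input into splits, one per simulated mapper.
    splits = [rows[i::num_splits] for i in range(num_splits)]
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # SELECT dept, COUNT(*) FROM employees GROUP BY dept
    print(run_group_by_count(ROWS))  # {'eng': 3, 'ops': 1}
```

The result does not depend on how the rows are split, which is what lets the real framework run many mappers in parallel.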