HIVE Query Language (HQL) – HIVE Create Database, Create Table

HIVE Query Language (HQL) – HIVE Create Database, Create Table

In this section, we will discuss data definition language parts of HIVE Query Language(HQL), which are used for creating, altering and dropping databases, tables, views, functions, and indexes.

We will also look into SHOW and DESCRIBE commands for listing and describing databases and tables stored in HDFS file system.

Hive Database – HIVE Query

A database in Hive is just a namespace or catalog of tables. If we do not specify database, default database is used.

HIVE CREATE database syntax 

hive> CREATE DATABASE IF NOT EXISTS financial;

HIVE USE Database syntax

This command is used for using a particular database.

hive> use <database name>

hive> use financial;

HIVE SHOW all existing databases

hive> SHOW DATABASES;

default
financial

For each database, HIVE will create a directory and the tables say “EMP” in that database and say “financial” is stored in sub-directories.

The database directory is created under the directory specified in the parameter “hive.metastore.warehouse.dir”. 

Assuming the default value of the parameter is “/usr/hive/warehouse”, when we create the “financial” database, HIVE will create the sub-directory as “/usr/hive/warehouse/financial.db” where “.db” is the extension.

HIVE DESCRIBE database syntax

This will show the directory location of the “financial” database.

hive> DESCRIBE DATABASE financial;

financial
hdfs://usr/hive/warehouse/financial.db

HIVE ALTER Database Syntax

We can associate key-value pairs with a database in the DBPROPERTIES using ALTER DATABASE command.

hive> ALTER DATABSE financial SET DBPROPERTIES (‘edited by’ = ‘kmak’);

HIVE CREATE Table Syntax

CREATE table statement in Hive is similar to what we follow in SQL but hive provides lots of flexibilities in terms of where the data files for the table will be stored, the format used, delimiter used etc.

CREATE TABLE IF NOT EXISTS financial.EMP

(

Ename STRING,

EmpID INT,

Salary FLOAT,

Tax MAP<STRING, FLOAT>,

Subordinates ARRAY<STRING>,

Address STRUCT<city: STRING, state: STRING, zip: INT>

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'

STORED AS TEXTFILE;

In the above example, the default location of the table will be “/user/hive/warehouse/financial.db/EMP”.

If we want to explicitly override the location then we need to use the “LOCATION” keyword in the above create table statement.

HIVE SHOW TABLES Syntax

This command lists the tables in the current working database.

hive> SHOW TABLES;

EMP

Table1

Table2

HIVE DESCRIBE Syntax

The DESCRIBE statement displays metadata about a table, such as the column names and their data types.

hive> DESCRIBE financial.emp.salary

salary    float

HIVE DROP Database Syntax

DROP Database statement deletes all the related tables and then delete the database.

hive>DROP DATABASE IF EXISTS financial;

hive> DROP DATABASE IF EXISTS financial CASCADE;

HIVE DROP Table Syntax

hive> DROP TABLE IF EXISTS emp;

HIVE ALTER Table Syntax

hive> ALTER TABLE emp RENAME TO employee;--Rename Table Name

hive> ALTER TABLE emp CHANGE name firstname String;--Rename Column Name

hive> ALTER TABLE emp CHANGE sal sal Double; -- Rename data types

hive> ALTER TABLE emp ADD COLUMNS (dno INT COMMENT 'Department number'); -- Add column name

hive> ALTER TABLE emp REPLACE COLUMNS (eid INT empid Int,  ename STRING name String);--Deletes all the columns from emp and replace it with two columns

HIVE DESCRIBE EXTENDED Syntax

This command is used for describing details about the table. If we replace EXTENDED with FORMATTED then it provides more verbose output.

The output lines in the description that start with LOCATION also shows the full URL path in the HDFS where hive will keep all the data related to a given table.

hive>DESCRIBE EXTENDED financial.EMP;
Ename STRING,
EmpID INT,
Salary FLOAT,
Tax MAP<STRING, FLOAT>,
Subordinates ARRAY<STRING>,
Address STRUCT<city: STRING, state: STRING, zip: INT>

Detailed Table Information&nbsp;&nbsp; Table( tableName: Emp, dbName:financial, owner:kmak…
Location:hdfs://user/hive/warehouse/financial.db/Emp,
Parameters:{creator=kmak, created at=’2016-02-02 11:01:43’,last_modified_user=kmak,last_modified_time=1443553211,…}

HIVE Managed and External Tables

HIVE Managed Tables

By default, HIVE tables are the managed tables. HIVE controls metadata and the lifecycle of the data.

HIVE stores all the data related to a given table in the subdirectory under the directory defined by the parameter “hive.metastore.warehouse.dir” which is “/user/hive/warehouse” by default.

Dropping a managed table deletes the data from the table by deleting the sub-directory that has got created for the respective table. “DESCRIBE EXTENDED” command output will tell whether a table is managed or extended.

The drawback of the managed table is less convenient to use with other tools. For

For Example: We have inserted some data into some table by Pig or some other tool. Now, we wanted to run some HIVE queries on the data inserted by Pig but do not want to give the ownership to HIVE. In such case, we create an external table which has access to the data but not the ownership.

HIVE External Tables

Let us suppose we want to analyze the marketing data getting ingested from different sources.

We ingest south and north zone data using many tools such as Pig, HIVE and so on. Let us assume data file for the SALES table resides in directory /data/marketing.

The following table declaration creates an external table that can read all the data files from /data/marketing distributed directory with data stored as comma-delimited data.

CREATE EXTERNAL TABLE IF NOT EXISTS SALES (
SaleId INT,
ProductId INT,
Quantity INT,
ProdName STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/marketing';

The keyword “EXTERNAL” tells HIVE that this table is external and the data is stored in the directory mentioned in “LOCATION” clause.

Since the table is external, HIVE does not assume it owns the data. Therefore, dropping table deletes only the metadata in HIVE Metastore and the actual data remains intact.

Therefore, if the data is shared between tools, then it is always advisable to create an external table to make ownership explicit.

Copy the schema (not the data) of an existing table

CREATE EXTERNAL TABLE IF NOT EXISTS financial.emp1
LIKE financial.emp
LOCATION ‘/data/marketing’;