7 Steps to Install Apache Hive with Hadoop on CentOS

Before we learn to install Apache Hive on CentOS, let me give you a short introduction to it. Hive is a data warehouse tool used to store and process structured data residing on HDFS. Hive was originally developed by Facebook and was later moved to the Apache Software Foundation, where it became the open-source Apache Hive project.

Apache Hive

What Apache Hive is

  • A data warehouse infrastructure tool
  • Designed for structured data only
  • It stores and processes structured data residing in HDFS
  • Internally uses Hadoop MapReduce for data processing

What Apache Hive is not

  • It is not a relational database like MySQL, Oracle, Postgres, etc.
  • It is not designed for real-time query processing
  • It does not support row-level transactions, updates, or deletes

Enough of the concepts; now let's move on to the installation. These steps use Hive version 1.2.1.

Step 1: Complete the installation of Java and Hadoop on CentOS

Before we install Hive we need to make sure that Java and Hadoop are already installed on our master node.
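A quick sanity check before moving on (these commands assume `java` and the Hadoop binaries are already on your PATH):

```shell
# Verify Java is installed and on the PATH
java -version

# Verify Hadoop is installed and HDFS is reachable
hadoop version
hdfs dfs -ls /
```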

Step 2: Download and Extract Apache Hive and Derby

Execute the following commands to download Hive and Derby from the Apache mirrors.
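A sketch of the download and extract steps. The archive.apache.org URLs, the Derby version (10.10.2.0 is a version commonly paired with Hive 1.2.1), and the `/usr/local` install paths are assumptions; adjust them for your environment:

```shell
# Download and extract Apache Hive 1.2.1
wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
tar -xzf apache-hive-1.2.1-bin.tar.gz
sudo mv apache-hive-1.2.1-bin /usr/local/hive

# Download and extract Apache Derby (assumed version 10.10.2.0)
wget https://archive.apache.org/dist/db/derby/db-derby-10.10.2.0/db-derby-10.10.2.0-bin.tar.gz
tar -xzf db-derby-10.10.2.0-bin.tar.gz
sudo mv db-derby-10.10.2.0-bin /usr/local/derby
```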

Step 3: Setup Environment Variables

Setup Derby Environment Variables
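Add lines like these to `~/.bashrc` (shown here as direct exports for the current shell; `/usr/local/derby` is the assumed install path from the previous step):

```shell
# Derby install location (assumed path)
export DERBY_HOME=/usr/local/derby

# Put the Derby scripts on the PATH
export PATH=$PATH:$DERBY_HOME/bin

# Derby jars needed by clients such as the Hive metastore
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar:$DERBY_HOME/lib/derbyclient.jar
```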

Setup Hive Environment Variables
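Likewise for Hive, add these to `~/.bashrc` (again, `/usr/local/hive` is an assumed install path):

```shell
# Hive install location (assumed path)
export HIVE_HOME=/usr/local/hive

# Put the hive command on the PATH
export PATH=$PATH:$HIVE_HOME/bin
```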

Load environment variables
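Reload the profile so the variables take effect in the current shell:

```shell
# Re-read ~/.bashrc so DERBY_HOME, HIVE_HOME, and PATH are set
source ~/.bashrc
```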

Step 4: Hive Configurations in hive-site.xml

Go to the $HIVE_HOME/conf directory and create hive-site.xml with the following content.

hive-site.xml
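A minimal sketch of a hive-site.xml that points the metastore at the networked Derby server. The property names are standard Hive configuration keys; the host, port, and warehouse path values are assumptions matching the rest of this tutorial:

```shell
# Write a minimal hive-site.xml (adjust host/port/paths for your cluster)
cat > $HIVE_HOME/conf/hive-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
```

Note that the ClientDriver here relies on derbyclient.jar being on the CLASSPATH, which the Derby environment variables from Step 3 take care of.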

Step 5: Create hive directories on HDFS

Create Hive warehouse directories on HDFS and give them proper access rights using the commands below.
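These are the standard Hive setup commands for HDFS (the warehouse path matches the hive.metastore.warehouse.dir default):

```shell
# Create the scratch and warehouse directories on HDFS
hdfs dfs -mkdir -p /tmp
hdfs dfs -mkdir -p /user/hive/warehouse

# Give group write access to both directories
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
```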

Step 6: Start/Stop Derby Server

Start Derby Server using following command.
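A sketch of starting the Derby network server in the background (port 1527 is Derby's default; the `-h 0.0.0.0` bind address is an assumption so remote clients can connect):

```shell
# Start the Derby network server, keep it running after logout
nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 &
```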

You can stop the server by killing its process.
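One way to do that, assuming the server was started as above (the match pattern targets Derby's NetworkServerControl main class):

```shell
# Find the Derby server process, then kill it by pattern
pkill -f org.apache.derby.drda.NetworkServerControl
```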

Step 7: Open hive shell

Open the Hive shell using the following command and get ready to execute your Hive queries.
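With the environment variables from Step 3 in place, launching the shell is a single command; the commented queries are a quick smoke test you can try inside it:

```shell
# Launch the Hive CLI
hive

# Inside the shell, try:
#   hive> show databases;
#   hive> create table demo (id int, name string);
#   hive> show tables;
```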

You are done…. 🙂 but that is not it. We’ll go one extra step 🙂

What if you are getting following exception?

[ERROR] Terminal initialization failed; falling back to unsupported

No worries, here is the solution. This error occurs because the old jline jar on Hadoop's classpath conflicts with the newer jline that Hive 1.2.1 ships. You just have to get jline-0.9.94.jar out of the $HADOOP_HOME/share/hadoop/yarn/lib/ directory; rather than deleting it, we'll simply rename it with the following command.
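Renaming keeps the jar around in case you want to restore it later:

```shell
# Rename the conflicting jline jar so Hadoop no longer loads it
mv $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar \
   $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
```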

Now try Step 7 again….. 🙂

Write your valuable comments below and Stay tuned for more learning…..!!!!!

