4 Steps to Configure Hive with MySQL Metastore on CentOS

Prior to the configuration of Hive with MySQL metastore, let’s know some important things about Apache Hive and it’s metastore. Apache Hive Metastore is normally configured with Derby Database. But that setting is recommended just for the testing or ad-hoc development purpose. When hive is used in production, its metastore should be configured in databases like MySQL or Postgres.

Apache Hive is the client side library which provides SQL like interface to the end users for writing simplified MapReduce job to process structured data stored in HDFS.

The Hive Metastore is made up of two parts,

  • Database which stores the metadata information of tables and columns with corresponding HDFS file mappings
  • Metastore which is a service to access the actual data residing on HDFS using metadata in Database

Hive Metastore Configuration has 3 modes

Embedded Metastore

  • Instances of Hive Driver, Metastore interface and Database are in a single JVM
  • Used for development and testing purpose as only one user can connect to derby database

Local Metastore

  • Instances of Hive Driver, Metastore interface are in single JVM while database is in a separate process/machine
  • Configured with external database like MySQL on small scale production setup (with small clusters)

Remote Metastore

  • Instances of Hive Driver, Metastore interface and database all are in a separate JVM and/or machine
  • Metastore process communicates with MySQL and Hive client in order to get metadata information from database
  • Configured with external database like MySQL on large scale production setup (with large clusters)
Hive Metastore Installation Modes
Hive Metastore Installation Modes

In this post we’ll learn step by step process of Hive metastore configuration as Remote Metastore with MySQL on CentOS as most of the time we work on relatively small clusters. And I have assumed that we already have working setup of fully distributed Hadoop 2.6.0 and Hive 1.2.1 with us. If you are setting up the environment from scratch, I recommend you to go through following posts first.

Step 1: Install and Configure MySQL on Master Node

Install MySQL on CentOS Master node using following command.

After MySQL is installed start mysql database service using following command. Also execute chkconfig command for mysqld service to start the db service at the time of booting.

In order to access the database from remote machine make sure that firewall port filtering is disabled. Also download the mysql JDB Connector to access the MySQL database from Metastore Service. Place the connector jar to $HIVE_HOME/lib directory to make it available in classpath.

Step 2: Create Database and User with proper access rights

Your MySQL is now installed. Try following command to access the mysql prompt.

Now execute below commands on mysql prompt to configure hive user and metastore schema in MySQL.

In above command, in place of localhost you can use IP Address or hostname of the Metastore Server if your Metastore Server and MySQL are on different machines. Here we have kept both on the same machine.

Step 3: Configure Hive to work with MySQL

Open $HIVE_HOME/conf/hive-site.xml file and change the configurations as following.

 

In above configurations, you can use relevant IP Address or host name in place of localhost. Additionally you also have to add following line to $HIVE_HOME/conf/hive-env.sh for informing hive to open thrift port(same as mentioned under hive.metastore.uris property in hive-site.xml file). This port is used by hive client to connect to Metastore through RPC.

Step 4: Start Hive Metastore Server

We are done with the configuration and setup, now we can access Hive with Metastore in MySQL, but before that we have to start Metastore process using following command.

You can confirm the running metastore process with jps command. If you find RunJar in the list that means the Metastore process is running.

That’s it you are done 🙂

You can also access the HiveContext in Spark in with same Metastore. But for that you need to copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/ directory on all Spark Nodes.

In case you find any problem in above steps inform me my commenting on this post. Thank you for you time…..!!!!!

References:

Leave a Reply

Your email address will not be published. Required fields are marked *