4 Steps to Configure Hive with MySQL Metastore on CentOS

Prior to the configuration of Hive with MySQL metastore, let’s know some important things about Apache Hive and it’s metastore. Apache Hive Metastore is normally configured with Derby Database. But that setting is recommended just for the testing or ad-hoc development purpose. When hive is used in production, its metastore should be configured in databases like MySQL or Postgres.

Apache Hive is the client side library which provides SQL like interface to the end users for writing simplified MapReduce job to process structured data stored in HDFS.

The Hive Metastore is made up of two parts,

  • Database which stores the metadata information of tables and columns with corresponding HDFS file mappings
  • Metastore which is a service to access the actual data residing on HDFS using metadata in Database

Hive Metastore Configuration has 3 modes

Embedded Metastore
  • Instances of Hive Driver, Metastore interface and Database are in a single JVM
  • Used for development and testing purpose as only one user can connect to derby database
Local Metastore
  • Instances of Hive Driver, Metastore interface are in single JVM while database is in a separate process/machine
  • Configured with external database like MySQL on small scale production setup (with small clusters)
Remote Metastore
  • Instances of Hive Driver, Metastore interface and database all are in a separate JVM and/or machine
  • Metastore process communicates with MySQL and Hive client in order to get metadata information from database
  • Configured with external database like MySQL on large scale production setup (with large clusters)
Hive Metastore Installation Modes
Hive Metastore Installation Modes

In this post we’ll learn step by step process of Hive metastore configuration as Remote Metastore with MySQL on CentOS as most of the time we work on relatively small clusters. And I have assumed that we already have working setup of fully distributed Hadoop 2.6.0 and Hive 1.2.1 with us. If you are setting up the environment from scratch, I recommend you to go through following posts first.

Step 1: Install and Configure MySQL on Master Node

Install MySQL on CentOS Master node using following command.

$ yum install mysql-server

After MySQL is installed start mysql database service using following command. Also execute chkconfig command for mysqld service to start the db service at the time of booting.

$ service mysqld start
$ chkconfig mysqld on

In order to access the database from remote machine make sure that firewall port filtering is disabled. Also download the mysql JDB Connector to access the MySQL database from Metastore Service. Place the connector jar to $HIVE_HOME/lib directory to make it available in classpath.

$ cd /data
$ wget http://www.java2s.com/Code/JarDownload/mysql/mysql-connector-java-commercial-5.1.7-bin.jar.zip
$ unzip mysql-connector-java-commercial-5.1.7-bin.jar.zip
$ cp mysql-connector-java-commercial-5.1.7-bin.jar  $HIVE_HOME/lib/

Step 2: Create Database and User with proper access rights

Your MySQL is now installed. Try following command to access the mysql prompt.

$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 811
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Now execute below commands on mysql prompt to configure hive user and metastore schema in MySQL.

mysql> create database hive DEFAULT CHARACTER SET utf8;
mysql> grant all PRIVILEGES on *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive_password' WITH GRANT OPTION;

In above command, in place of localhost you can use IP Address or hostname of the Metastore Server if your Metastore Server and MySQL are on different machines. Here we have kept both on the same machine.
Step 3: Configure Hive to work with MySQL
Open $HIVE_HOME/conf/hive-site.xml file and change the configurations as following.

<?xml version="1.0"?>

<configuration>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
    <description>JDBC connection string used by Hive Metastore</description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC Driver class</description>
  </property>
  
  <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>Metastore database user name</description>
  </property>
  
  <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive_password</value>
      <description>Metastore database password</description>
  </property>
  
  <property>
      <name>hive.metastore.uris</name>
      <value>thrift://localhost:9084</value>
      <description>Thrift server hostname and port</description>
  </property>

</configuration>

In above configurations, you can use relevant IP Address or host name in place of localhost. Additionally you also have to add following line to $HIVE_HOME/conf/hive-env.sh for informing hive to open thrift port(same as mentioned under hive.metastore.uris property in hive-site.xml file). This port is used by hive client to connect to Metastore through RPC.

export METASTORE_PORT=9084

Step 4: Start Hive Metastore Server

We are done with the configuration and setup, now we can access Hive with Metastore in MySQL, but before that we have to start Metastore process using following command.

$ nohup bin/hive --service metastore &

You can confirm the running metastore process with jps command. If you find RunJar in the list that means the Metastore process is running.

$  $JAVA_HOME/bin/jps
7669 RunJar
14592 Master
14735 Worker
28824 NameNode
29240 ResourceManager
29095 SecondaryNameNode
28916 DataNode
29334 NodeManager
24571 Jps

That’s it you are done :)

You can also access the HiveContext in Spark in with same Metastore. But for that you need to copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/ directory on all Spark Nodes.

In case you find any problem in above steps inform me my commenting on this post. Thank you for you time…..!!!!!

References:

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>