Prior to the configuration of Hive with MySQL metastore, let’s know some important things about Apache Hive and it’s metastore. Apache Hive Metastore is normally configured with Derby Database. But that setting is recommended just for the testing or ad-hoc development purpose. When hive is used in production, its metastore should be configured in databases like MySQL or Postgres.
Apache Hive is the client side library which provides SQL like interface to the end users for writing simplified MapReduce job to process structured data stored in HDFS.
The Hive Metastore is made up of two parts,
- Database which stores the metadata information of tables and columns with corresponding HDFS file mappings
- Metastore which is a service to access the actual data residing on HDFS using metadata in Database
Hive Metastore Configuration has 3 modes
- Instances of Hive Driver, Metastore interface and Database are in a single JVM
- Used for development and testing purpose as only one user can connect to derby database
- Instances of Hive Driver, Metastore interface are in single JVM while database is in a separate process/machine
- Configured with external database like MySQL on small scale production setup (with small clusters)
- Instances of Hive Driver, Metastore interface and database all are in a separate JVM and/or machine
- Metastore process communicates with MySQL and Hive client in order to get metadata information from database
- Configured with external database like MySQL on large scale production setup (with large clusters)
In this post we’ll learn step by step process of Hive metastore configuration as Remote Metastore with MySQL on CentOS as most of the time we work on relatively small clusters. And I have assumed that we already have working setup of fully distributed Hadoop 2.6.0 and Hive 1.2.1 with us. If you are setting up the environment from scratch, I recommend you to go through following posts first.
Step 1: Install and Configure MySQL on Master Node
Install MySQL on CentOS Master node using following command.
$ yum install mysql-server
After MySQL is installed start mysql database service using following command. Also execute
chkconfig command for
mysqld service to start the db service at the time of booting.
$ service mysqld start
$ chkconfig mysqld on
In order to access the database from remote machine make sure that firewall port filtering is disabled. Also download the mysql JDB Connector to access the MySQL database from Metastore Service. Place the connector jar to
$HIVE_HOME/lib directory to make it available in classpath.
$ cd /data
$ wget http://www.java2s.com/Code/JarDownload/mysql/mysql-connector-java-commercial-5.1.7-bin.jar.zip
$ unzip mysql-connector-java-commercial-5.1.7-bin.jar.zip
$ cp mysql-connector-java-commercial-5.1.7-bin.jar $HIVE_HOME/lib/
Step 2: Create Database and User with proper access rights
Your MySQL is now installed. Try following command to access the mysql prompt.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 811
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Now execute below commands on mysql prompt to configure hive user and metastore schema in MySQL.
mysql> create database hive DEFAULT CHARACTER SET utf8;
mysql> grant all PRIVILEGES on *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive_password' WITH GRANT OPTION;
In above command, in place of localhost you can use IP Address or hostname of the Metastore Server if your Metastore Server and MySQL are on different machines. Here we have kept both on the same machine.
Step 3: Configure Hive to work with MySQL
$HIVE_HOME/conf/hive-site.xml file and change the configurations as following.
<description>JDBC connection string used by Hive Metastore</description>
<description>JDBC Driver class</description>
<description>Metastore database user name</description>
<description>Metastore database password</description>
<description>Thrift server hostname and port</description>
In above configurations, you can use relevant IP Address or host name in place of localhost. Additionally you also have to add following line to
$HIVE_HOME/conf/hive-env.sh for informing hive to open thrift port(same as mentioned under
hive.metastore.uris property in
hive-site.xml file). This port is used by hive client to connect to Metastore through RPC.
Step 4: Start Hive Metastore Server
We are done with the configuration and setup, now we can access Hive with Metastore in MySQL, but before that we have to start Metastore process using following command.
$ nohup bin/hive --service metastore &
You can confirm the running metastore process with jps command. If you find RunJar in the list that means the Metastore process is running.
That’s it you are done 🙂
You can also access the
HiveContext in Spark in with same Metastore. But for that you need to copy
$SPARK_HOME/conf/ directory on all Spark Nodes.
In case you find any problem in above steps inform me my commenting on this post. Thank you for you time…..!!!!!