7 Steps to Install Apache Hive with Hadoop on CentOS

Before we learn to install Apache Hive on CentOS, let me give you a short introduction. Hive is basically a data warehouse tool to store and process structured data residing on HDFS. Hive was originally developed at Facebook and was later moved to the Apache Software Foundation, where it became the open source project Apache Hive.


What Apache Hive is

  • Tool used for data warehouse infrastructure
  • This tool is designed for structured data only
  • It stores and processes structured data residing in HDFS
  • Internally uses Hadoop MapReduce for Data Processing

What Apache Hive is not

  • It is not a relational database like MySQL, Oracle, Postgres, etc.
  • It is not designed for real-time query processing
  • It doesn’t support transactions, or row-level updates and deletes
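
To make the distinction concrete, here is a minimal sketch of the kind of batch query Hive is built for (the table name, columns and HDFS location are hypothetical): Hive maps a schema onto files already sitting in HDFS and compiles the query into MapReduce jobs.

hive> CREATE EXTERNAL TABLE page_views (user_id INT, url STRING, view_time STRING)
    >   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    >   LOCATION '/user/hive/warehouse/page_views';
hive> SELECT url, COUNT(*) AS views FROM page_views GROUP BY url;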

Enough of the concepts, now let’s move on to the installation part. The installation steps here use Hive version 1.2.1.

Step 1: Complete the installation of Java and Hadoop on CentOS

Before we install Hive, we need to make sure that Java and Hadoop are already installed on our master node.
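
If you are not sure, a quick check from the shell should confirm both; assuming Hadoop lives under /opt/hadoop as in the rest of this guide, these commands should print the versions and show the Hadoop daemons running.

$ java -version
$ hadoop version
$ jps     ## should list NameNode, DataNode, ResourceManager, etc.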

Step 2: Download and Extract Apache Hive and Derby

Execute the following commands to download Hive and Derby from the Apache mirrors and extract them under /opt.
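
The exact archive URLs and the Derby version below are examples (Hive 1.2.1 from the Apache archive, and a Derby 10.11 release as an assumption), so adjust them to whatever your mirror provides; the target directories /opt/hive and /opt/derby match the environment variables we set next.

$ cd /opt
$ wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
$ tar -xzf apache-hive-1.2.1-bin.tar.gz
$ mv apache-hive-1.2.1-bin /opt/hive
$ wget https://archive.apache.org/dist/db/derby/db-derby-10.11.1.1/db-derby-10.11.1.1-bin.tar.gz
$ tar -xzf db-derby-10.11.1.1-bin.tar.gz
$ mv db-derby-10.11.1.1-bin /opt/derby
$ mkdir /opt/derby/logs     ## log directory used by the Derby server in Step 6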

Step 3: Setup Environment Variables

Append the Derby and Hive environment variables to /etc/profile using the commands below.

Setup Derby Environment Variables
$ echo "" >> /etc/profile
$ echo "### Derby Variables ###" >> /etc/profile
$ echo "export DERBY_INSTALL=/opt/derby" >> /etc/profile
$ echo "export DERBY_HOME=/opt/derby" >> /etc/profile
Setup Hive Environment Variables
$ echo "" >> /etc/profile
$ echo "### Hive Variables ###" >> /etc/profile
$ echo "export HADOOP=/opt/hadoop/bin/hadoop" >> /etc/profile
$ echo "export HIVE_HOME=/opt/hive" >> /etc/profile
$ echo "export PATH=\$HIVE_HOME/bin:\$PATH" >> /etc/profile
Load environment variables
$ source /etc/profile

Step 4: Hive Configurations in hive-site.xml

Go to the $HIVE_HOME/conf directory and create hive-site.xml with the following content.

$ vi $HIVE_HOME/conf/hive-site.xml
hive-site.xml
<?xml version="1.0"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://master.backtobazics.com:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
 
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
</configuration>
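
One point worth checking before you start Hive: the ClientDriver class configured above lives in Derby’s client jars, and Hive can only load it if those jars are on its classpath. A common way to ensure that (assuming the /opt/derby and /opt/hive layout used here) is to copy them into Hive’s lib directory.

$ cp /opt/derby/lib/derbyclient.jar $HIVE_HOME/lib/
$ cp /opt/derby/lib/derbytools.jar $HIVE_HOME/lib/

If the driver is missing, you will typically see a “datastore driver was not found in the CLASSPATH” error when the metastore starts.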

Step 5: Create hive directories on HDFS

Create the Hive warehouse directories on HDFS and give them the proper access rights using the commands below.

$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
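
You can verify the directories and their permissions with a quick listing; the group write bit should show up in the output (something like drwxrwxr-x).

$ hdfs dfs -ls /
$ hdfs dfs -ls /user/hive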

Step 6: Start/Stop Derby Server

Start the Derby network server using the following command.

$ nohup /opt/derby/bin/startNetworkServer -h master.backtobazics.com > /opt/derby/logs/server.log &
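
To confirm the server is up and listening on Derby’s default port 1527, you can use the NetworkServerControl ping utility that ships with Derby (assuming the same hostname as above).

$ /opt/derby/bin/NetworkServerControl ping -h master.backtobazics.com -p 1527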

You can stop the server by finding and killing its process.

## Get process id
$ ps -ef | grep derby
root  7528  4111  5 10:58 pts/0    00:00:01 /usr/java/jdk1.7.0_25/bin/java -classpath /opt/derby/lib/derby.jar:/opt/derby/lib/derbynet.jar:/opt/derby/lib/derbytools.jar:/opt/derby/lib/derbyclient.jar org.apache.derby.drda.NetworkServerControl start -h master.backtobazics.com
root  7552  4111  0 10:58 pts/0    00:00:00 grep derby

## Kill the process
$ kill -9 7528
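
Alternatively, Derby ships a stopNetworkServer script that shuts the server down more gracefully than kill -9; assuming the same install location and hostname, it would be:

$ /opt/derby/bin/stopNetworkServer -h master.backtobazics.com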

Step 7: Open hive shell

Open the Hive shell using the following command and you are ready to execute your Hive queries.

$ $HIVE_HOME/bin/hive

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
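
As a quick sanity check against the fresh metastore, a couple of simple statements should come back without errors (the test table below is just an example).

hive> show databases;
hive> create table test_hive (id int, name string);
hive> show tables;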

You are done… :) But that is not all, we’ll go one extra step :)

What if you are getting the following exception?

[ERROR] Terminal initialization failed; falling back to unsupported

No worries, here is the solution. This happens because Hadoop’s older jline jar shadows the one bundled with Hive, so you just have to get the jline-0.9.94.jar file out of the $HADOOP_HOME/share/hadoop/yarn/lib/ directory. We’ll simply rename that file with the following command.

$ mv $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar~

Now try Step 7 again….. :)

Write your valuable comments below and stay tuned for more learning…!
