Hive Metastore Configuration After Fresh Installation

Beginners experimenting with Hive often get stuck on configuration. After placing the Hive libraries in the designated folders and updating the necessary environment variables, the first eager run of hive frequently fails with the exception “HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient”. This means the Hive Metastore still needs to be configured, which is simple and straightforward.

There are two ways to configure the Hive Metastore. We can use ‘schematool’ (for example, schematool -dbType mysql -initSchema) or directly source the hive-schema-3.1.0.mysql.sql script provided by Hive into the Metastore database.


Create Your Own Metastore Event Listeners in Hive With Scala

Hive metastore event listeners let you detect events as they occur in the Hive metastore, such as a table being created or altered. If you want a certain action to take place before an event is applied, you can extend MetaStorePreEventListener and provide your own implementation.

In this article, we will learn how to create our own metastore event listeners in Hive using Scala and SBT.

So let’s get started!

First, add the following dependencies in your build.sbt file:

// Hive and Hadoop dependencies needed to compile the listener
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.1" excludeAll ExclusionRule(organization = "org.pentaho")

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.3"

libraryDependencies += "org.apache.httpcomponents" % "httpclient" % "4.3.4"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"

libraryDependencies += "org.apache.hive" % "hive-service" % "1.2.1"

// Use the hive-exec JAR from the local Hive installation
unmanagedJars in Compile += file("/usr/lib/hive/lib/hive-exec-1.2.1.jar")

// Resolve duplicate files when building the assembly JAR
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

Now, create your first class. You can name it anything, but I named it OrcMetastoreListener. This class must extend the MetaStorePreEventListener class of Hive and take the Hadoop conf as the constructor argument:

package metastorelisteners

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.metastore.MetaStorePreEventListener
import org.apache.hadoop.hive.metastore.events.{PreAlterTableEvent, PreCreateTableEvent, PreEventContext}
import org.apache.hadoop.hive.metastore.events.PreEventContext.PreEventType.{ALTER_TABLE, CREATE_TABLE}

class OrcMetastoreListener(conf: Configuration) extends MetaStorePreEventListener(conf) {

  // Called by the metastore before each event is applied
  override def onEvent(preEventContext: PreEventContext): Unit = {
    preEventContext.getEventType match {
      case CREATE_TABLE =>
        val table = preEventContext.asInstanceOf[PreCreateTableEvent].getTable
        // Fully qualified input/output format class names go here
        table.getSd.setInputFormat("")
        table.getSd.setOutputFormat("")
      case ALTER_TABLE =>
        val newTable = preEventContext.asInstanceOf[PreAlterTableEvent].getNewTable
        newTable.getSd.setInputFormat("")
        newTable.getSd.setOutputFormat("")
      case _ => // do nothing
    }
  }
}

The pre-event context carries the metastore event that is about to be applied, and its event type tells you which operation triggered it. In my case, I want every table created in Hive to use the input format and output format set in the listener, and the same applies to ALTER TABLE commands.

The best use case for this listener is when somebody wants to query a data source, such as Spark or any other source, through its own custom input format without altering the schema of the Hive table to use that custom input format.
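To make the dispatch logic easier to follow in isolation, here is a minimal sketch that runs without Hive on the classpath. Everything in it is a stand-in: EventType, StorageDescriptor, and ListenerSketch are hypothetical simplifications, not Hive classes, and the choice of the ORC format class names is an assumption suggested by the listener's name rather than something the listing above specifies. Unlike the real listener, which mutates the table's storage descriptor in place, this version returns a modified copy.

```scala
object ListenerSketch {
  // Hypothetical stand-ins for the metastore's pre-event types
  sealed trait EventType
  case object CreateTable extends EventType
  case object AlterTable extends EventType
  case object DropTable extends EventType

  // Hypothetical, simplified model of a table's storage descriptor
  final case class StorageDescriptor(inputFormat: String, outputFormat: String)

  // Assumption: ORC is the format the listener should enforce
  val OrcInput = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
  val OrcOutput = "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"

  // Rewrite the formats for create/alter events; leave other events untouched
  def onEvent(eventType: EventType, sd: StorageDescriptor): StorageDescriptor =
    eventType match {
      case CreateTable | AlterTable =>
        sd.copy(inputFormat = OrcInput, outputFormat = OrcOutput)
      case _ => sd
    }
}
```

Calling ListenerSketch.onEvent(ListenerSketch.CreateTable, sd) returns a descriptor with both formats replaced, while a DropTable event passes the descriptor through unchanged.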

Now, let’s build a JAR from the code and use it in Hive.

First, add the sbt-assembly plugin to your plugins.sbt file:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

Now, go to your project root and build the JAR with the command sbt assembly. Once the JAR is built, copy it into your $HIVE_HOME/lib directory. Then, inside the $HIVE_HOME/conf folder, add the following contents to hive-site.xml:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>metastorelisteners.OrcMetastoreListener</value>
  </property>
</configuration>
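A typo in hive-site.xml can leave the listener silently unregistered, so a quick sanity check may be worth having. The sketch below is one possible way to do it, using only the JDK's XML parser; HiveSiteCheck and its methods are hypothetical helper names, not part of Hive. It reads the property list and returns the class names registered under hive.metastore.pre.event.listeners, which accepts a comma-separated list.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object HiveSiteCheck {
  // Parse hive-site.xml content into a property-name -> value map
  def properties(xml: String): Map[String, String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    val nodes = doc.getElementsByTagName("property")
    (0 until nodes.getLength).flatMap { i =>
      val prop = nodes.item(i).asInstanceOf[Element]
      val names = prop.getElementsByTagName("name")
      val values = prop.getElementsByTagName("value")
      // Skip malformed entries that lack a name or a value
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  // Class names registered as pre-event listeners (empty if the property is unset)
  def listeners(xml: String): Seq[String] =
    properties(xml)
      .get("hive.metastore.pre.event.listeners")
      .toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
}
```

Passing the file contents as a string, HiveSiteCheck.listeners(xml) should include metastorelisteners.OrcMetastoreListener when the configuration above is in place.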

Now, create a table in Hive and describe it:

Time taken: 2.742 seconds
id    int
Detailed Table Information    Table(tableName:hivetable, dbName:default, owner:hduser, ... compressed:false, ...
Time taken: 0.611 seconds, Fetched: 3 row(s)

And that’s it!
