Hive Server 2: Research, Installation and Deployment (lalaguozhe's blog)

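We have been running Hive Server 1 for a long time: users' ad-hoc queries, hive-web, wormhole, operations tools and more all submit their statements through it. However, Hive Server 1 is extremely unstable and often hangs for no apparent reason, which blocks every client connection. To cope with this we had to set up a crontab check script that keeps running "show tables" to detect whether the server has hung; if it has, the only remedy is to kill the daemon process and restart it. Hive Server 1 also handles concurrency poorly. Settings a user makes in a connection are bound to a Thrift worker thread; after that user disconnects, another user who opens a connection may be assigned the same worker thread and silently reuse the previous user's configuration. This happens because Thrift cannot detect that a client has disconnected, so it never clears the session state. Binding sessions to worker threads also makes HA very hard.

Hive Server 2 fully supports sessions: the client sends a SessionID with every RPC call, and the server maps it to the Session State that holds the session's state. As a result, any worker thread can execute any statement of a given session, instead of the session being pinned to one thread.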
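To make the session model concrete, here is a minimal Java sketch, not Hive's actual code: a registry keyed by session ID (a UUID stands in for the SessionID), where each session owns its own configuration map standing in for SessionState/HiveConf. Any thread in the worker pool can serve any session by looking it up by ID, and closing a session drops its state, so the next client cannot inherit it the way it could on a Hive Server 1 worker thread. The class and method names and the mapred.job.queue.name key are illustrative assumptions.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of session-ID-based state lookup; not Hive's actual classes. */
public class SessionRegistrySketch {

    /** Per-session state, standing in for SessionState/HiveConf. */
    static final class Session {
        final Map<String, String> conf = new ConcurrentHashMap<>();
    }

    private final Map<UUID, Session> sessions = new ConcurrentHashMap<>();

    /** Opens a session and returns the ID the client will send with every call. */
    UUID openSession() {
        UUID id = UUID.randomUUID();
        sessions.put(id, new Session());
        return id;
    }

    /** Resolves the ID carried by an RPC; works from any worker thread. */
    Session lookup(UUID id) {
        Session session = sessions.get(id);
        if (session == null) {
            throw new IllegalStateException("unknown or closed session " + id);
        }
        return session;
    }

    /** Drops the session's state so no later client can reuse it. */
    void closeSession(UUID id) {
        sessions.remove(id);
    }

    public static void main(String[] args) throws Exception {
        SessionRegistrySketch registry = new SessionRegistrySketch();
        UUID alice = registry.openSession();
        UUID bob = registry.openSession();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        // A "set" handled by one worker thread...
        workers.submit(() -> {
            registry.lookup(alice).conf.put("mapred.job.queue.name", "adhoc");
        }).get();
        // ...is visible to whichever worker serves alice's next call, but never to bob.
        Future<String> aliceQueue =
                workers.submit(() -> registry.lookup(alice).conf.get("mapred.job.queue.name"));
        Future<String> bobQueue =
                workers.submit(() -> registry.lookup(bob).conf.get("mapred.job.queue.name"));
        System.out.println("alice: " + aliceQueue.get() + ", bob: " + bobQueue.get());
        workers.shutdown();
        registry.closeSession(alice);
        registry.closeSession(bob);
    }
}
```

Running main prints "alice: adhoc, bob: null": the setting follows the session ID rather than the thread that happened to handle the "set".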

Hive 0.11 ships both Hive Server 1 and Hive Server 2; Hive Server 1 is kept for backward compatibility. In the long run, Hive Server 2 will be the preferred choice.

1. Configure the port and host that Hive Server 2 listens on

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>test84.hadoop</value>
</property>

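2. Configure Kerberos authentication, so that both the interaction between Thrift clients and Hive Server 2 and the interaction between Hive Server 2 and HDFS are authenticated by Kerberos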

<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
  <description>
    Client authentication types.
       NONE: no authentication check
       LDAP: LDAP/AD based authentication
       KERBEROS: Kerberos/GSSAPI authentication
       CUSTOM: Custom authentication provider
               (Use with property hive.server2.custom.authentication.class)
  </description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hadoop/_HOST@DIANPING.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/hadoop.keytab</value>
</property>
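With _HOST in the principal, every Hive Server 2 instance can share the same hive-site.xml: at login the placeholder is replaced by the server's own hostname, which is why the startup log further down reports hadoop/test84.hadoop@DIANPING.COM. Hadoop's security utilities include a helper for this substitution (SecurityUtil.getServerPrincipal); the Java below is only a simplified, hypothetical stand-in showing the idea on a service/instance@REALM principal, and it skips details such as resolving the local hostname itself.

```java
import java.util.Locale;

/** Hypothetical, simplified illustration of "_HOST" substitution in a Kerberos principal. */
public class HostPrincipalSketch {

    static final String HOST_PLACEHOLDER = "_HOST";

    /** Replaces the instance part of "service/_HOST@REALM" with the lower-cased hostname. */
    static String resolve(String principalConfig, String hostname) {
        int slash = principalConfig.indexOf('/');
        int at = principalConfig.indexOf('@');
        if (slash < 0 || at < slash) {
            return principalConfig;  // not of the form service/instance@REALM
        }
        String instance = principalConfig.substring(slash + 1, at);
        if (!HOST_PLACEHOLDER.equals(instance)) {
            return principalConfig;  // already a concrete host, leave untouched
        }
        return principalConfig.substring(0, slash + 1)
                + hostname.toLowerCase(Locale.ROOT)
                + principalConfig.substring(at);
    }

    public static void main(String[] args) {
        // Same principal as in the configuration above, resolved for host test84.hadoop
        System.out.println(resolve("hadoop/_HOST@DIANPING.COM", "test84.hadoop"));
    }
}
```

Running main prints hadoop/test84.hadoop@DIANPING.COM, matching the principal in the "Login successful" log line.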

3. Enable impersonation, so that Hive Server 2 executes statements as the submitting user; if this is set to false, statements are executed as the admin user that started the Hive Server 2 daemon

<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

Running $HIVE_HOME/bin/hive --service hiveserver2 or $HIVE_HOME/bin/hiveserver2 calls the main method of org.apache.hive.service.server.HiveServer2 to start the server. The hive log shows the following:

2013-09-17 14:59:21,081 INFO  server.HiveServer2 (HiveStringUtils.java:startupShutdownMessage(604)) - STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting HiveServer2
STARTUP_MSG:   host = test84.hadoop/10.1.77.84
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.11.0
STARTUP_MSG:   classpath = (omitted)...
2013-09-17 14:59:21,957 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
2013-09-17 14:59:21,958 INFO  service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
2013-09-17 14:59:21,958 INFO  service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
2013-09-17 14:59:21,958 INFO  service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
2013-09-17 14:59:21,959 INFO  service.AbstractService (AbstractService.java:init(89)) - Service:ThriftCLIService is inited.
2013-09-17 14:59:21,959 INFO  service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
2013-09-17 14:59:21,959 INFO  service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
2013-09-17 14:59:21,960 INFO  service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
2013-09-17 14:59:21,960 INFO  service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
2013-09-17 14:59:22,007 INFO  metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(409)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2013-09-17 14:59:22,032 INFO  metastore.ObjectStore (ObjectStore.java:initialize(222)) - ObjectStore, initialize called
2013-09-17 14:59:22,955 INFO  metastore.ObjectStore (ObjectStore.java:getPMF(267)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2013-09-17 14:59:23,000 INFO  metastore.ObjectStore (ObjectStore.java:setConf(205)) - Initialized ObjectStore
2013-09-17 14:59:23,909 INFO  metastore.HiveMetaStore (HiveMetaStore.java:logInfo(452)) - 0: get_databases: default
2013-09-17 14:59:23,912 INFO  HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(238)) - ugi=hadoop/test84.hadoop@DIANPING.COM ip=unknown-ip-addr cmd=get_databases: default 
2013-09-17 14:59:23,933 INFO  service.AbstractService (AbstractService.java:start(104)) - Service:ThriftCLIService is started.
2013-09-17 14:59:23,948 INFO  service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
2013-09-17 14:59:24,025 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
2013-09-17 14:59:24,047 INFO  thrift.ThriftCLIService (ThriftCLIService.java:run(435)) - ThriftCLIService listening on test84.hadoop/10.1.77.84:10000
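The "is inited" / "is started" lines above follow a composite-service pattern: a composite initializes (and later starts) its children in registration order, then itself. The Java sketch below is a hypothetical reconstruction of that pattern, not Hive's code; the nesting (OperationManager inside SessionManager inside CLIService, with CLIService and ThriftCLIService under HiveServer2) is an assumption inferred from the order of the log lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the init/start ordering suggested by the startup log. */
public class CompositeServiceSketch {

    static class Service {
        final String name;
        final List<String> events;

        Service(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        void init() { events.add("Service:" + name + " is inited."); }

        void start() { events.add("Service:" + name + " is started."); }
    }

    static class CompositeService extends Service {
        private final List<Service> children = new ArrayList<>();

        CompositeService(String name, List<String> events) { super(name, events); }

        CompositeService add(Service child) {
            children.add(child);
            return this;
        }

        @Override
        void init() {
            for (Service child : children) {
                child.init();  // children first, in registration order
            }
            super.init();      // then the composite itself
        }

        @Override
        void start() {
            for (Service child : children) {
                child.start();
            }
            super.start();
        }
    }

    /** Builds the assumed service tree (nesting inferred from the log, not from Hive's source). */
    static CompositeService buildHiveServer2(List<String> events) {
        CompositeService sessionManager = new CompositeService("SessionManager", events)
                .add(new Service("OperationManager", events));
        CompositeService cliService = new CompositeService("CLIService", events)
                .add(sessionManager);
        return new CompositeService("HiveServer2", events)
                .add(cliService)
                .add(new Service("ThriftCLIService", events));
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        CompositeService hs2 = buildHiveServer2(events);
        hs2.init();
        hs2.start();
        events.forEach(System.out::println);
    }
}
```

Running main prints the ten Service:... lines in the same order as the log: OperationManager, SessionManager, CLIService, ThriftCLIService, HiveServer2, first for init and then for start.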

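As the log shows, HiveServer2 is now a CompositeService made up of a group of services: OperationManager, SessionManager, CLIService and ThriftCLIService. During initialization it opens a connection to the HiveMetaStore and calls get_databases as a test. Finally it starts the Thrift server (actually a TThreadPoolServer), listening on test84.hadoop/10.1.77.84:10000.

Beeline is the new interactive CLI introduced in Hive 0.11. It is based on SQLLine and acts as a Hive JDBC client of Hive Server 2; a running Beeline maintains one session. Because Kerberos authentication is enabled, you need a Kerberos ticket locally and must specify Hive Server 2's service principal in the connection URL, here principal=hadoop/test84.hadoop@DIANPING.COM. The username and password can be left empty; subsequent statements run as the user of the principal in the current ticket cache.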

-dpsh-3.2$ bin/beeline 
Beeline version 0.11.0 by Apache Hive
beeline> !connect jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
scan complete in 2ms
Connecting to jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
Enter username for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM: 
Enter password for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM: 
Connected to: Hive (version 0.11.0)
Driver: Hive (version 0.11.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://test84.hadoop:10000/default> select count(1) from abc;
+------+
| _c0  |
+------+
| 0    |
+------+
1 row selected (29.277 seconds)
0: jdbc:hive2://test84.hadoop:10000/default> !q
Closing: org.apache.hive.jdbc.HiveConnection
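The URL given to !connect above has the shape jdbc:hive2://host:port/db;key=value. Hive 0.11 also accepts HiveConf settings after '?' and Hive variables after '#', with the entries in each section separated by ';'. The hypothetical helper below (class and method names are illustrative, and it performs no escaping of special characters) assembles such a URL from insertion-ordered maps.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles a Hive Server 2 JDBC URL (no escaping performed). */
public class HiveUrlSketch {

    /**
     * Layout: jdbc:hive2://host:port/db;sessionVars?hiveConfs#hiveVars
     * Each map is written as key=value entries joined by ';'; empty sections are omitted.
     */
    static String build(String host, int port, String db,
                        Map<String, String> sessionVars,
                        Map<String, String> hiveConfs,
                        Map<String, String> hiveVars) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        if (!sessionVars.isEmpty()) {
            url.append(';').append(join(sessionVars));  // e.g. principal=...
        }
        if (!hiveConfs.isEmpty()) {
            url.append('?').append(join(hiveConfs));    // HiveConf settings
        }
        if (!hiveVars.isEmpty()) {
            url.append('#').append(join(hiveVars));     // Hive variables
        }
        return url.toString();
    }

    private static String join(Map<String, String> entries) {
        StringJoiner joiner = new StringJoiner(";");
        entries.forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> confs = new LinkedHashMap<>();
        confs.put("hive.cli.conf.printheader", "true");
        Map<String, String> vars = new LinkedHashMap<>();  // insertion order is preserved
        vars.put("stab", "salesTable");
        vars.put("icol", "customerID");
        System.out.println(build("test84.hadoop", 10000, "default",
                Collections.emptyMap(), confs, vars));
    }
}
```

Running main prints jdbc:hive2://test84.hadoop:10000/default?hive.cli.conf.printheader=true#stab=salesTable;icol=customerID; passing principal=hadoop/test84.hadoop@DIANPING.COM as the only session variable reproduces the Kerberos URL used with Beeline.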

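The Thrift client and server establish a session handle with a unique HandleIdentifier (the SessionID). Sessions are managed centrally by the SessionManager in CLIService, which maintains the mapping from SessionHandle to HiveSession. A HiveSession holds the SessionConf and HiveConf; for every statement the user executes, a new Driver is created, the HiveConf is passed to it, and then the statement is run. This is how Hive Server 2 supports concurrency. Each operation (with its own opType, e.g. EXECUTE_STATEMENT) gets its own OperationHandle, which also has its own HandleIdentifier. Typing "!q" in Beeline destroys the session and releases the corresponding resources.

The Hive Server 1 driver class name is org.apache.hadoop.hive.jdbc.HiveDriver, while the Hive Server 2 driver is org.apache.hive.jdbc.HiveDriver; the two are easy to confuse.

HiveConf parameters and Hive variables can also be specified in the connection URL: HiveConf parameters follow '?', variables follow '#', and entries within each list are separated by ';'. These settings are session-scoped: once the session has been established, Hive first executes the corresponding set statements. Example with hiveconf and variables:

jdbc:hive2://test84.hadoop:10000/default?hive.cli.conf.printheader=true#stab=salesTable;icol=customerID

A Java JDBC client:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveTest {
    public static void main(String[] args) throws SQLException {
        try {
            // Hive Server 2 driver (not Hive Server 1's org.apache.hadoop.hive.jdbc.HiveDriver)
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);  // cannot continue without the driver
        }
        // With Kerberos, user and password can stay empty: the ticket cache identity is used
        String url = "jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM";
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {
            String sql = "select * from abc";
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                // Print the type and name of every column
                ResultSetMetaData rsmd = res.getMetaData();
                int columnCount = rsmd.getColumnCount();
                for (int i = 1; i <= columnCount; i++) {
                    System.out.println(rsmd.getColumnTypeName(i) + ":" + rsmd.getColumnName(i));
                }
                // Assumes abc has an int first column and a string second column
                while (res.next()) {
                    System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
                }
            }
        }
    }
}

HiveStatement now supports cancellation: calling Statement.cancel() terminates the running statement and destroys its driver.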
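Since Statement.cancel() is designed to be called from a thread other than the one executing the query, one common use is a watchdog that cancels a query that runs too long. The Java below is a minimal sketch of that idea using only java.sql and java.util.concurrent; executeWithTimeout is a hypothetical helper, and what the blocked executeQuery call does after cancellation (typically throwing an SQLException) depends on the driver, here HiveStatement.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog that cancels a long-running JDBC query from a second thread. */
public class CancelWatchdogSketch {

    static ResultSet executeWithTimeout(Statement stmt, String sql, long timeoutMs)
            throws SQLException {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        // If the query is still running after timeoutMs, cancel it from the watchdog thread.
        ScheduledFuture<?> pendingCancel = watchdog.schedule(() -> {
            try {
                stmt.cancel();
            } catch (SQLException e) {
                // Best effort: the statement may already have finished or been closed.
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            // Blocks the calling thread; whatever the driver throws after a cancel propagates.
            return stmt.executeQuery(sql);
        } finally {
            pendingCancel.cancel(false);  // query finished or failed: disarm the watchdog
            watchdog.shutdownNow();
        }
    }
}
```

In the HiveTest example above, stmt.executeQuery(sql) could be replaced with executeWithTimeout(stmt, sql, 60_000) to give up on a query after about a minute.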

Note: if Kerberos authentication runs into problems, add the JVM option -Dsun.security.krb5.debug=true when starting the client JVM to see detailed diagnostic output.