Apache spark Hive,带有maven shade的可执行JAR

时间:2016-05-27 13:01:33

标签: maven apache-spark datanucleus maven-shade-plugin spark-hive

我正在使用Apache Spark Hive构建apache-spark应用程序。到目前为止一切正常 - 我已经在Intellij IDEA中运行测试和整个应用程序,并且使用maven一起进行所有测试。

现在我想从bash运行整个应用程序,让它与本地单节点集群一起运行。我使用maven-shade-plugin构建单个可执行JAR。

当尝试使用SparkContext创建新的HiveContext时,应用程序崩溃。抛出的异常告诉我,hive不能创建Metastore,因为datanucleus及其插件系统存在一些问题。我尝试了几个问题,如何运行datanucleus插件系统与阴影,但运气不好。例如: Datanucleus, JDO and executable jar - how to do it?

使用hive组合应用程序的可执行JAR并从bash运行它的最佳方法是什么?也许是一些datanucleus及其插件系统的设置?

的pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>test</groupId>
    <artifactId>hive-test</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.7</version>
        </dependency>

        <!-- spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>
    </dependencies>

    <properties>
        <!-- To be specified in child pom:  <main.class></main.class> -->
        <final.jar.name>${project.artifactId}-${project.version}</final.jar.name>
        <main.class>com.test.HiveTest</main.class>
    </properties>

    <build>
        <plugins>
            <!-- the Maven compiler plugin will compile Java source files -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>

            <!-- the Maven Scala plugin will compile Scala source files -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-Xmax-classfile-name</arg>
                                <arg>110</arg>
                            </args>
                        </configuration>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-Xmax-classfile-name</arg>
                                <arg>110</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>${main.class}</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.XmlAppendingTransformer">
                                    <resource>plugin.xml</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

SampleCode

object HiveTest {
  def main(args: Array[String]) {
    // Set up Spark
    val conf = new SparkConf(true)
      .setMaster("local")
      .setAppName("hive-test")

    println("Initializing spark context")
    val sc = new SparkContext(conf)

    println("Initializing hive context")
    val hc = new HiveContext(sc)
  }
}

抛出异常

java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
        at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
        at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
        at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
        at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
        at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:97)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
        at com.test.HiveTest$.main(HiveTest.scala:21)
        at com.test.HiveTest.main(HiveTest.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        ... 18 more
Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
NestedThrowables:
java.lang.reflect.InvocationTargetException
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        ... 23 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
        ... 42 more
Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:283)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:247)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:225)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:416)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:301)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        ... 50 more

提前谢谢

2 个答案:

答案 0 :(得分:5)

我找到了解决问题的方法。在回答可执行jar中的datanucleus的原始问题时描述了解决方案:https://stackoverflow.com/a/27030103/6390361

  1. 修改 MANIFEST.MF 以假装它是datanucleus OSGi捆绑包。这可以通过从 datanucleus-core 清单添加 Bundle-SymbolicName Premain-Class 条目来完成。

  2. 在类路径(资源文件夹)中创建文件plugin.xml,并使用 datanucleus-core 项目中的root标记。

  3. 在插件标记的开头放置 datanucleus-core datanucleus-rdbms 中的所有扩展点标记。 RDBMS项目的所有扩展点都必须以 store.rdbms 作为前缀。这非常重要,因为datanucleus使用完全分类的ID,包括来自root插件标记的部分。

  4. 合并项目 datanucleus-core datanucleus-rdbms datanucleus-api-jdo <中的所有扩展程序标记< / strong>并将它们放在所有扩展点后面。请注意,在更多项目中存在一些扩展,因此您需要合并具有相同ID的扩展内容。

  5. 清单条目

    Bundle-SymbolicName: org.datanucleus;singleton:=true
    Premain-Class: org.datanucleus.enhancer.DataNucleusClassFileTransformer
    

    <强>的plugin.xml

    文件plugin.xml太大而无法粘贴在这里,但您应该能够手动合并它。以下代码包含具有固定ID的所有RDBMS扩展点。

    <?xml version="1.0" encoding="UTF-8"?>
    <?eclipse version="3.2"?>    
    <plugin id="org.datanucleus" name="DataNucleus Core" provider-name="DataNucleus">
    
        <!-- Extension points from datanucleus-core -->
        <extension-point id="api_adapter" name="Api Adapter" schema="schema/apiadapter.exsd"/>
        ...
    
        <!-- extension points from datanucleus-rdbms - fixed IDs -->
        <extension-point id="store.rdbms.connectionprovider" name="Connection Provider" schema="schema/connectionprovider.exsd"/>
        <extension-point id="store.rdbms.connectionpool" name="ConnectionPool" schema="schema/connectionpool.exsd"/>
        <extension-point id="store.rdbms.sql_expression" name="SQL Expressions" schema="schema/sql_expression.exsd"/>
        <extension-point id="store.rdbms.sql_method" name="SQL Methods" schema="schema/sql_method.exsd"/>
        <extension-point id="store.rdbms.sql_operation" name="SQL Expressions" schema="schema/sql_operation.exsd"/>
        <extension-point id="store.rdbms.sql_tablenamer" name="SQL Table Namer" schema="schema/sql_tablenamer.exsd"/>
        <extension-point id="store.rdbms.rdbms_mapping" name="RDBMS Mapping" schema="schema/rdbms_mapping.exsd"/>
    
        <!-- Merged extensions from datanucleus-core, datanucleus-rdbms and datanucleus-api-jdo -->
        <extension point="org.datanucleus.persistence_properties">...</extension>
        ...
    </plugin>
    

    <强>行家遮阳帘插件

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <manifestEntries>
                                <Main-Class>${main.class}</Main-Class>
                                <Premain-Class>org.datanucleus.enhancer.DataNucleusClassFileTransformer</Premain-Class>
                                <Bundle-SymbolicName>org.datanucleus;singleton:=true</Bundle-SymbolicName>
                            </manifestEntries>
                        </transformer>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                            <resource>reference.conf</resource>
                        </transformer>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>
    

答案 1 :(得分:0)

以下是这些依赖项中较新的datanucleus-core-4.1.6https://pastebin.com/8b2deeCL构建的链接:

  • datanucleus-rdbms-4.1.7
  • datanucleus-api-jdo-4.2.1
  • {{1}}

根据@anebril的回答。