使用Apache Drill读取CSV文件时如何指定`delimiter?

时间:2015-12-17 15:45:53

标签: java maven csv apache-drill

我正在尝试使用Apache Drill 1.3.0 获取数字字段的汇总值,而不在文件系统上安装 HadoopApache Drill

现在问题是 - 我想在阅读CSV文件时指定delimiter,但我无法找到任何方法将delimiter设置为 ^(插入符号)使用java程序,因为我还没有安装Apache Drill

正如您在下面看到的,我的程序因错误而终止并抛出NumberFormatException,因为 ^(插入符号)是我的CSV中的分隔符。

的pom.xml

<dependency>
<groupId>org.apache.drill.exec</groupId>
<artifactId>drill-jdbc</artifactId>
<version>1.3.0</version>
</dependency>

示例CSV文件:

“ffe92de2-e633-4741-BC4B-83e91dce665a.csv”

"100"^"100"
"100"^"100"

示例Java程序:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

/*
* Created By: Ashish Pancholi
* On: 16-12-2015
* */

public class DrillExample {
    private static final String DRILL_JDBC_LOCAL_URI = "jdbc:drill:zk=local";
    private Connection con = null;

    public DrillExample() throws SQLException {
        con = new org.apache.drill.jdbc.Driver().connect(DRILL_JDBC_LOCAL_URI, getDefaultProperties());

    }

    public int getArchiveNumFieldSUM(String filepath) throws Exception {
        //query to be executed
        String query = "select sum(cast(columns[0] as int)) from dfs.`"+filepath+"`";
        try {
            // creating statement
            Statement stmt = con.createStatement();
            // executing query
            ResultSet rs = stmt.executeQuery(query);
            // get the number of rows from the result set
            rs.next();
            return rs.getInt(1);
        } catch (Exception ex) {
            throw ex;
        }
    }

    private Properties getDefaultProperties() {
        final Properties properties = new Properties();
        properties.setProperty(org.apache.drill.exec.ExecConstants.HTTP_ENABLE, "false");
        return properties;
    }

    public void shutdownApacheDrill(){
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                //ignore
            }
        }
    }

    public static void main(String[] arg){
        try {
            int count = 0;
            DrillExample local = new DrillExample();
            String filePath = "C:/Workarea/Project/xyz/Structured Data/ffe92de2-e633-4741-bc4b-83e91dce665a.csv";
            try {
                count = local.getArchiveNumFieldSUM(filePath);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println(count);
        }catch(Exception ex){
            ex.printStackTrace();
        }
    }

}

我收到以下错误:

    SYSTEM ERROR: NumberFormatException: 100"^"100

Fragment 0:0

[Error Id: 5c3ac2d7-fe66-4799-92cc-3efdc642c9dd on ashish:31010].
20:46:28.979 [USER-rpc-event-queue] INFO  o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#1] Query failed: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException: 100"^"100

Fragment 0:0

[Error Id: 5c3ac2d7-fe66-4799-92cc-3efdc642c9dd on ashish:31010]
    at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-common-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-common-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-java-exec-1.3.0.jar:1.3.0]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
java.sql.SQLException: SYSTEM ERROR: NumberFormatException: 100"^"100

Fragment 0:0

[Error Id: 5c3ac2d7-fe66-4799-92cc-3efdc642c9dd on ashish:31010]
    at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
    at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
    at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
    at net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
    at net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
    at org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:97)
    at org.openskye.core.structured.DrillExample.getArchiveNumFieldSUM(DrillExample.java:30)
    at org.openskye.core.structured.DrillExample.main(DrillExample.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:130)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException: 100"^"100

Fragment 0:0

[Error Id: 5c3ac2d7-fe66-4799-92cc-3efdc642c9dd on ashish:31010]
    at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
    at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
    at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
    at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
    at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
    at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
    at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
20:46:28.983 [USER-rpc-event-queue] DEBUG o.a.drill.exec.rpc.user.UserClient - Sending response with Sender 386235429
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:744)
0

0 个答案:

没有答案