How do I quickly get started with Apache Drill and run a query against a CSV file?

Time: 2015-08-23 08:48:03

Tags: apache-drill

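I have managed to install Apache Drill on a headless Ubuntu 14.04 virtual machine.

I have placed a CSV file on it that I want to run queries against.

I have read the tutorials, but they make no sense to me when all I want is a quick start.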

Got it to install. Now what?

Please advise.

1 answer:

Answer 0: (score: 3)

If your CSV has no header row, query the file like this:

select * from dfs.`/Users/khahn/drill/apache-drill-1.1.0/csv_no_header.csv`;
+------------------------+
|        columns         |
+------------------------+
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
| ["hello","1","2","3"]  |
+------------------------+
7 rows selected (1.427 seconds)

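If your CSV does have a header row, you need to add the skipFirstLine property to the csv format in the storage plugin definition (dfs in this example):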

"csv": {
  "type": "text",
  "extensions": [
    "csv"
  ],
  "skipFirstLine": true,
  "delimiter": ","
},

Updating a storage plugin via REST is described in the Apache Drill docs.

CSV with a header row:

name, num1, num2,num3
hello,1,2,3
hello,1,2,3
hello,1,2,3
hello,1,2,3
hello,1,2,3
hello,1,2,3
hello,1,2,3

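The select-all query is the same as for the CSV without a header. Because skipFirstLine makes Drill skip the header row, the output is the same as well.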

To query individual columns, use the COLUMNS[n] syntax, where n is the zero-based position of the column.
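As a minimal sketch of the COLUMNS[n] syntax, the query below selects each field of the sample file by position and gives it an alias. The file path and the alias names (item_name, num1, num2, num3) are placeholders chosen for illustration, not taken from the original answer. It assumes the header row is either absent or skipped via skipFirstLine; otherwise the casts would likely fail on the header text.

```sql
-- Hypothetical path: substitute the location of your own CSV file.
-- Drill exposes each CSV row as an array named "columns";
-- every element is read as text, so numeric fields are cast explicitly.
SELECT columns[0]              AS item_name,
       CAST(columns[1] AS INT) AS num1,
       CAST(columns[2] AS INT) AS num2,
       CAST(columns[3] AS INT) AS num3
FROM dfs.`/path/to/csv_with_header.csv`
LIMIT 3;
```

Aliasing each element makes the result read like a regular table, and the casts allow numeric comparisons or aggregates in a WHERE or GROUP BY clause.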

You may have to change other storage plugin configuration settings, depending on the contents of your CSV file. See Configuring Drill to Read Text Files.
