Question

I have installed Titan and Faunus, and each seems to work properly (titan-0.4.4 & faunus-0.4.4).

However, after ingesting a fairly large graph into Titan and trying to import it via Faunus with

FaunusFactory.open(    )

I run into problems. More precisely, I do seem to get a graph back from the FaunusFactory.open() call,

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

but then, even when asking for something as simple as

g.v(10)

I get this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)

My properties file is taken straight from the Faunus page on Titan-HBase input, except of course for changing the URL of the Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000

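Can anyone help?

Addendum:

To address the comment below, here is some background: as I mentioned, I have a graph in Titan and can run basic Gremlin queries against it.

However, I need to run a global Gremlin query which, because of the size of the graph, requires Faunus and its underlying MapReduce capabilities. Hence the need to import it. The error I get does not look to me like a sign of some inconsistency in the graph itself.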

Answer 1

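I'm not sure you have the "flow" of Faunus right. If your end goal is to run a global query over the graph, then consider this approach:

Pull the graph into a sequence file
Issue the global query over the sequence file

More specifically, create hbase-seq.properties: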

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

In Faunus, do:

g = FaunusFactory.open('hbase-seq.properties')
g._()

This will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with the following contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true

The above configuration will read your sequence file from the previous step without rewriting the graph (that is what NoOpOutputFormat is for). Now in Faunus do:

g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

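This will compute a degree distribution, writing the results to the 'analysis' directory in HDFS. Obviously you can run whatever Faunus-flavored Gremlin you want here; I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analytics perspective.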
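For readers unfamiliar with the term, here is a minimal plain-Python sketch of what a degree distribution is, computed on a tiny in-memory graph. It is not Faunus code and does not touch Hadoop; the function name `degree_distribution` and the toy vertex and edge lists are made up purely for illustration. It counts both incoming and outgoing edges per vertex (the role `bothE.count()` plays in the query) and then counts how many vertices share each degree (the role of `groupCount()`).

```python
from collections import Counter

def degree_distribution(vertices, edges):
    """Return {degree: number of vertices with that degree}.

    vertices: iterable of vertex ids (hypothetical toy data)
    edges: iterable of (out_vertex, in_vertex) pairs
    """
    # Start every vertex at degree 0 so isolated vertices are counted too.
    degree = {v: 0 for v in vertices}
    for out_v, in_v in edges:
        degree[out_v] += 1  # the edge is outgoing at its tail
        degree[in_v] += 1   # and incoming at its head
    # Group vertices by degree and count each group.
    return dict(Counter(degree.values()))

if __name__ == "__main__":
    toy_vertices = [1, 2, 3, 4, 5]
    toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    dist = degree_distribution(toy_vertices, toy_edges)
    print(dict(sorted(dist.items())))  # {0: 1, 1: 1, 2: 2, 3: 1}
```

On a graph too large for one machine, the same two stages (a per-vertex count, then a group-and-count over the results) are what the MapReduce jobs would be doing over the sequence file.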
