How can I extract the details of all companies from Freebase?

Time: 2015-11-10 14:54:13

Tags: javascript freebase mql

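I want to extract the details of all companies from Freebase. I tried using an MQL query, but it never returns more than 4100 records. I also tried using a cursor, but I get the same number of records with the cursor.

I googled this, and some people suggest downloading the dump instead of querying for the information. Is that the only way? If so, how do I get the following information from the dump? Any help is much appreciated.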

[
  {
    "type": "/business/company",
    "name": null,
    "parent_company": [{}],
    "products": [],
    "industry": [],
    "founded": null,
    "net_income": [
      {
        "amount": null,
        "valid_date": null,
        "currency": null
      }
    ],
    "company_type": [],
    "headquarters": [{}],
    "number_of_employees": [{}],

    "/base/schemastaging/organization_extra/phone_number": [{}]
  }
]

1 Answer:

Answer 0 (score: 1)

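First, the obligatory warning: Freebase has been read-only for many months and will be shut down soon. The data there is stale.

I get a count of 4189 for that query, so it sounds like you are close to the expected result. On the other hand, there are more than 400,000 businesses in Freebase, so perhaps you did not intend to restrict the query to only those that have net income information. If that is the case, you can modify the query by adding "optional": true to that clause of the query, i.e.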

  "net_income": [{
    "amount": null,
    "valid_date": null,
    "currency": null,
    "optional": true
  }],
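
If you build the query programmatically, the same change can be applied to any clause. The following is only a minimal sketch: make_optional is a hypothetical helper name, and the query is a shortened version of the one in the question.

```python
import json

# Shortened version of the query from the question (assumption: the other
# properties are omitted here only to keep the example small).
query = [{
    "type": "/business/company",
    "name": None,
    "net_income": [{"amount": None, "valid_date": None, "currency": None}],
}]

def make_optional(query, prop):
    """Return a copy of the query with every clause under `prop` marked optional."""
    updated = json.loads(json.dumps(query))  # cheap deep copy for JSON-like data
    for clause in updated[0][prop]:
        clause["optional"] = True
    return updated

print(json.dumps(make_optional(query, "net_income"), indent=2))
```

This only touches clauses written as objects, such as [{}]. A bare [] has no object to carry the "optional" key, so the helper leaves it unchanged.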

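That said, 400K records is a lot to pull through the API. To get the same information from the Freebase data dump, just filter on the same properties that your query includes.

Note that the schema has gone through some significant refactoring over the years, so some of the names in your query are not the current preferred property names but older aliases. For example, the current name for /business/company is /business/business_operation, and /business/company/established is really just an alias for /organization/organization/date_founded, so that is what you would want to look for in the dump.

In the dump, all slashes (/) are replaced with dots (.), so you can filter with zgrep commands like this: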

$ zgrep "organization\.organization\.parent" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b0njl> <http://rdf.freebase.com/ns/organization.organization.parent>   <http://rdf.freebase.com/ns/m.010d_x4z> .
<http://rdf.freebase.com/ns/m.010qw9c3> <http://rdf.freebase.com/ns/organization.organization.parent>   <http://rdf.freebase.com/ns/m.0110pjfc> .

$ zgrep "business\.business_operation\.industry" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b2kgs> <http://rdf.freebase.com/ns/business.business_operation.industry>   <http://rdf.freebase.com/ns/m.0c5mq>    .
<http://rdf.freebase.com/ns/m.010h6tq9> <http://rdf.freebase.com/ns/business.business_operation.industry>   <http://rdf.freebase.com/ns/m.02y_9m3>  .
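
Running one zgrep per property means one pass over the dump per property. As a rough alternative, here is a minimal Python sketch that checks several properties in a single pass. The helper names mql_id_to_predicate and filter_dump are hypothetical, and the sketch assumes each line is "subject predicate object ." separated by whitespace, as in the output above.

```python
import gzip

FREEBASE_NS = "http://rdf.freebase.com/ns/"

def mql_id_to_predicate(mql_id):
    """Turn '/business/business_operation/industry' into the dump's predicate URI."""
    return "<" + FREEBASE_NS + mql_id.strip("/").replace("/", ".") + ">"

def filter_dump(path, mql_ids):
    """Yield (subject, predicate, object) for lines whose predicate is wanted."""
    wanted = {mql_id_to_predicate(i) for i in mql_ids}
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            parts = line.split(None, 2)  # subject, predicate, rest of the line
            if len(parts) != 3 or parts[1] not in wanted:
                continue
            obj = parts[2].rstrip()
            if obj.endswith("."):        # drop the terminating " ."
                obj = obj[:-1].rstrip()
            yield parts[0], parts[1], obj
```

For example, filter_dump("freebase-rdf-2015-04-19-00-00.gz", ["/organization/organization/parent", "/business/business_operation/industry"]) would stream the matching triples without loading the file into memory.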

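For mediators or CVTs (compound value types), each property of the mediator node appears on its own line. So, for example, a name change might look like this: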

<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.end_date>  "2004"^^<http://www.w3.org/2001/XMLSchema#gYear>    .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.company>   <http://rdf.freebase.com/ns/m.06_dbm>   .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.start_date>    "1974"^^<http://www.w3.org/2001/XMLSchema#gYear>    .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.new_name>  "Cinar"@en  .
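
To turn those separate lines back into one record per mediator, you can group them by subject. This is only a sketch: group_by_mediator is a hypothetical helper, the sample triples are copied from the lines above, and building the dict in memory is only reasonable once the dump has been filtered down to the properties you need.

```python
from collections import defaultdict

# Sample triples copied from the name-change example above.
triples = [
    ("<http://rdf.freebase.com/ns/m.0q2g4kt>",
     "<http://rdf.freebase.com/ns/business.company_name_change.end_date>",
     '"2004"^^<http://www.w3.org/2001/XMLSchema#gYear>'),
    ("<http://rdf.freebase.com/ns/m.0q2g4kt>",
     "<http://rdf.freebase.com/ns/business.company_name_change.company>",
     "<http://rdf.freebase.com/ns/m.06_dbm>"),
    ("<http://rdf.freebase.com/ns/m.0q2g4kt>",
     "<http://rdf.freebase.com/ns/business.company_name_change.start_date>",
     '"1974"^^<http://www.w3.org/2001/XMLSchema#gYear>'),
    ("<http://rdf.freebase.com/ns/m.0q2g4kt>",
     "<http://rdf.freebase.com/ns/business.company_name_change.new_name>",
     '"Cinar"@en'),
]

def group_by_mediator(triples):
    """Collect (subject, predicate, object) triples into one dict per subject."""
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        # Keep only the last path segment of the predicate, e.g. 'start_date'.
        prop = predicate.strip("<>").rsplit(".", 1)[-1]
        records[subject].setdefault(prop, []).append(obj)
    return dict(records)

for mediator, props in group_by_mediator(triples).items():
    print(mediator, props)
```

Values are kept as lists because a property can occur more than once for the same subject.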