BigQuery Reddit dataset: collecting comments from specific subreddits?

Time: 2016-09-08 08:22:22

Tags: google-bigquery reddit

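I'm new to BigQuery and to SQL in general! I found this amazing Reddit comments dataset online (https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2015_05) and would like to do some qualitative analysis of the comments.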

Question: How do I narrow the search so that it retrieves only the comments and timestamps from the r/cancer and r/diabetes subreddits? What exact query should I use?

I know this is probably easy, but I've spent 4-5 hours on it and still can't figure it out...

1 answer:

Answer 0 (score: 2)

SELECT subreddit, COUNT(*) c
FROM [fh-bigquery:reddit_comments.2015_05] 
WHERE subreddit IN ('cancer', 'diabetes')
GROUP BY 1
LIMIT 1000

Query complete (1.6s elapsed, 595 MB processed)

Row subreddit   c    
1   diabetes    6508     
2   cancer      1923     

Raw comments and timestamps:

SELECT subreddit, created_utc, body
FROM [fh-bigquery:reddit_comments.2015_05] 
WHERE subreddit IN ('cancer', 'diabetes')
LIMIT 10
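
If readable timestamps are wanted, the same filter can also be written in BigQuery standard SQL. This is only a sketch: it assumes created_utc holds Unix epoch seconds as an integer, and that the query is run with the standard SQL dialect rather than the legacy SQL dialect used above. The created_at alias is illustrative and not part of the dataset.

```sql
-- Standard SQL: tables are referenced as `project.dataset.table` in backticks,
-- not as [project:dataset.table] as in legacy SQL.
SELECT
  subreddit,
  -- Assumption: created_utc is an integer count of seconds since the Unix epoch.
  TIMESTAMP_SECONDS(created_utc) AS created_at,
  body
FROM `fh-bigquery.reddit_comments.2015_05`
WHERE subreddit IN ('cancer', 'diabetes')
ORDER BY created_at
LIMIT 10
```

Removing LIMIT 10 should return every matching comment; going by the counts above, that is 8,431 rows for May 2015.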