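I'm scraping data from Twitter using a hashtag. My code below works fine. However, I'd like to get 10,000 tweets and save them in a single JSON file (or save them in separate files and then merge them into one). When I run the code and print the length of the DataFrame, it only shows 100 tweets.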
import json
credentials = {}
credentials['CONSUMER_KEY'] = ''
credentials['CONSUMER_SECRET'] = ''
credentials['ACCESS_TOKEN'] = ''
credentials['ACCESS_SECRET'] = ''
# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:
    json.dump(credentials, file)
# Import the Twython class
from twython import Twython
import json
import pandas as pd
# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)
# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
data = python_tweets.search(q='#python', result_type='mixed', count=10000)
with open('tweets_python.json', 'w') as fh:
    json.dump(data, fh)
data1 = pd.DataFrame(data['statuses'])
print("\nSample size:")
print(len(data1))
OUTPUT:
Sample size:
100
I've seen some answers suggesting that max_id can be used. I tried to write the code, but it is wrong.
max_iters = 50
max_id = ""
for call in range(0,max_iters):
    data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
File "<ipython-input-69-1063cf5889dc>", line 4
data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
^
SyntaxError: invalid syntax
Can you tell me how to save 10,000 tweets to a single JSON file?
Answer 0 (score: 0)
According to the Twython documentation, you can use the cursor generator and keep pulling results for as long as you need them.
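# cursor() takes the search method of the same Twython instance and yields one tweet at a time
results = python_tweets.cursor(python_tweets.search, q='python', result_type='mixed')
with open('tweets_python.json', 'w') as fh:
    for result in results:
        json.dump(result, fh)
        fh.write('\n')  # one JSON object per line, so the file can be parsed back line by line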
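The generator can yield far more tweets than you need, so to stop at 10,000 and end up with one valid JSON document you could cap it with itertools.islice and dump the whole list at once. This is only a minimal sketch: save_tweets is a hypothetical helper name (not part of Twython), and it assumes that results is any iterable of tweet dicts, such as the cursor above.

```python
import itertools
import json


def save_tweets(results, path, limit=10000):
    """Write at most `limit` tweets from an iterable into a single JSON array."""
    # islice stops pulling from the generator once `limit` items have been read,
    # so no further API requests are triggered after that point.
    tweets = list(itertools.islice(results, limit))
    with open(path, "w") as fh:
        json.dump(tweets, fh)
    return len(tweets)


# Hypothetical usage with the cursor from the answer:
# results = python_tweets.cursor(python_tweets.search, q='#python', result_type='mixed')
# save_tweets(results, 'tweets_python.json', limit=10000)
```

Collecting into a list first keeps the file a single JSON array, which pd.DataFrame(json.load(fh)) can read back directly.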
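Also, if you want to use the max_id approach, it has to be passed as a keyword argument, as shown below. Note that the search endpoint returns at most 100 tweets per request whatever value you give for count, which is why you need to call it in a loop.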
python_tweets.search(q='#python', result_type='mixed', count=10000, max_id=max_id)
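To get from that single call to 10,000 tweets, the usual pattern is to repeat the request and set max_id to just below the lowest tweet id seen so far. The following is a rough sketch rather than a tested recipe against the live API: collect_tweets is a hypothetical helper, search is assumed to be a callable like python_tweets.search that returns a dict with a 'statuses' list, and result_type='recent' is my own choice so that pages come back in descending id order.

```python
def collect_tweets(search, query, target=10000, page_size=100):
    """Page backwards through search results with max_id until `target` tweets are collected."""
    tweets = []
    max_id = None
    while len(tweets) < target:
        params = {"q": query, "result_type": "recent", "count": page_size}
        if max_id is not None:
            params["max_id"] = max_id
        statuses = search(**params).get("statuses", [])
        if not statuses:
            break  # nothing older is available, so stop early
        tweets.extend(statuses)
        # max_id is inclusive, so subtract 1 to avoid fetching the oldest tweet again
        max_id = min(status["id"] for status in statuses) - 1
    return tweets[:target]


# Hypothetical usage:
# tweets = collect_tweets(python_tweets.search, '#python', target=10000)
# with open('tweets_python.json', 'w') as fh:
#     json.dump(tweets, fh)
```

The loop can return fewer than target tweets if the search runs out of results, so it is worth checking len(tweets) before relying on the count.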