Question

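I am scraping data from Twitter using a hashtag. My code below works fine. However, I want to get 10,000 tweets and save them in a single JSON file (or save them in separate files and then merge them into one). When I run the code and print the length of the DataFrame, it reports only 100 tweets.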

import json
credentials = {}
credentials['CONSUMER_KEY'] = ''
credentials['CONSUMER_SECRET'] = ''
credentials['ACCESS_TOKEN'] = ''
credentials['ACCESS_SECRET'] = ''

# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:
    json.dump(credentials, file)

# Import the Twython class (pandas is needed for the DataFrame below)
from twython import Twython
import json
import pandas as pd

# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)

# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])

data = python_tweets.search(q='#python', result_type='mixed', count=10000)

with open('tweets_python.json', 'w') as fh:
    json.dump(data, fh)

data1 = pd.DataFrame(data['statuses'])

print("\nSample size:")
print(len(data1))

OUTPUT:
Sample size:
100

I have seen some answers suggesting max_id. I tried to write the code, but it is wrong.

max_iters = 50
max_id = ""
for call in range(0,max_iters):
       data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)

 File "<ipython-input-69-1063cf5889dc>", line 4
    data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
                                                                                       ^
SyntaxError: invalid syntax

Can you tell me how to save 10,000 tweets to a single JSON file?

Answer 1

According to the Twython documentation, you can use the cursor generator to fetch as many results as you need.

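import json
from itertools import islice

# cursor() pages through the search results lazily, yielding one tweet at a time
results = python_tweets.cursor(python_tweets.search, q='#python', result_type='mixed', count=100)
tweets = list(islice(results, 10000))  # stop after 10,000 tweets
with open('tweets_python.json', 'w') as fh:
    json.dump(tweets, fh)  # write one valid JSON array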

Also, if you want to use the max_id approach, pass the parameter as a keyword argument, like this:

python_tweets.search(q='#python', result_type='mixed', count=10000, max_id=max_id)
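
A single call still returns at most one page (the 100 tweets seen in the question), so the max_id approach needs a loop that keeps requesting older tweets. Below is a minimal sketch of such a loop, not a definitive implementation. The helper name collect_tweets and its target and page_size parameters are made up for illustration. It assumes the response is a dict with a 'statuses' list whose items carry an 'id' field, as in the question's code.

```python
import json


def collect_tweets(search, query, target=10000, page_size=100, result_type="mixed"):
    """Collect up to `target` tweets by paging backwards with max_id.

    `search` is any callable accepting q, result_type, count and max_id
    keyword arguments (for example python_tweets.search) and returning
    a dict with a 'statuses' list.
    """
    tweets = []
    max_id = None
    while len(tweets) < target:
        params = {"q": query, "result_type": result_type, "count": page_size}
        if max_id is not None:
            params["max_id"] = max_id
        statuses = search(**params).get("statuses", [])
        if not statuses:
            break  # nothing older is available
        tweets.extend(statuses)
        # Ask only for tweets strictly older than the oldest one seen so far
        max_id = min(status["id"] for status in statuses) - 1
    return tweets[:target]


# Hypothetical usage with the client from the question:
# tweets = collect_tweets(python_tweets.search, "#python")
# with open("tweets_python.json", "w") as fh:
#     json.dump(tweets, fh)
```

Because each request asks only for IDs below the smallest ID already collected, no tweet is fetched twice. The loop may end with fewer than 10,000 tweets if the search index has no more matches, and a long run may hit the API rate limits, which this sketch does not handle.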
