Question

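I'm using Tweepy to stream tweets and would like to record them in CSV format so I can work with them or load them into a database later. Keep in mind that I'm a noob, but I do realize there are multiple ways of handling this (suggestions are very welcome).

Long story short, I need to convert multiple Python dictionaries and append them to a CSV file. I've done my research (How do I write a Python dictionary to a csv file?) and tried using DictWriter and the writer methods.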

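However, a few things still need to be accomplished:

1) Write the keys as the header only once.

2) As new tweets stream in, the values need to be appended without overwriting the previous rows.

3) If a value is missing, record it as NULL.

4) Skip/fix ascii codec errors.

Here is the format I'd like to end up with (each value in its own cell):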

Header1_Key_1 Header2_Key_2 Header3_Key_3 ...

Row1_Value_1 Row1_Value_2 Row1_Value_3 ...

Row2_Value_1 Row2_Value_2 Row2_Value_3 ...

Row3_Value_1 Row3_Value_2 Row3_Value_3 ...

Row4_Value_1 Row4_Value_2 Row4_Value_3 ...

Here is my code:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):

    def on_data(self, data):
        json_data = json.loads(data)

        data_header = json_data.keys()
        data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'wb') as f:
                w = csv.DictWriter(f, data_header)
                w.writeheader(data_header)
                w.writerow(json_data)
        except BaseException, e:
            print 'Something is wrong', str(e)

        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

Thanks in advance!

Answer 1

I did something similar using Facebook's Graph API (the facepy module)!

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):
    _headers = None
    def __init__(self,headers,*args,**keys):
        StreamListener.__init__(self,*args,**keys)
        self._headers = headers

    def on_data(self, data):
        json_data = json.loads(data)

        #data_header = json_data.keys()
        #data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'ab') as f: # a for append
                w = csv.writer(f)
                # write!
                w.writerow(self._valToStr(json_data[header])
                           if header in json_data else ''
                           for header in self._headers)
        except Exception, e:
            print 'Something is wrong', str(e)

        return True

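    @staticmethod
    def _valToStr(o):
        # json returns a set number of datatypes - parse accordingly
        # https://docs.python.org/2/library/json.html#encoders-and-decoders
        if isinstance(o, unicode): return StdOutListener._removeNonASCII(o)
        elif isinstance(o, bool): return str(o)
        elif o is None: return ''
        # ... the remaining JSON types (numbers, lists, dicts) were left out
        # of the original; str() is only a placeholder fallback for them
        return str(o)

    @staticmethod
    def _removeNonASCII(s):
        # drop every character outside the 7-bit ASCII range
        return ''.join(i if ord(i)<128 else '' for i in s)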

    def on_error(self, status):
        print status

if __name__ == '__main__':
    headers = ['look','at','twitter','api',
               'to','find','all','possible',
               'keys']

    # initialize csv file with header info
    with open('csv_tweet3.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerow(headers)

    l = StdOutListener(headers)
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

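It's not copy-and-paste ready, but it should be clear enough to get you going. For better performance, you may want to look into opening the file, writing several records, and then closing it. That way you aren't constantly opening the file, initializing the csv writer, appending, and then closing the file again. I'm not familiar with the tweepy API, so I'm not sure how that would work, but it's worth looking into.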
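To make the batching idea a bit more concrete, here is a rough sketch of one way it could look. It is written in Python 3 syntax (unlike the Python 2 listener above), and `BufferedCsvAppender`, `batch_size`, `add` and `flush` are names I made up for illustration, not anything from tweepy. It keeps rows in memory and only opens the file once per batch:

```python
import csv


class BufferedCsvAppender:
    """Collects rows in memory and appends them to a CSV file in batches."""

    def __init__(self, path, headers, batch_size=100):
        self.path = path
        self.headers = headers
        self.batch_size = batch_size  # hypothetical tuning knob
        self._rows = []

    def add(self, record):
        # Missing keys become empty strings, as in the listener above.
        self._rows.append([record.get(h, '') for h in self.headers])
        if len(self._rows) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._rows:
            return
        # One open/close per batch instead of one per tweet.
        with open(self.path, 'a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(self._rows)
        self._rows = []
```

The listener would call `add(json_data)` from `on_data` and `flush()` once when the stream stops. The trade-off is that rows still sitting in the buffer are lost if the process dies, so a smaller `batch_size` is safer and a larger one is faster.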

If you run into any trouble, I'd be happy to help. Enjoy!
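For anyone on Python 3, the four requirements in the question can probably be covered with `csv.DictWriter` alone. The following is only a sketch under that assumption; `append_record` is a hypothetical helper name, and the header list is still something you have to choose yourself:

```python
import csv
import os


def append_record(path, headers, record):
    """Append one dict to a CSV file, writing the header row only once."""
    # Write the header only when the file is missing or still empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # utf-8 sidesteps the ascii codec errors; newline='' is what the
    # csv module expects for files it writes to.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        w = csv.DictWriter(f, fieldnames=headers,
                           restval='NULL',         # fill value for missing keys
                           extrasaction='ignore')  # skip keys not in headers
        if need_header:
            w.writeheader()
        w.writerow(record)
```

Note that `restval` only applies to keys that are absent from the dict. A key that is present with the value `None` is written as an empty cell, so it would need to be mapped to `'NULL'` separately if that matters.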
