Appending scraped data to a JSON file

Posted: 2018-12-10 09:33:34

Tags: python json selenium-webdriver web-scraping beautifulsoup

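I'm trying to build a JSON file from scraped data, but my function convertToJson() overwrites the previous entry instead of appending to it. Is it because I'm not looping over it? For example, every time it runs, the JSON file below has its first entry replaced by the new data instead of the new data being added after it.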

[{"Volume": "Volume:\n6,061,086", "Price": "$41.88", "Name": "Suncor Energy Inc."}]

import json

from selenium.webdriver.common.by import By


def getStockDetails(url, browser):
    print(url)
    browser.get(url)

    # Read the name, price and volume from the quote panel on the page.
    quote_wrapper = browser.find_element(By.CSS_SELECTOR, 'div.quote-wrapper')
    quote_name = quote_wrapper.find_element(
        By.CLASS_NAME, "quote-name").find_element(By.TAG_NAME, 'h2').text
    quote_price = quote_wrapper.find_element(By.CLASS_NAME, "quote-price").text
    quote_volume = quote_wrapper.find_element(
        By.CLASS_NAME, "quote-volume").text

    print("\n")
    print("Quote Name: " + quote_name)
    print("Quote Price: " + quote_price)
    print("Quote Volume: " + quote_volume)
    print("\n")

    convertToJson(quote_name, quote_price, quote_volume)


def convertToJson(quote_name, quote_price, quote_volume):
    quotesArr = []  # a new, empty list is created on every call
    quoteObject = {
        "Name": quote_name,
        "Price": quote_price,
        "Volume": quote_volume
    }
    quotesArr.append(quoteObject)

    # Mode 'w' truncates the file before json.dump writes the list.
    with open('trendingQuoteData.json', 'w') as outfile:
        json.dump(quotesArr, outfile)
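To see why the file keeps only one entry, here is a minimal sketch that reproduces the pattern without Selenium. `save_like_question` mimics convertToJson (a fresh one-item list, written with mode 'w'), and `save_with_append_mode` is a hypothetical naive fix that just switches to mode 'a'. The quotes are placeholders. The first keeps only the last quote; the second keeps both but produces a file that is no longer valid JSON, because it contains two top-level arrays back to back.

```python
import json
import os
import tempfile


def save_like_question(path, quote):
    # Same pattern as convertToJson: a fresh one-item list on every call,
    # written with mode "w", which truncates the file first.
    quotes = [quote]
    with open(path, "w") as outfile:
        json.dump(quotes, outfile)


def save_with_append_mode(path, quote):
    # Hypothetical naive fix: mode "a" appends raw text, so every call adds
    # another top-level JSON array after the previous one.
    with open(path, "a") as outfile:
        json.dump([quote], outfile)


tmpdir = tempfile.mkdtemp()
w_path = os.path.join(tmpdir, "w_mode.json")
a_path = os.path.join(tmpdir, "a_mode.json")

# Placeholder quotes for illustration, not real scraped data.
for quote in [{"Name": "A"}, {"Name": "B"}]:
    save_like_question(w_path, quote)
    save_with_append_mode(a_path, quote)

with open(w_path) as f:
    w_result = json.load(f)
print(w_result)  # [{'Name': 'B'}]: only the last quote survives

with open(a_path) as f:
    a_text = f.read()
print(a_text)  # [{"Name": "A"}][{"Name": "B"}]

try:
    json.loads(a_text)
    a_error = None
except json.JSONDecodeError as err:
    a_error = err.msg
print(a_error)  # Extra data
```

So looping alone does not help: the list has to outlive a single call, and the file has to be written so that it still holds one JSON array.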

2 answers:

Answer 0 (score: 1)

You need to make quotesArr a global variable: define it outside the function, and write the JSON file once all the scraping is finished.

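import json

# Module-level list shared by every call, so entries accumulate.
quotesArr = []


def convertToJson(quote_name, quote_price, quote_volume):
    quoteObject = {
        "Name": quote_name,
        "Price": quote_price,
        "Volume": quote_volume
    }
    # append() mutates the existing list, so no `global` statement is needed.
    quotesArr.append(quoteObject)


def trendingBot(url, browser):
    browser.get(url)
    # getTrendingQuotes() is part of the asker's script (not shown in the post).
    trending = getTrendingQuotes(browser)
    for trend in trending:
        getStockDetails(trend, browser)
    # requests finished, write json to file
    with open('trendingQuoteData.json', 'w') as outfile:
        json.dump(quotesArr, outfile)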
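The same idea (collect every quote, then write the file once) can also be written without a module-level global. A minimal sketch, assuming getStockDetails() is changed to return its values instead of calling convertToJson(); `to_quote`, `write_quotes` and `scraped_rows` are hypothetical names, and the second row is a placeholder standing in for more scraped data.

```python
import json


def to_quote(name, price, volume):
    # Build one quote object; same keys as convertToJson in the question.
    return {"Name": name, "Price": price, "Volume": volume}


def write_quotes(path, quotes):
    # Write the complete list once, after all scraping has finished.
    with open(path, "w") as outfile:
        json.dump(quotes, outfile, indent=2)


# Stand-in for the values getStockDetails() reads with Selenium. The first
# row is the one shown in the question; the second is a placeholder.
scraped_rows = [
    ("Suncor Energy Inc.", "$41.88", "Volume:\n6,061,086"),
    ("Placeholder Co.", "$1.00", "Volume:\n100"),
]

all_quotes = [to_quote(*row) for row in scraped_rows]
write_quotes("trendingQuoteData.json", all_quotes)

with open("trendingQuoteData.json") as f:
    saved = json.load(f)
print(len(saved))  # 2
```

Passing the list around explicitly keeps the functions testable and avoids stale data if trendingBot() is run more than once in the same process, which the global list in the answer would otherwise carry over.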

Answer 1 (score: 0)

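import json

# jsonStringA / jsonStringB: strings that each hold one JSON object.
a = json.loads(jsonStringA)
b = json.loads(jsonStringB)
# Python 3: dict views cannot be joined with +, so unpack both dicts instead.
# Where a key appears in both, the value from b wins.
c = {**a, **b}
# or c = dict(a, **b)  (only works when every key in b is a string)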
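Merging two dicts combines their keys, so two quotes that both have Name, Price and Volume collapse into one record, with the second overwriting the first. For a file that holds a JSON array like the one in the question, one way to really append is read-modify-write: load the existing array, append the new object, and rewrite the file. A minimal sketch; `append_quote` is a hypothetical helper, and the quotes are placeholders.

```python
import json
import os
import tempfile


def append_quote(path, quote):
    # Read the existing JSON array (or start a new one if the file does not
    # exist yet), add one quote, and rewrite the whole file.
    quotes = []
    if os.path.exists(path):
        with open(path) as infile:
            quotes = json.load(infile)
    quotes.append(quote)
    with open(path, "w") as outfile:
        json.dump(quotes, outfile)


# Merging two quote dicts keeps only one record: the shared keys collide.
first = {"Name": "A", "Price": "$1.00"}
second = {"Name": "B", "Price": "$2.00"}
merged = {**first, **second}
print(merged)  # {'Name': 'B', 'Price': '$2.00'}

# Appending to the stored list keeps both records.
path = os.path.join(tempfile.mkdtemp(), "trendingQuoteData.json")
append_quote(path, first)
append_quote(path, second)
with open(path) as f:
    stored = json.load(f)
print(stored)  # [{'Name': 'A', 'Price': '$1.00'}, {'Name': 'B', 'Price': '$2.00'}]
```

This rewrites the whole file on every call, so for a long scraping run collecting everything in memory and writing once (as in Answer 0) is cheaper; read-modify-write mainly helps when entries must survive across separate runs of the script.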