Managing JSON-like data from a text file

Time: 2015-03-12 04:06:02

Tags: python json

I'm new to Python, so please be kind. I have a .txt file that contains JSON-like data on a single line:

{"marketing_package_url": "http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf", "title": "TRAILS END MARKETPLACE", "location": "OREGON CITY, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf", "title": "YUKON VILLAGE", "location": "YUKON, OK"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf", "title": "SOUTH POINT PLAZA", "location": "EVERETT, WA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf", "title": "HOME DEPOT - BELLINGHAM", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf", "title": "MUNCIE MARKETPLACE", "location": "MUNCIE, IN"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "AUGUSTA, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "GAINESVILLE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf", "title": "TEXAS STRIP CENTER PORTFOLIO", "location": "VARIOUS LOCATIONS, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf", "title": "ARNEY RETAIL CENTER", "location": "WOODBURN, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LAGRANGE, GA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "LYNCHBURG, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "ROANOKE, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "ASHLAND, VA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf", "title": "WALMART NEIGHBORHOOD MARKET", "location": "OKLAHOMA CITY, OK"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf", "title": "SAN ANGELO MARKETPLACE", "location": "SAN ANGELO, TX"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf", "title": "KEIZER VILLAGE", "location": "KEIZER, OR"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf", "title": "BONANZA SHOPPING CENTER", "location": "CLOVIS, CA"}{"marketing_package_url": "http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf", "title": "WALGREENS", "location": "BELLINGHAM, WA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf", "title": "THE ORCHARD CENTER", "location": "TEHACHAPI, CA"}{"marketing_package_url": "http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf", "title": "CINETOPIA", "location": "VANCOUVER, WA"}

What I want to do is collect only the marketing package URLs into a list in my script, so that it looks like this:

list[0] = http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf

list[1] = http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf

list[2] = ...

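I've tried json.loads, but it fails with an error saying the line contains extra data. I think that's because it's a .txt file and isn't formatted as JSON. Any help is much appreciated, thanks.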

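Edit: the JSON objects are all on one line. Here is my first attempt at it, which tries to split out the individual objects and then put them back together: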

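import json

result = []
with open("properties.txt") as f:
    # All of the objects are on the first line of the file.
    j = next(f).strip()
    # Splitting on "}{" drops the braces between neighbouring objects.
    jlist = j.split("}{")
    print(len(jlist))
    # Put the missing braces back (assumes the line holds at least two objects).
    jlist = [jlist[0] + "}"] + ["{" + x + "}" for x in jlist[1:-1]] + ["{" + jlist[-1]]
    for x in jlist:
        result.append(json.loads(x))

for x in result:
    print(x['title'])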

2 answers:

Answer 0: (score: 1)

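Here is a function that takes a string containing any number of JSON objects run together, parses each object, and yields the results one at a time: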

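import json

def get_json_objects(s):
    d = json.JSONDecoder()
    idx = 0
    while idx < len(s):
        # raw_decode does not skip whitespace (e.g. a trailing newline), so step over it.
        if s[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the parsed value and the index where parsing stopped.
        j, idx = d.raw_decode(s, idx=idx)
        yield j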

Example:

>>> list(get_json_objects("[1,2][3,4]{}"))
[[1, 2], [3, 4], {}]

You can then use get_json_objects like this:

urls = [j["marketing_package_url"] for j in get_json_objects(open("data.txt").read())]
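
For anyone wondering why json.loads fails where get_json_objects succeeds, the sketch below may help. It uses a made-up two-object string (sample) rather than the real file. json.loads accepts exactly one JSON document and reports anything after it as extra data, whereas JSONDecoder.raw_decode parses a single document and returns the index where it stopped, so the next call can resume from there.

```python
import json

# Hypothetical sample: two JSON objects run together, as in the question.
sample = '{"title": "A"}{"title": "B"}'

# json.loads accepts exactly one JSON document, so the second object
# triggers a JSONDecodeError.
try:
    json.loads(sample)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# raw_decode parses one document and returns (value, end_index).
decoder = json.JSONDecoder()
first, end = decoder.raw_decode(sample)
second, end = decoder.raw_decode(sample, end)

print(error_message)
print(first, second)
```

The file extension plays no part in this: the same error would appear if the file were named .json, because the content is a sequence of documents rather than a single one.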

Answer 1: (score: 0)

https?:\/\/[^"]+

If the JSON is invalid, try re.findall instead. See the demo:

https://regex101.com/r/iS6jF6/7

import re

# Match from "http://" or "https://" up to, but not including, the next double quote.
p = re.compile(r'https?:\/\/[^"]+', re.IGNORECASE | re.MULTILINE)
test_str = "{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/TrailsEndMarketplaceExecSummary.pdf\", \"title\": \"TRAILS END MARKETPLACE\", \"location\": \"OREGON CITY, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Yukon-Village-YukonOK.pdf\", \"title\": \"YUKON VILLAGE\", \"location\": \"YUKON, OK\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/SouthPointPlazaExecSummary-CONFI.pdf\", \"title\": \"SOUTH POINT PLAZA\", \"location\": \"EVERETT, WA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/HomeDepotBellinghamExecutiveSummary.pdf\", \"title\": \"HOME DEPOT - BELLINGHAM\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Muncie-Marketplace-MuncieIN.pdf\", \"title\": \"MUNCIE MARKETPLACE\", \"location\": \"MUNCIE, IN\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-AugustaGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"AUGUSTA, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-GainesvilleGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"GAINESVILLE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Texas-Strip-Center-Portfolio.pdf\", \"title\": \"TEXAS STRIP CENTER PORTFOLIO\", \"location\": \"VARIOUS LOCATIONS, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/ArneyRetailCenterExecSummary.pdf\", \"title\": \"ARNEY RETAIL CENTER\", \"location\": \"WOODBURN, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-NeighborhoodMarket-LaGrangeGA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LAGRANGE, GA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-LynchburgVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"LYNCHBURG, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-RoanokeVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ROANOKE, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-AshlandVA.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"ASHLAND, VA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Walmart-Neighborhood-Market-OklahomaCityOK.pdf\", \"title\": \"WALMART NEIGHBORHOOD MARKET\", \"location\": \"OKLAHOMA CITY, OK\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/San-Angelo-Marketplace-SanAngeloTX.pdf\", \"title\": \"SAN ANGELO MARKETPLACE\", \"location\": \"SAN ANGELO, TX\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/KeizerVillageExecSummary.pdf\", \"title\": \"KEIZER VILLAGE\", \"location\": \"KEIZER, OR\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Bonanza-Shopping-Center-ClovisCA.pdf\", \"title\": \"BONANZA SHOPPING CENTER\", \"location\": \"CLOVIS, CA\"}{\"marketing_package_url\": \"http://www.capitalpacific.com/inquiry/WalgreensBellinghamExecSummary.pdf\", \"title\": \"WALGREENS\", \"location\": \"BELLINGHAM, WA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/The-OrchardCenter-TehachapiCA.pdf\", \"title\": \"THE ORCHARD CENTER\", \"location\": \"TEHACHAPI, CA\"}{\"marketing_package_url\": \"http://cp.capitalpacific.com/Properties/Cinetopia-VancouverWA.pdf\", \"title\": \"CINETOPIA\", \"location\": \"VANCOUVER, WA\"}"

# findall returns every non-overlapping match as a list of strings.
print(re.findall(p, test_str))
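
One caveat: the pattern above returns every URL in the text, whichever field it appears in. If only the marketing_package_url values are wanted, a capturing group can tie the match to that key. The following is a minimal sketch, assuming the values contain no escaped double quotes. The sample string, its example.com URLs, and the extra "website" field are made up for illustration.

```python
import re

# Hypothetical sample; the "website" field is invented to show the difference.
sample = (
    '{"marketing_package_url": "http://example.com/a.pdf", '
    '"website": "http://example.com/"}'
    '{"marketing_package_url": "https://example.com/b.pdf", "title": "B"}'
)

# Capture only the string value that follows the marketing_package_url key.
# Assumes the value contains no escaped double quotes.
pattern = re.compile(r'"marketing_package_url"\s*:\s*"([^"]+)"')

# With a single capturing group, findall returns just the captured text.
urls = pattern.findall(sample)
print(urls)
```

This is still text matching rather than parsing, so it is best kept for input that a JSON decoder cannot handle.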