Python 3: UnicodeDecodeError when reading the contents of an unzipped file

Time: 2018-11-13 16:14:59

Tags: python pandas zip

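I have some code that downloads several zipped CSV files, unzips them, and concatenates the data into a single DataFrame. The problem is that it fails with a UnicodeDecodeError.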

import pandas as pd
import requests
from io import BytesIO
from zipfile import ZipFile
from bs4 import BeautifulSoup
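
# Assumption: the original post uses agecaredata_url without showing its definition;
# it is taken here to be the site's base URL, prepended to each relative data-link.
agecaredata_url = 'https://www.gen-agedcaredata.gov.au'

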
def findZipLinks(url):
    r = requests.get(url)
    bs = BeautifulSoup(r.content, features="html.parser")
    links = [agecaredata_url + a.get('data-link') for a in bs.findAll('a', {"class": "downloadhrefp_lt_WebPartZone6_znMC_pageplaceholder_p_lt_WebPartZone2_ZoneA_znPublicationFooterItem_znPublicationFooterItem_zone_Stacker_MultiColumns u-dtb u-w100p u-bgc-primary u-c-fff c-publication__download u-mb-gutter0p25x"}) if "zip" in a.get("data-link")]
    return links


exits = findZipLinks('https://www.gen-agedcaredata.gov.au/Resources/Access-data/2018/June/GEN-data-People-leaving-aged-care')
dfs = []
for exit_url in exits:
    r = requests.get(exit_url)
    zipfile = ZipFile(BytesIO(r.content))
    dfs.append(pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str))

pd.concat(dfs, ignore_index=True)

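The error, UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte, is raised on the dfs.append(...) line. I tried calling .decode('utf-8') and .decode('windows-1252') but got similar errors. Can anyone help me figure out what the problem is?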
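Byte 0x96 is a UTF-8 continuation byte, so it can never start a character, which is exactly what "invalid start byte" means. In Windows-1252 (Python codec name `cp1252`) the same byte is an en dash (–), which suggests, though does not prove, that the CSVs were saved by a Windows tool. The decoding that fails happens inside `pd.read_csv`, which assumes UTF-8 unless an `encoding` argument is given. A quick interpreter check, using made-up sample bytes rather than the real file:

```python
# Hypothetical sample: "10–20" encoded as Windows-1252, so the en dash (0x96)
# sits at position 2, matching the position in the error message.
raw = b"10\x9620"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte

# The same bytes decode cleanly as Windows-1252.
print(raw.decode("cp1252"))  # 10–20
```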

1 answer:

Answer 0 (score: 0)

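Specify the encoding when reading the file. ZipFile.open() already returns a binary stream, and its mode can only be 'r' or 'w' (so 'wb' raises ValueError). The decoding happens in pd.read_csv, which assumes UTF-8 by default. Byte 0x96 is an en dash in Windows-1252, so the files are most likely in that encoding:

dfs.append(pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str, encoding='cp1252'))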
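A self-contained sketch of the fix, using a small in-memory zip with hypothetical sample data as a stand-in for the downloaded r.content. It shows that ZipFile.open() rejects a mode such as 'wb', and that pd.read_csv reads the member once it is told the encoding. It assumes the real files are Windows-1252; that is a guess based on the 0x96 byte, not something the post confirms.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

# Build a zip in memory holding a Windows-1252 CSV (hypothetical sample data).
buf = BytesIO()
with ZipFile(buf, "w") as zf:
    zf.writestr("exits.csv", "id,age_range\n1,65\u201374\n".encode("cp1252"))

with ZipFile(BytesIO(buf.getvalue())) as zf:
    name = zf.namelist()[0]

    # ZipFile.open() only accepts "r" or "w"; "wb" raises ValueError.
    try:
        zf.open(name, "wb")
    except ValueError as exc:
        print("ValueError:", exc)

    # The member is a binary stream; read_csv does the decoding,
    # so the encoding is passed there.
    with zf.open(name) as fh:
        df = pd.read_csv(fh, dtype=str, encoding="cp1252")

print(df)
```

encoding='latin-1' would also avoid the exception, since it accepts every byte, but it maps 0x96 to an invisible control character (U+0096) instead of an en dash, so cp1252 is the better guess here.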