Parsing an .xls file from a Dropbox link

Date: 2021-06-30 12:50:39

Tags: python excel pandas dropbox

I am trying to parse a table from a Dropbox link (https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx). It is an .xlsx sheet, and so far I have tried two approaches.

Approach 1

import pandas as pd

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'

tables = pd.read_html(codeID_url)
df_codeID = tables[0]

This raises:

ValueError: No tables found

That makes sense, since in the end I am not parsing a table from an HTML page. The same command works fine for the table on this page (https://www.ecdc.europa.eu/en/covid-19/variants-concern).

Approach 2

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
data = pd.read_excel(codeID_url, sheet_name='TestResultCodelistVoC')

This raises:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<!DOCTYP'
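
The `b'<!DOCTYP'` in that message suggests the bytes being read are an HTML page rather than a spreadsheet. As a minimal sketch (the helper name `sniff_payload` is hypothetical, not part of pandas or xlrd), the first bytes of a payload can be checked against the well-known signatures for ZIP containers (which .xlsx files are) and for legacy OLE2 .xls files:

```python
def sniff_payload(head: bytes) -> str:
    """Guess what a downloaded payload is from its first few bytes.

    Hypothetical helper for illustration only.
    """
    # .xlsx files are ZIP containers, which start with the bytes "PK\x03\x04".
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Legacy .xls files are OLE2 compound documents with this 8-byte signature.
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # An HTML page usually opens with a doctype or an <html> tag.
    # The 8-byte prefix also matches the truncated bytes shown in the error.
    if head.lstrip().lower().startswith((b"<!doctyp", b"<html")):
        return "html"
    return "unknown"


# The bytes quoted in the XLRDError above are classified as HTML.
print(sniff_payload(b"<!DOCTYP"))
```

A ZIP signature only shows that the payload is some ZIP archive, so "xlsx" here is a guess rather than a guarantee.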

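I did find a thread about the same error, but all the answers there deal with local .xls files, whereas in my case I am trying to parse a web page/link that ultimately points to an .xls file.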

I have also come across a solution that uses a Dropbox token, but if possible I would first like to try downloading the table without a Dropbox account.

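1 answer:

Answer 0 (score: 0)

Append ?dl=1 to the end of the URL. With that parameter Dropbox serves the file itself instead of the HTML preview page, and the HTML page is what produced the <!DOCTYP in the error.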

>>> import pandas as pd
>>>
>>> url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx?dl=1'
>>> df = pd.read_excel(url)
>>> print(df)
             Codelistname  Codesystem name  ...                                     Short label DE 1st Release
0   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V1         NaN
1   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V2         NaN
2   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V3         NaN
3   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P1         NaN
4   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P2         NaN
5   TestResultCodelistVoC              NaN  ...                Confirmed not one of the listed VOC         NaN
6   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V1         NaN
7   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V2         NaN
8   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V3         NaN
9   TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P1         NaN
10  TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P2         NaN
11  TestResultCodelistVoC              NaN  ...                          Compatible with 501Y.V2-3         NaN
12  TestResultCodelistVoC              NaN  ...                              Compatible with a VOC         NaN
13  TestResultCodelistVoC              NaN  ...                             Confirmed MinkCluster5         NaN
14  TestResultCodelistVoC              NaN  ...                       Compatible with MinkCluster5         NaN
15  TestResultCodelistVoC              NaN  ...                        Not compatible with 501Y.V1         NaN
16  TestResultCodelistVoC              NaN  ...                      Not compatible with 501Y.V2-3         NaN
17  TestResultCodelistVoC              NaN  ...  No compatibility with VOC detected (VOC not fu...         NaN
18  TestResultCodelistVoC              NaN  ...                           Other variant of concern         NaN

[19 rows x 12 columns]
>>>
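
If the share link may already carry a query string (for example `?dl=0`), appending `?dl=1` by hand can produce a malformed URL. The following is a minimal sketch of rewriting the parameter with the standard library; the helper name `dropbox_direct_url` is an assumption for illustration and is not part of pandas or any Dropbox SDK.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_url(share_url: str) -> str:
    """Return share_url with its dl query parameter set to 1.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(share_url)
    # Keep every other query parameter, but drop any existing dl value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = dropbox_direct_url(
    "https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx?dl=0"
)
print(url)
```

The resulting string can then be passed to `pd.read_excel(url)` as in the session above.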