Merging DataFrames obtained through web scraping

Time: 2016-08-07 03:20:32

Tags: python selenium pandas dataframe

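I have code that scrapes tables from a website and reads them into a pandas DataFrame. However, because of the way the site is designed, this is done in a for loop. As a result, every table ends up under the same name, i.e. each one is assigned to the name df.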

Code

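import io

import bs4
import pandas

# driver is the Selenium WebDriver that has already loaded the page
soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    tbody = thead.find_next_sibling("tbody")

    # Rebuild a standalone <table> from the header and its body
    table = "<table>%s</table>" % (str(thead) + str(tbody))

    # read_html returns a list of DataFrames; take the only one
    df = pandas.read_html(io.StringIO(table))[0]

    print(df)
    print('-------------')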

Result

     Table1   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
-------------

     Table2   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
-------------

     Table3   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
-------------

     Table4   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
6    item7    value1   value2   value3   value4   value5
7    item8    value1   value2   value3   value4   value5

Is there a way to concatenate/merge all of these DataFrames into a single DataFrame?

1 Answer:

Answer 0: (score: 1)

If all you need to do is merge multiple DataFrames, you can simply collect them in a list and then combine them with pd.concat.

Something like this should work:

import pandas as pd

dataframes = []

for thead in soup.select(...):

    # your scraper logic here

    df = pd.read_html(...)[0]  # read_html returns a list of DataFrames
    dataframes.append(df)

combined = pd.concat(dataframes)
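
One thing to watch for, judging by the sample output in the question: the first column of each table has a different header (Table1, Table2, ...), so a plain pd.concat would keep those as separate columns and fill the gaps with NaN. Here is a minimal sketch of one way around that, using small made-up frames in place of the scraped ones. The column names "item" and "table" are my own choice for illustration, not something that comes from the site:

```python
import pandas as pd

# Hypothetical stand-ins for two of the scraped tables from the question
table1 = pd.DataFrame({
    "Table1": ["item1", "item2"],
    "FY2012": ["value1", "value1"],
    "FY2013": ["value2", "value2"],
})
table2 = pd.DataFrame({
    "Table2": ["item1"],
    "FY2012": ["value1"],
    "FY2013": ["value2"],
})

dataframes = []
for df in (table1, table2):
    table_name = df.columns[0]
    # Give the first column a shared name so the rows line up
    df = df.rename(columns={table_name: "item"})
    # Keep the original table name in its own column (assumed name: "table")
    df.insert(0, "table", table_name)
    dataframes.append(df)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2, ...
combined = pd.concat(dataframes, ignore_index=True)
print(combined)
```

If you would rather keep the per-table row numbers, drop ignore_index=True; each table's index then restarts at 0 in the combined frame.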

Does this help?