I can't convert a list to a DataFrame using the following code:
from bs4 import BeautifulSoup
import requests
page = requests.get("http://investors.morningstar.com/ownership/shareholders-overview.html?t=AAPL&region=idn&culture=en-US&ownerCountry=USA")
soup = BeautifulSoup(page.content, 'lxml')
quote = soup.find('table', class_='r_table2 text2 print97').find_all('tr')
for row in quote:
    cols = row.find_all('td')
    cols = [x.text.strip() for x in cols]
    print(cols)
Output:
['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change', '% TotalShares Held', '% TotalAssets', '', 'Date']
['']
['Russell Inv Tax-Managed DI Large Cap SMA', '', 'Premium', '15,981,694,820', '15,981,694,820', '95.20', '6.7', '', '12/31/2020']
['']
['Vanguard Total Stock Market Index Fund', '', 'Premium', '432,495,433', '1,210,943', '2.58', '5.31', '', '01/31/2021']
How can I convert this into a DataFrame, using the first list (index 0) as the column names:
['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change', '% TotalShares Held', '% TotalAssets', '', 'Date']
and all of the lists after it as the DataFrame's contents?
Answer 0 (score: 3)
It's simple: just use the DataFrame constructor.
from bs4 import BeautifulSoup
import pandas as pd
import requests
page = requests.get("http://investors.morningstar.com/ownership/shareholders-overview.html?t=AAPL&region=idn&culture=en-US&ownerCountry=USA")
soup = BeautifulSoup(page.content, 'lxml')
quote = soup.find('table', class_='r_table2 text2 print97').find_all('tr')
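# Collect the stripped cell text of every table row.
data = []
for row in quote:
    data.append([x.text.strip() for x in row.find_all('td')])
# The first row holds the column names; the remaining rows are the data.
df = pd.DataFrame(data[1:], columns=data[0])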
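Note that the output in the question also contains single-cell spacer rows (`['']`) and two columns with an empty name, and the constructor call above keeps both. As an optional follow-up, here is a minimal sketch of one way to clean them up. It is not part of the original answer: the rows are hard-coded from the question's output so it runs without network access, and the choices to drop the empty-named columns and to convert `Shares` and `Change` to integers are assumptions about what is wanted.

```python
import pandas as pd

# Rows copied from the output shown in the question (stand-in for the scraped data).
rows = [
    ['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change',
     '% TotalShares Held', '% TotalAssets', '', 'Date'],
    [''],
    ['Russell Inv Tax-Managed DI Large Cap SMA', '', 'Premium', '15,981,694,820',
     '15,981,694,820', '95.20', '6.7', '', '12/31/2020'],
    [''],
    ['Vanguard Total Stock Market Index Fund', '', 'Premium', '432,495,433',
     '1,210,943', '2.58', '5.31', '', '01/31/2021'],
]

header, *body = rows
# Keep only rows with as many cells as the header; this drops the [''] spacers.
body = [r for r in body if len(r) == len(header)]

df = pd.DataFrame(body, columns=header)
# Drop the columns whose header is an empty string (assumed to be layout filler).
df = df.loc[:, df.columns != ''].copy()
# Convert the comma-formatted counts to integers.
for col in ['Shares', 'Change']:
    df[col] = df[col].str.replace(',', '', regex=False).astype('int64')

print(df)
```

With the scraped data, `rows` would simply be the `data` list built in the loop above.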