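How can I scrape this table more effectively?

Time: 2016-05-15 18:23:29

Tags: python screen-scraping

OK, I've built a program that scrapes Yahoo Finance. I want the historical prices of a particular stock, and then I want to write them to an Excel spreadsheet. It does everything as intended, but it gives me all the data on the whole page! I only need the data in the table. Thanks.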

import urllib
import urllib.request
from bs4 import BeautifulSoup
import os
import requests

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

playerdatasaved=""
soup = make_soup("https://finance.yahoo.com/q/hp?s=USO+Historical+Prices")
for record in soup.findAll('tr'):
    playerdata=""
    for data in record.findAll('td'):
        playerdata=playerdata+","+data.text
    if len(playerdata)!=0:
        playerdatasaved = playerdatasaved + "\n" + playerdata[1:]

header="Open,Close,High,Low"
file = open(os.path.expanduser("Uso.csv"),"wb")
file.write(bytes(header, encoding="ascii",errors='ignore'))
file.write(bytes(playerdatasaved, encoding="ascii",errors='ignore'))

print(playerdatasaved)

1 answer:

Answer 0 (score: 0)

Get the data table:

soup = make_soup("https://finance.yahoo.com/q/hp?s=USO+Historical+Prices")
# Outer loop over the rows, inner loop over the cells of each row
table = [[cell.text for cell in row.findAll('td')] for row in soup.findAll('tr')]
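
The comprehension above still walks every tr on the page, so it also picks up cells from navigation and layout tables. One way to keep only the price table is to select that one table first and iterate just its rows. Below is a minimal sketch of that idea. To stay runnable without installing anything, it uses the standard library's html.parser instead of BeautifulSoup. The HTML snippet, the id "prices" and the values in it are made up for illustration; the real Yahoo page needs whatever selector you find by inspecting its markup.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the <td> text of every row inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # > 0 while we are inside the target table
        self.in_cell = False  # True while we are inside a <td> of that table
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or track a table nested inside it
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag == "tr":
            self.rows.append([])
        elif self.depth and tag == "td":
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page: one unrelated table plus the table we actually want
html = """
<table id="nav"><tr><td>Home</td><td>News</td></tr></table>
<table id="prices">
  <tr><th>Date</th><th>Open</th></tr>
  <tr><td>May 13, 2016</td><td>11.20</td></tr>
  <tr><td>May 12, 2016</td><td>11.05</td></tr>
</table>
"""

parser = TableRows("prices")
parser.feed(html)

# The header row has only <th> cells, so it comes back empty; drop it
rows = [row for row in parser.rows if row]
print(rows)
```

With BeautifulSoup the same approach would be to call soup.find('table', ...) with a suitable filter to get the one table, then call findAll('tr') on that element instead of on soup.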

Write the data table out to a file:

import csv

# Python 3: csv.writer needs a text-mode file opened with newline=""
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(table)
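
As a small self-contained illustration of the writing step, here is a sketch that writes a header plus rows and reads the file back. The helper name write_table, the output path in the temp directory and the sample rows are assumptions made for the example. It also shows one reason to prefer csv.writer over joining cells with "," by hand: a cell that itself contains a comma (a date written like "May 13, 2016", say) is quoted automatically instead of being split into two columns.

```python
import csv
import os
import tempfile


def write_table(path, header, rows):
    """Write a header line followed by the data rows as CSV."""
    # Text mode with newline="" is what the csv module expects in Python 3
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Made-up sample rows; the date cells contain a comma on purpose
sample_rows = [["May 13, 2016", "11.20"], ["May 12, 2016", "11.05"]]
out_path = os.path.join(tempfile.gettempdir(), "uso_sample.csv")
write_table(out_path, ["Date", "Open"], sample_rows)

# Read the file back to confirm each date survived as a single cell
with open(out_path, newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)
```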