Python writerow to CSV (output from for loop into columns)

Time: 2018-07-07 03:32:01

Tags: python csv beautifulsoup python-requests

I thought I'd be able to figure this out on my own based on the last answer I received to a similar question, but I'm drawing a blank.

I'm building a Python 3 web scraper that pulls MLB scores from theScore website. What I want is to output the relevant information to a CSV in the exact layout shown on the site. The URL used for this example is:

https://www.thescore.com/mlb/events/date/2018-06-29

...and here is my current code. (I know it's incorrect; I've tried several different solutions, including row.append, but none of them gave the output I want. The URLs are currently imported from a CSV because I want it to loop through a list of URLs, but the link above serves as the example.)

from bs4 import BeautifulSoup
import requests
import csv
from csv import reader, writer

with open('DailyResultsURLS.csv', newline='') as f_urls, open('DailyResultsOutput.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output, delimiter=',')
    csv_output.writerow(['Date', 'Away Team', 'Home Team', 'Away Score', 'Home Score', 'Final/Extra Innings'])


    for line in csv_urls:
        page = requests.get(line[0]).text
        soup = BeautifulSoup(page, 'html.parser')
        date = soup.find('div', {'class' : 'events__date--1OuzN'})
        teams = soup.findAll('span', {'class' : 'EventCard__title--DY0la'})
        scores = soup.findAll('div', {'class' : 'col-xs-2 EventCard__rightColumn--7jlDP'})
        final = soup.findAll('div', {'class' : 'col-xs-4 col-sm-3 EventCard__rightColumn--7jlDP'})

        for d in range(len(date)):
            csv_output.writerow([[date.text] + [teams[r1].text for r1 in range(len(teams))] + [scores[r2].text for r2 in range(len(scores))] + [final[f3].text for f3 in range(len(final))]])

I've also attached a picture of what I want the DailyResultsOutput.csv file to look like.

[image: the desired DailyResultsOutput.csv layout, with each game's values under the column headers]

One thing to mention: where the site shows "Final" for a given game, it can sometimes read something like "Final (13)" instead, indicating how many innings the game ran, so the code can't just write the literal string "Final"; it needs to take whatever value the site shows.
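Grabbing the element's .text (as the answer below does) covers both cases verbatim. If the innings count itself were ever needed, a regex could pull it out of that status string; here is a minimal sketch (the sample strings are hypothetical):

import re

# hypothetical status strings as they might appear on the site
for status in ['Final', 'Final (13)']:
    # "Final (N)" means the game ran N innings; plain "Final" means 9
    match = re.search(r'Final \((\d+)\)', status)
    innings = int(match.group(1)) if match else 9
    print(status, '->', innings, 'innings')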

As you can see, right now it just dumps all the necessary information into a single row, but I want it laid out under the appropriate column headers. Thanks a million in advance. Let me know if I've missed anything.

1 Answer:

Answer 0 (score: 0)

The code below is one way to handle it.

Sample code:

from requests import get
from csv import writer
from bs4 import BeautifulSoup

url = 'https://www.thescore.com/mlb/events/date/2018-06-29'

headers = ['Date', 'Away Team', 'Home Team', 'Away Score', 'Home Score', 'Final/Extra Innings']

# open the output file with a context manager
# (newline='' avoids blank lines between rows on Windows)
with open('DailyResultsOutput.csv', 'w', newline='') as output:
    csv_writer = writer(output)
    csv_writer.writerow(headers)  # write the column headers first
    response = get(url)

    # check if request passed
    # could do more error checking here if you wanted to
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        date = soup.find('div', attrs={'class' : 'events__date--1OuzN'}).text
        teams = soup.findAll('span', attrs={'class' : 'EventCard__title--DY0la'})
        scores = soup.findAll('div', attrs={'class' : 'col-xs-2 EventCard__rightColumn--7jlDP'})
        finals = soup.findAll('div', attrs={'class' : 'col-xs-4 col-sm-3 EventCard__rightColumn--7jlDP'})

        # pair up teams, scores and finals
        lines = list(zip(teams, scores, finals))

        # games appear as consecutive away/home rows, so pair each
        # even-indexed row with the odd-indexed row that follows it
        for away, home in zip(lines[::2], lines[1::2]):

            # extract the text; the home row's status cell is redundant
            # alongside the away row's "Final" value, so discard it
            a_team, a_score, final, h_team, h_score, _ = (x.text for x in away + home)

            # reorder and write row
            row = date, a_team, h_team, a_score, h_score, final
            csv_writer.writerow(row)

Output:

DailyResultsOutput.csv

Date,Away Team,Home Team,Away Score,Home Score,Final/Extra Innings
Fri June 29,NY Mets,Miami,2,8,Final
Fri June 29,Houston,Tampa Bay,2,3,Final
Fri June 29,Chi White Sox,Texas,3,11,Final
Fri June 29,Colorado,LA Dodgers,3,1,Final
Fri June 29,Minnesota,Chi Cubs,6,10,Final
Fri June 29,LA Angels,Baltimore,7,1,Final
Fri June 29,Boston,NY Yankees,1,8,Final

Explanation:

  • Open the output file for writing.
  • Use the requests library to fetch the page's HTML with a GET request.
  • Use the BeautifulSoup library to parse the HTML tags.
  • Use the csv library to assemble the items and write them to the file.
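The question also mentions looping through a list of URLs from DailyResultsURLS.csv. One way to extend this is to factor the parsing above into a helper and run it once per URL; a minimal sketch under that assumption (scrape_date is just the answer's parsing, refactored into a generator):

from requests import get
from csv import reader, writer
from bs4 import BeautifulSoup

headers = ['Date', 'Away Team', 'Home Team', 'Away Score', 'Home Score', 'Final/Extra Innings']

def scrape_date(soup):
    # yield one (date, away, home, away_score, home_score, final) tuple per game
    date = soup.find('div', attrs={'class' : 'events__date--1OuzN'}).text
    teams = soup.findAll('span', attrs={'class' : 'EventCard__title--DY0la'})
    scores = soup.findAll('div', attrs={'class' : 'col-xs-2 EventCard__rightColumn--7jlDP'})
    finals = soup.findAll('div', attrs={'class' : 'col-xs-4 col-sm-3 EventCard__rightColumn--7jlDP'})
    lines = list(zip(teams, scores, finals))
    for away, home in zip(lines[::2], lines[1::2]):
        a_team, a_score, final, h_team, h_score, _ = (x.text for x in away + home)
        yield date, a_team, h_team, a_score, h_score, final

with open('DailyResultsURLS.csv', newline='') as f_urls, \
     open('DailyResultsOutput.csv', 'w', newline='') as f_output:
    csv_writer = writer(f_output)
    csv_writer.writerow(headers)
    for line in reader(f_urls):
        response = get(line[0])  # the first column of each row holds the URL
        if response.status_code != 200:
            continue             # skip pages that fail to load
        soup = BeautifulSoup(response.text, 'html.parser')
        for row in scrape_date(soup):
            csv_writer.writerow(row)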