Question

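I'm a Python beginner and have been using it in my master's thesis for text analysis of the games industry. I have been trying to scrape reviews from several game review sites.

I used a list of URLs in my code to scrape the reviews, and that part works. Unfortunately, I can't write each review to a separate file. When I write the files, either I get only the review from the last URL in the list in all the files or, after changing the indentation, all of the reviews in all of the files. My code is below. What is wrong here?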

from bs4 import BeautifulSoup
import requests

urls= ['http://www.playstationlifestyle.net/2018/05/08/ao-international-tennis-review/#/slide/1',
'http://www.playstationlifestyle.net/2018/03/27/atelier-lydie-and-suelle-review/#/slide/1',
'http://www.playstationlifestyle.net/2018/03/15/attack-on-titan-2-review-from-a-different-perspective-ps4/#/slide/1']  

for url in urls:
    r=requests.get(url).text
    soup= BeautifulSoup(r, 'lxml')
for i in range(len(urls)):
    file=open('filename%i.txt' %i, 'w')    
    for article_body in soup.find_all('p'):
        body=article_body.text
        file.write(body)
    file.close()

Answer 1

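I think you only need one for loop. If I understand correctly, you just want to iterate over urls and store a separate file for each one.

So I would suggest removing the second for statement and moving its body into the first loop. You do need to modify for url in urls, though, so that you get a unique index for the current URL to use as i, and you can use enumerate for that.

Your single for statement would become: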

for i, url in enumerate(urls):

I haven't tested this myself, but I think it should solve your problem.
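As a rough illustration of what enumerate provides, the sketch below pairs each URL with a numbered filename without doing any network access. The example.com URLs and the helper name numbered_filenames are made up for this illustration and are not part of the original code.

```python
# Placeholder URLs, used only to show the index/URL pairing.
urls = [
    "http://example.com/review-a",
    "http://example.com/review-b",
    "http://example.com/review-c",
]


def numbered_filenames(urls):
    """Pair every URL with a filename built from its position in the list."""
    # enumerate yields (index, item) tuples, counting from 0 by default.
    return [("filename{}.txt".format(i), url) for i, url in enumerate(urls)]


for filename, url in numbered_filenames(urls):
    print(filename, "<-", url)
```

Each URL gets its own index, so each output file gets its own name and no file is written twice.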

Answer 2

I can well believe you are a Python beginner. I'll post the corrected code first and then explain.

for i,url in enumerate(urls):
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'lxml')
    file = open('filename{}.txt'.format(i), 'w')
    for article_body in soup.find_all('p'):
        body = article_body.text
        file.write(body)
    file.close()
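A possible variation on the file-writing step, sketched under the assumption that the page texts have already been extracted: a with block closes each file even if a write fails. The function name save_each_text and its directory parameter are hypothetical, and the scraping step is left out so the sketch runs without network access.

```python
import os


def save_each_text(texts, directory="."):
    """Write each text to its own numbered file and return the file paths."""
    paths = []
    for i, text in enumerate(texts):
        path = os.path.join(directory, "filename{}.txt".format(i))
        # The with block closes the file automatically, even on an error.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    return paths
```

In the scraper, the text for each URL could be built inside the loop, for example by joining the text of every p tag with newlines, and then written in the same way.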

The reason you "receive only the review from the last URL in the list to all the files":

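A variable holds one value at a time. Each pass through the loop reassigns r and soup, so once the for loop has finished, soup holds only the last result (the third one). The first and second results have been overwritten.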

for url in urls:
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'lxml')
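The same overwriting effect can be reproduced without any scraping. In this small sketch the strings in pages stand in for downloaded pages and upper() stands in for parsing; both are placeholders chosen for illustration.

```python
pages = ["first page", "second page", "third page"]

# Overwriting pattern: current is rebound on every pass,
# so the earlier values are lost once the loop ends.
for page in pages:
    current = page.upper()

after_loop = current  # only the value from the last pass is left

# Keeping every result instead: collect them in a list
# (or, as in the corrected code, use each one inside the loop).
collected = []
for page in pages:
    collected.append(page.upper())

print(after_loop)
print(collected)
```

This is why the corrected version does the file writing inside the same loop that fetches and parses each page.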

Scraping text from multiple websites and saving each in a separate text file
