How to read all URLs from the first column of a csv file

Date: 2019-03-24 20:53:34

Tags: python url

I am trying to read URLs from the first column of a csv file. The file contains a total of 6051 URLs that I want to read. To do that, I tried the following code:

    urls = []
    with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
        blogurl = csv.reader(csvfile)
        for row in blogurl:
            row = row[0]
            print(row)

    len(row)

However, only 65 URLs are printed. I don't know why the number of URLs I get is different from what the csv file contains.
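
As a sanity check, the raw line count of the file can be compared with the number of rows that csv.reader produces (a minimal sketch, assuming the same file path as above):

    import csv

    # Count raw lines in the file
    with open("C:/Users/hyoungm/Downloads/urls.csv") as f:
        raw_lines = sum(1 for _ in f)

    # Count rows as parsed by csv.reader
    with open("C:/Users/hyoungm/Downloads/urls.csv") as f:
        csv_rows = sum(1 for _ in csv.reader(f))

    # If these two numbers differ, quoting or embedded newlines may be merging rows
    print(raw_lines, csv_rows)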

Can someone help me figure out how to read all the URLs (6051 in total) from the csv file?

To read all the URLs in the csv file, I also tried several other pieces of code, which either returned the same number of URLs (65) or failed, for example:

1)

    openfile = open("C:/Users/hyoungm/Downloads/urls.csv")
    r = csv.reader(openfile)
    for i in r:
        #the urls are in the first column ... 0 refers to the first column
        blogurls = i[0]
        print (blogurls)
    len(blogurls)

2)

    urls = pd.read_csv("C:/Users/hyoungm/Downloads/urls.csv")
    with closing(requests.get(urls, stream = True)) as r:
        reader = csv.reader(r.iter_lines(), delimiter = ',', quotechar = '""')
        for row in reader:
            print(row)
            len(row)

3)

    with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
        lines = csv.reader(csvfile)
        for i, line in enumerate(lines):
            if i == 0:
        for line in csvfile:
            print(line[1:])
            len(line)

4) and

    blogurls = []
    with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
        r = csv.reader(csvfile)
        for i in r:
            blogurl = i[0]
            r = requests.get(blogurl)
            blogurls.append(blogurl)

    for url in blogurls:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "html.parser")
    len(blogurls)

I expect the output to be the 6051 URLs originally collected in the csv file, not 65 URLs.

After reading all the URLs, I am going to scrape text data from each of them. I should get the expected text data from all 6051 URLs. Please see the following image link:

the codes and the outcomes based on 65 urls so far

1 Answer:

Answer 0 (score: 0):

Either of the following approaches works for me:

    import requests

    r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
    urls = r.text.splitlines()

    print(len(urls))  # Returns 6051

    import csv
    import requests
    from io import StringIO

    r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
    reader = csv.reader(StringIO(r.text))
    urls = [line[0] for line in reader]

    print(len(urls))  # Returns 6051
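
Once the URLs are collected in a list, the scraping step described in the question can iterate over that list directly. A minimal sketch (assuming each URL serves HTML, and using requests and BeautifulSoup as in the question's fourth attempt):

    import requests
    from bs4 import BeautifulSoup

    texts = []
    for url in urls:  # urls collected by either approach above
        page = requests.get(url)
        soup = BeautifulSoup(page.text, "html.parser")
        # get_text() extracts the visible text; adjust the parsing to the pages' actual structure
        texts.append(soup.get_text())

    print(len(texts))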