Scraping from multiple URLs

Time: 2018-01-28 20:01:38

Tags: html python-3.x web-scraping


This is my working code for counting the total word count for a single URL. The challenge now is to count the words across a CSV file that has 500 URLs in one column. I have tried a lot but failed.

1 answer:

Answer 0: (score: 0)

For parsing CSV there is the csv library, so you just need:

import csv
import requests

with open('yourfile.csv', 'r', newline='') as csvfile:  # in Python 3, open in text mode, not 'rb'
  reader = csv.reader(csvfile)  # add delimiter=..., quotechar=... depending on the csv structure
  for row in reader:
    url = row[0]  # csv.reader yields a list per row; the URL is in the first column
    page = requests.get(url)
    # the rest of what you want to do with the content of the page

You can put your current code into a function and call it for each parsed URL.
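Putting the two pieces together, a minimal sketch might look like the following. The file name `yourfile.csv`, the assumption that the URLs sit in the first column, and the naive whitespace-based `count_words` are all placeholders; substitute your own single-URL counting logic.

```python
import csv
import requests

def count_words(text):
    # Rough word count: split on whitespace. Replace with your own
    # counting logic (e.g. stripping HTML tags first).
    return len(text.split())

def total_words(csv_path):
    # Sum word counts over every URL in the first column of the CSV.
    total = 0
    with open(csv_path, 'r', newline='') as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            page = requests.get(row[0], timeout=10)
            total += count_words(page.text)
    return total
```

With 500 URLs, fetching them sequentially may be slow; adding a `timeout` (as above) and wrapping `requests.get` in a try/except keeps one dead link from aborting the whole run.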