How would you simplify this program? (Python)

Time: 2016-06-17 14:47:56

Tags: python python-2.7

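I wrote this program, whose purpose is to visit the 18th link in a list of links and then, on the new page, visit the 18th link again.

The program works as intended, but it is somewhat repetitive and inelegant.

I was wondering whether you have any ideas on how to simplify it without using any functions. If I wanted to repeat the process 10 or 100 times, it would become very long.

Thanks for any suggestions!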

# Note - this code must run in Python 2.x and you must download
# http://www.pythonlearn.com/code/BeautifulSoup.py
# Into the same folder as this program

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
if len(url) < 1 :
    url='http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags
tags = soup('a')
urllist = list()
count = 0
loopcount = 0
for tag in tags:
    count = count + 1
    tg = tag.get('href', None)
    if count == 18:
        print count, tg
        urllist.append(tg)


url2 = (urllist[0])
html2 = urllib.urlopen(url2).read()
soup2 = BeautifulSoup(html2)

tags2 = soup2('a')
count2 = 0
for tag2 in tags2:
    count2 = count2 + 1
    tg2 = tag2.get('href', None)
    if count2 == 18:
        print count2, tg2
        urllist.append(tg2)

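1 Answer:

Answer 0: (score: 2)

Here is what you can do: index the list of anchor tags directly instead of counting through it in a loop.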

import urllib
from BeautifulSoup import *

# raw_input returns the typed text unchanged (input() would evaluate it in Python 2);
# an empty entry falls back to the default URL
url_1 = raw_input('') or 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'

html_1 = urllib.urlopen(url_1).read()
soup_1 = BeautifulSoup(html_1)

# Index 17 is the 18th anchor tag on the page
tags_1 = soup_1('a')
url_retr1 = tags_1[17].get('href', None)

html_2 = urllib.urlopen(url_retr1).read()
soup_2 = BeautifulSoup(html_2)

tags_2 = soup_2('a')
url_retr2 = tags_2[17].get('href', None)
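
To repeat the step 10 or 100 times, as the question asks, the same indexing can be placed in a loop that overwrites the URL on each pass. The following is only a minimal sketch, not code from the thread: it assumes Python 3 and swaps in the standard library's html.parser for BeautifulSoup, and the LinkCollector class, the pages dictionary, and the a.html/b.html/c.html names are hypothetical stand-ins so that it runs without network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for soup('a'): collects <a> hrefs in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.append(dict(attrs).get('href'))


# Made-up pages so the sketch runs offline. With real pages, pages[url] below
# would become urllib.request.urlopen(url).read().decode() (an assumption,
# Python 3 only).
pages = {
    'a.html': '<a href="b.html">B</a><a href="c.html">C</a>',
    'b.html': '<a href="a.html">A</a><a href="c.html">C</a>',
    'c.html': '<a href="a.html">A</a><a href="b.html">B</a>',
}

url = 'a.html'   # starting page
position = 2     # 1-based link position (18 in the question)
repeat = 3       # number of hops (10 or 100 in the question)
visited = []     # every URL reached, in order

for step in range(repeat):
    collector = LinkCollector()
    collector.feed(pages[url])
    # Overwrite url so the next pass starts from the page just found
    url = collector.hrefs[position - 1]
    visited.append(url)
    print(step + 1, url)
```

Because the loop reuses one set of variables, changing repeat is the only edit needed to go from 2 hops to 100; the body itself never grows.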