Comparing lists in Python using BeautifulSoup

Date: 2017-04-12 17:29:41

Tags: python beautifulsoup

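The following code is meant to walk through the tags on a web page (the 'b', 'strong' and 'a' tags inside specific 'li' entries). If a tag is found in a list (which you can see in the code), the 'a class="vote-description__evidence"' tag from that entry is appended to another list; otherwise 0 is appended to that list. The code is below: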

import urllib2
from BeautifulSoup import *

def votedescget(link):
    response = urllib2.urlopen(link)
    html = response.read()
    soup = BeautifulSoup(html)
    desc = soup.findAll('ul',{'class':"vote-descriptions"})
    readVotes = open("categories.txt","r")
    #descList = []

    #for line in readVotes.read().splitlines():
        #descList.append(line)

    resultsList = []
    descList = ['<b>gay rights</b>', '<b>smoking bans</b>', '<b>hunting ban</b>', '<b>marriage</b>', '<b>equality and human rights</b>', '<b>assistance to end their life</b>', '<b>UK military forces</b>', '<b>Iraq war</b>', '<strong>investigations</strong>', '<b>Trident</b>', '<b>EU integration</b>', '<b>EU</b>', '<b>Military Covenant</b>', '<b>right to remain for EU nationals</b>', '<b>UK membership of the EU</b>', '<b>military action against <a href="https://en.wikipedia.org/wiki/Islamic_State_of_Iraq_and_the_Levant">ISIL (Daesh)</a></b>', '<b>housing benefit</b>', '<b>welfare benefits</b>', '<b>illness or disability</b>', '<b>council tax</b>', '<b>welfare benefits</b>', '<b>guaranteed jobs for young people</b>', '<b>income tax</b>', '<b>rate of VAT</b>', '<b>alcoholic drinks</b>', '<b>taxes on plane tickets</b>', '<b>fuel for motor vehicles</b>', '<b>income over &pound;150,000</b>', '<b>occupational pensions</b>', '<b>occupational pensions</b>', '<b>banker&rsquo;s bonus tax</b>', '<b>taxes on banks</b>', '<b>mansion tax</b>', '<b>rights for shares</b>', '<b>regulation of trade union activity</b>', '<b>capital gains tax</b>', '<b>corporation tax</b>', '<b>tax avoidance</b>', '<b>incentives for companies to invest</b>', '<b>high speed rail</b>', '<b>private patients</b>', '<b>NHS</b>', '<b>foundation hospitals</b>', '<b>smoking bans</b>', '<b>assistance to end their life</b>', '<b>autonomy for schools</b>', '<b>undergraduate tuition fee</b>', '<a href="https://en.wikipedia.org/wiki/Academy_(English_school)">academy schools</a>', '<b>financial support</b>', '<b>tuition fees</b>', '<b>funding of local government</b>', '<b>equal number of electors</b>', '<b>fewer MPs</b>', '<b>transparent Parliament</b>', '<a href="https://en.wikipedia.org/wiki/Proportional_representation">proportional system</a>', '<strong>wholly elected</strong>', '<b>taxes on business premises</b>', '<b>campaigning by third parties</b>', '<b>fixed periods between parliamentary elections</b>', 
'<b>hereditary peers</b>', '<b>more powers to the Welsh Assembly</b>', '<b>more powers to the Scottish Parliament</b>', '<b>powers for local councils</b>', '<b>over laws specifically impacting their part of the UK</b>', '<b>voting age</b>', '<b>stricter asylum system</b>', '<b>intervene in inquests</b>', '<b>ID cards</b>', '<b>Police and Crime Commissioners</b>', '<b>retention of information about communications</b>', '<b>enforcement of immigration rules</b>', '<b>mass surveillance</b>', '<b>merging police and fire services</b>', '<b>prevent climate change</b>', '<b>fuel for motor vehicles</b>', '<b>forests</b>', '<b>taxes on plane tickets</b>', '<b>electricity generation</b>', '<b>culling badgers</b>', '<b>hydraulic fracturing (fracking)</b>', '<b>high speed rail</b>', '<b>bus services</b>', '<b>rail fares</b>', '<b>fuel for motor vehicles</b>', '<b>taxes on plane tickets</b>', '<b>publicly owned railway system</b>', '<b>secure tenancies for life</b>', '<b>market rent to high earners renting a council home</b>', '<b>regulation of gambling</b>', '<b>civil service redundancy payments</b>', '<b title="Including voting to maintain them">anti-terrorism laws</b>', '<b>Royal Mail</b>', '<b>pub landlords rent-only leases</b>', '<b>legal aid</b>', '<b>courts in secret sessions</b>', '<b>register of lobbyists</b>', '<b>no-win no fee cases</b>', '<b>letting agents</b>', '<b><a href="http://webarchive.nationalarchives.gov.uk/20100527091800/http://programmeforgovernment.hmg.gov.uk/">Conservative - Liberal Democrat Coalition Agreement</a></b>']
    #print descList

    for line in desc:
        li_list = line.findAll('li')
        for li in li_list:
            if len(li.findAll('b')) == 1:
                if li.find('b') in descList:
                    resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                    print li.find('a',{'class':"vote-description__evidence"})
            elif len(li.findAll('b')) == 2:
                print li.findAll('b')[1]
                if li.findAll('b')[1] in descList:
                    resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                    print li.find('a',{'class':"vote-description__evidence"})
            elif li.find('strong') in descList:
                resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                print li.find('a',{'class':"vote-description__evidence"})
            elif li.find('a') in descList:
                resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                print li.find('a',{'class':"vote-description__evidence"})
            else:
                resultsList.append('0')

    print resultsList

votedescget("https://www.theyworkforyou.com/mp/10001/diane_abbott/hackney_north_and_stoke_newington/votes")

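Normally the list is built programmatically from a file, but for convenience I have simply included it as a variable here. For some reason, the output I get when running this code is: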

<b>assistance to end their life</b>
<b>council tax</b>
<b>assistance to end their life</b>
<b>over laws specifically impacting their part of the UK</b>
<b>electricity generation</b>
<b>no-win no fee cases</b>
<b>letting agents</b>
['0', '0', '0', '0']

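Can anyone tell me why this happens, or how to fix it? I was expecting a list of zeros interspersed with the results for the tags found in descList, but that is not what I get.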

1 Answer:

Answer 0: (score: 1)

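In your comparison you are checking if li.find('b') in descList: Have you tested whether a Beautiful Soup object can be compared with a string this way? Beautiful Soup returns its own object (a Tag from find(), or a NavigableString for a piece of text) rather than a plain string, which is why you cast it with str() before appending it to your list; however, you do not cast it before making this comparison. Since descList holds plain strings, the membership test never succeeds. Casting first, as in str(li.find('b')) in descList, turns it into a string-to-string comparison.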
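To illustrate the point, here is a minimal sketch of the cast-before-compare idea. It assumes the bs4 package on Python 3 rather than the BeautifulSoup 3 / Python 2 setup in the question, and the markup, the two-entry desc_list and the href values are made up for the example, not taken from the real page.

```python
from bs4 import BeautifulSoup  # assumption: bs4 on Python 3, not the question's BeautifulSoup 3

# Hypothetical markup standing in for one <ul class="vote-descriptions"> block.
html = """
<ul class="vote-descriptions">
  <li>Voted for <b>council tax</b> <a class="vote-description__evidence" href="/a">Show votes</a></li>
  <li>Voted against <b>something else</b> <a class="vote-description__evidence" href="/b">Show votes</a></li>
</ul>
"""

# Shortened, hypothetical version of descList from the question.
desc_list = ['<b>council tax</b>', '<b>letting agents</b>']

soup = BeautifulSoup(html, "html.parser")
results = []
for li in soup.find_all('li'):
    tag = li.find('b')
    # `tag in desc_list` would compare a Tag object against plain strings
    # and never match, so compare the serialized markup instead.
    if str(tag) in desc_list:
        evidence = li.find('a', {'class': 'vote-description__evidence'})
        results.append(str(evidence))
    else:
        results.append('0')

print(results)
```

With this change the first entry yields the evidence link and the second yields '0'. Note that the serialized tag has to match the list entry character for character, so entries that contain attributes or HTML entities (for example &pound;) may still need attention depending on how your parser writes them back out.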
