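python: How can I improve my code snippet to parse elements in order

Time: 2013-05-24 14:59:19

Tags: python

I want to find all the elements I'm interested in within an HTML block and then sort them so that the values for each tag always come out in the order given by my list: ('h3', 'a', 'img')

I'd like to know whether there is a better, cleaner way to solve this that is also easier to extend (adding more tags).

Ex: so that I can pass this list to a function without having to think about it.

Here is the result after running my snippet: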

[
('text 1', '/url1', '/img1.png'),
('text 2', '/url2', '/img3.png'),
('text 3', '/url3', '/img3.png'),
]

Snippet:

import lxml.html
import requests_cache

def parse_element_tag(el):
    # el is a <class 'lxml.html.HtmlElement'>
    # Map each tag to (value, sort position) so results come out as (h3, a, img).
    extractors = {'a': (el.get('href'), 1), 'img': (el.get('src'), 2), 'h3': (el.text, 0)}
    return extractors[el.tag]

requests_cache.configure('test', expire_after=900)
r = readUrl('http://www.svtplay.se/program')  # readUrl is my own helper (not shown)
l = lxml.html.fromstring(r.text)
lst = []
for el in l.cssselect('div ul.svtGridBlock li div a'):
    # Keep only the tags of interest, order them by sort position, then drop the position.
    wanted = [parse_element_tag(i) for i in el.iter() if i.tag in ('a', 'img', 'h3')]
    lst.append(tuple(value for value, _ in sorted(wanted, key=lambda val: val[1])))

1 Answer:

Answer 0 (score: 0)

Use XPath:

def parse_element_tag(el, path):
    matched = el.xpath(path)
    if matched:
        return matched[0].strip()
    return ''

lst = []
paths = './/h3/text()', './@href', './/img/@src'
for el in l.cssselect('div ul.svtGridBlock li div a'):
    lst.append(tuple(parse_element_tag(el, path) for path in paths))

>>> for x in lst: print(x)
('Barn', '/kategorier/barn', '/public/images/categories/kat-barn.png')
(u'Dokument\xe4r', '/kategorier/dokumentar', '/public/images/categories/kat-dokumentar.png')
('Film & Drama', '/kategorier/filmochdrama', '/public/images/categories/kat-filmochdrama.png')
(u'Kultur & N\xf6je', '/kategorier/kulturochnoje', '/public/images/categories/kat-kulturochnoje.png')
('Nyheter', '/kategorier/nyheter', '/public/images/categories/kat-nyheter.png')
(u'Samh\xe4lle & Fakta', '/kategorier/samhalleochfakta', '/public/images/categories/kat-samhalleochfakta.png')
('Sport', '/kategorier/sport', '/public/images/categories/kat-sport.png')
('', '/kategorier/oppetarkiv', '/public/images/categories/kat-oppetarkiv.png')
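
To make the "easier to extend" point concrete, here is a minimal, self-contained sketch of the same XPath idea run against inline markup instead of the live page. The sample HTML, the `FIELDS` table, and the helper names `first_match` and `extract_rows` are assumptions made for illustration and are not part of the original code; the row selector is written as an XPath that only roughly mirrors the CSS selector used above.

```python
import lxml.html

# Hypothetical markup standing in for the real page (structure assumed).
SAMPLE_HTML = """
<div>
  <ul class="svtGridBlock">
    <li><div><a href="/url1"><img src="/img1.png" alt="first"><h3> text 1 </h3></a></div></li>
    <li><div><a href="/url2"><img src="/img2.png"></a></div></li>
  </ul>
</div>
"""

# One (name, XPath) pair per output column, listed in output order.
# Supporting another tag or attribute means adding one more pair.
FIELDS = (
    ('title', './/h3/text()'),
    ('href', './@href'),
    ('src', './/img/@src'),
)

# XPath approximation of the CSS selector 'div ul.svtGridBlock li div a'.
ROW_PATH = './/ul[@class="svtGridBlock"]/li/div/a'


def first_match(el, path, default=''):
    """Return the first XPath result for el, stripped, or default if none."""
    matched = el.xpath(path)
    return matched[0].strip() if matched else default


def extract_rows(root, row_path, fields=FIELDS):
    """Build one tuple per matched row, with columns in the order of fields."""
    return [
        tuple(first_match(el, path) for _name, path in fields)
        for el in root.xpath(row_path)
    ]


root = lxml.html.fromstring(SAMPLE_HTML)
rows = extract_rows(root, ROW_PATH)
for row in rows:
    print(row)
```

Because the column order comes from the order of `FIELDS`, no sorting step is needed, and a missing element (the second link has no `h3`) simply yields an empty string, as in the last row of the output above. An extra column could be requested with something like `extract_rows(root, ROW_PATH, FIELDS + (('alt', './/img/@alt'),))`.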