KeyError: 0 on a <dl> tag

Time: 2018-07-30 08:40:53

Tags: python python-3.x beautifulsoup html-parsing

I am trying to parse an HTML page, but I get a KeyError.

Here is the code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"

# Open a connection to the page and download its content (urllib)

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing (BeautifulSoup)
page_soup = soup(page_html, "html.parser")  # html.parser -> parse as HTML, not e.g. XML

# Grab the table with the offer number, kitchen type, and other data about the property
containers = page_soup.findAll("section",{"class":"clearfix"},{"id":"quick-summary"})

# print(len(containers)) shows how many such objects exist on the page.
# There is only one such table on the page, but for the sake of learning
# I treat it as if there could be many.
container = containers[0]
print(len(container.dl))
print(container.dl[0])

This is the log showing the error.

runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')
36
Traceback (most recent call last):

  File "<ipython-input-70-e826e21c585a>", line 1, in <module>
    runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/bartosz/Pulpit/web_scrap.py", line 30, in <module>
    print(container.dl[0])

  File "/home/bartosz/anaconda3/lib/python3.6/site-packages/bs4/element.py", line 1011, in __getitem__
    return self.attrs[key]

KeyError: 0

len(container.dl) reports that the dl holds 36 items. If I call len(container.dl.dt), it reports 1.

1 Answer:

Answer 0 (score: 0)

You need to access the element's contents through the .contents attribute, not by indexing the tag directly:

print(container.dl.contents[0])

That should work.

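Indexing a tag directly looks up its attributes, which is why the integer key 0 ends in self.attrs[key] and raises KeyError: 0. For example, given <dl class="myclass">, dl['class'] returns ['myclass'] (a list, because class is a multi-valued attribute).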
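The difference between attribute lookup and child access can be sketched on a small inline document. The markup below is a hypothetical stand-in for the real page, and the variable names are chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the real page.
html = '<dl class="myclass">\n<dt>Cena</dt>\n<dd>339 600 PLN</dd>\n</dl>'
dl = BeautifulSoup(html, "html.parser").dl

# String keys look up attributes; class is multi-valued, so a list comes back.
css_classes = dl["class"]

# Integer keys are treated as attribute names too, so dl[0] raises KeyError.
try:
    dl[0]
    index_error = None
except KeyError as exc:
    index_error = exc

# len(tag) counts the direct children, whitespace text nodes included.
child_count = len(dl)
first_child = dl.contents[0]

# The first <dt> has a single child: its text.
first_dt_len = len(dl.dt)

print(css_classes)
print(repr(index_error))
print(child_count, repr(first_child), first_dt_len)
```

This also suggests why len(container.dl.dt) is 1 in the question: container.dl.dt is the first <dt> tag, and its only child is its text.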

Edit:

To print all the contents of container.dl:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"

with uReq(my_url) as uClient:
    page_soup = soup(uClient.read(), "html.parser")

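# Note: the third positional argument of findAll is `recursive`, so the id dict is not applied as a filter here.
container = page_soup.findAll("section",{"class":"clearfix"},{"id":"quick-summary"})[0]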

print(len(container.dl))
print('-' * 80)
for content in container.dl.contents:
    print(content)
    print('-' * 80)

This prints (the first line is the length of container.dl.contents):

36
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
<dt>Numer oferty</dt>
--------------------------------------------------------------------------------
<dd>351165</dd>
--------------------------------------------------------------------------------
<dt>Liczba pokoi</dt>
--------------------------------------------------------------------------------
<dd>4</dd>
--------------------------------------------------------------------------------
<dt>Cena</dt>
--------------------------------------------------------------------------------
<dd><span class="tag price">339 600 PLN</span></dd>
--------------------------------------------------------------------------------
<dt>Cena za m2</dt>
--------------------------------------------------------------------------------
<dd>5 096 PLN</dd>
--------------------------------------------------------------------------------
<dt>Powierzchnia</dt>
--------------------------------------------------------------------------------
<dd>66,64 m2</dd>
--------------------------------------------------------------------------------
<dt>Piętro</dt>
--------------------------------------------------------------------------------
<dd>1</dd>
--------------------------------------------------------------------------------
<dt>Liczba pięter</dt>
--------------------------------------------------------------------------------
<dd>6</dd>
--------------------------------------------------------------------------------
<dt>Typ kuchni</dt>
--------------------------------------------------------------------------------
<dd>Aneks</dd>
--------------------------------------------------------------------------------
<dt>Balkon</dt>
--------------------------------------------------------------------------------
<dd>Tak</dd>
--------------------------------------------------------------------------------
<dt>Rodzaj ogrzewania</dt>
--------------------------------------------------------------------------------
<dd>CO miejskie</dd>
--------------------------------------------------------------------------------
<dt>Gorąca woda</dt>
--------------------------------------------------------------------------------
<dd>Wodociąg miejski</dd>
--------------------------------------------------------------------------------
<dt>Rodzaj budynku</dt>
--------------------------------------------------------------------------------
<dd>Wysoki blok</dd>
--------------------------------------------------------------------------------
<dt>Materiał</dt>
--------------------------------------------------------------------------------
<dd>Silikat</dd>
--------------------------------------------------------------------------------
<dt>Rok budowy</dt>
--------------------------------------------------------------------------------
<dd>2019</dd>
--------------------------------------------------------------------------------
<dt>Winda</dt>
--------------------------------------------------------------------------------
<dd>Tak</dd>
--------------------------------------------------------------------------------
<dt>Stan nieruchomości</dt>
--------------------------------------------------------------------------------
<dd>Stan deweloperski</dd>
--------------------------------------------------------------------------------
<dt>Rynek</dt>
--------------------------------------------------------------------------------
<dd>Pierwotny</dd>
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
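The 36 items above are 17 <dt> tags, 17 <dd> tags, and what appear to be two whitespace-only text nodes at the start and end. If only the label/value pairs are wanted, one possible approach is to select the tags with find_all and zip them together. The sketch below runs on a hypothetical excerpt modeled on the output above rather than on the live page, and it assumes the section carries both the clearfix class and the quick-summary id, as the original call intended.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt modeled on the output above; the real page is not fetched.
html = """
<section class="clearfix" id="quick-summary">
<dl>
<dt>Numer oferty</dt>
<dd>351165</dd>
<dt>Cena</dt>
<dd><span class="tag price">339 600 PLN</span></dd>
</dl>
</section>
"""
page_soup = BeautifulSoup(html, "html.parser")

# A single attrs dict carries both filters; a third positional argument
# would be interpreted as `recursive`, not as another filter.
section = page_soup.find("section", {"class": "clearfix", "id": "quick-summary"})

# find_all returns only Tag objects, so whitespace text nodes are skipped.
terms = section.dl.find_all("dt")
values = section.dl.find_all("dd")

# Pair each label with its value, stripping surrounding whitespace.
summary = {
    dt.get_text(strip=True): dd.get_text(strip=True)
    for dt, dd in zip(terms, values)
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

Pairing with zip assumes every <dt> is followed by exactly one <dd>, which matches the output shown here but may not hold for other pages.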