Python 2.7.13 - Scrape links - Follow links - Scrape content

Time: 2017-03-06 19:43:06

Tags: python web-scraping beautifulsoup

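I'm building a web scraper that will grab the address, postcode, and phone number of every McDonald's in the UK. I'm using an aggregator site rather than the McDonald's website.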

https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/

I have "borrowed" some code and repurposed it, but I'm getting errors. I'm hoping someone can tell me what I'm doing wrong.

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/"

def get_category_links(section_url):
    # Collect the link inside each <h2> found under the first <tr> on the page.
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    boccat = soup.find("tr")
    category_links = [BASE_URL + tr.a["href"] for tr in boccat.findAll("h2")]
    return category_links

def get_restaurant_details(category_url):
    # Pull the address fields and the phone number from one restaurant page.
    html = urlopen(category_url).read()
    soup = BeautifulSoup(html, "lxml")
    streetAddress = soup.find("span", "streetAddress").string
    addressLocality = [h2.string for h2 in soup.findAll("span", "addressLocality")]
    addressRegion = [h2.string for h2 in soup.findAll("span", "addressRegion")]
    postalCode = [h2.string for h2 in soup.findAll("span", "postalCode")]
    phoneNumber = [h2.string for h2 in soup.findAll("td", "b")]
    # postalCode was collected above but missing from the returned dict.
    return {"streetAddress": streetAddress,
            "addressLocality": addressLocality,
            "addressRegion": addressRegion,
            "postalCode": postalCode,
            "phoneNumber": phoneNumber}

The error looks like this:

File "<stdin>", line 6
addressRegion = [h2.string for h2 in soup.findAll("span", "addressRegion")]

IndentationError: unexpected indent
>>>    postalCode = [h2.string for h2 in soup.findAll("span", "postalCode")]
  File "<stdin>", line 6
    postalCode= [h2.string for h2 in soup.findAll("span", "postalCode")]

IndentationError: unexpected indent
>>>    phoneNumber = [h2.string for h2 in soup.findAll("td", "b")]
  File "<stdin>", line 6
    phoneNumber= [h2.string for h2 in soup.findAll("td", "b")]

IndentationError: unexpected indent
>>>    return {"streetAddress": streetAddress
  File "<stdin>", line 1
    return {"streetAddress": streetAddress

IndentationError: unexpected indent
>>>    return {"addressLocality": streetAddress
  File "<stdin>", line 1
    return {"streetAddress": streetAddress

IndentationError: unexpected indent
>>>    return {"addressRegion s":treetRegion
  File "<stdin>", line 1
    return {"streetRegion": streetRegion

IndentationError: unexpected indent
>>>    return {"phoneNumber":treetRegion
  File "<stdin>", line 1
    return {"streetRegion": streetRegion

IndentationError: unexpected indent
>>>

Thanks in advance

1 answer:

Answer 0 (score: 0)

This error appears when code is copied and pasted into the interpreter. Retype it with correct, consistent indentation.
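One way to check pasted code before running it is to compile it and see whether Python reports an indentation problem. The sketch below is only an illustration: `check_indentation` is a hypothetical helper name, the sample function is made up, and converting tabs to four spaces assumes the pasted code used a four-column indent.

```python
def check_indentation(source):
    """Return None if the source compiles, else a short error description."""
    try:
        compile(source, "<pasted>", "exec")
    except IndentationError as exc:
        # TabError is a subclass of IndentationError, so it is caught here too.
        return "line %s: %s" % (exc.lineno, exc.msg)
    return None


# Made-up sample: the third line is indented with a tab, the others with spaces.
pasted = (
    "def get_details():\n"
    "    street = 'a'\n"
    "\tregion = 'b'\n"
    "    return street, region\n"
)

# Assumption: one tab stands for a four-column indent in the pasted code.
cleaned = pasted.expandtabs(4)

print(check_indentation(pasted))   # describes the indentation problem
print(check_indentation(cleaned))  # None once the whitespace is consistent
```

Saving the code to a `.py` file and running that file, instead of pasting it line by line at the `>>>` prompt, avoids most of these paste-related problems.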