Question

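I'm scraping a website using BS and Mechanize. I've got my scraper working for a single instance, but I want to iterate over a dictionary and insert one value on each pass of the loop. Since I'm a total noob at Python (my apologies), I can't work out how to do this.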

Here is the code that fetches a single value:

import mechanize
import cookielib
import csv
from bs4 import BeautifulSoup as BS

ids = csv.DictReader(open("csv_to_scrape.csv"))
persons = [person for person in ids]

br = mechanize.Browser()
br2 = mechanize.Browser()
cj = cookielib.LWPCookieJar()

br.set_cookiejar(cj)
br2.set_cookiejar(cj)

br.open('https://www.example.com')

br.select_form(nr=0)
br.form['licenseNumber'] = '012345' #This is the value that comes from my dict. 
br.submit()

for link in br.links(url_regex="/details"):
    req = br.click_link(url=link.url)
    html = br2.open(req).read()

soup = BS(html)
text1 = soup.find('div', {'class':'infobox append-bottom span-11'}).text
text2 = soup.find('div', {'class':'infobox append-bottom'}).text

f = open('output.csv', 'w')
x = '012345'
write_to_file = x + "," + '"""' + text2 + '"""' + "," + '"""' + text1 + '"""' + "\n"
write_to_unicode = write_to_file.encode('utf-8')
print x
f.write(write_to_unicode)
f.close()

I have a basic dictionary that looks like this:

[{'': '3008', 'name': 'Doe, John', 'date': '05-09-89', 'location': 'New York, NY', 'action': 'Dance', 'id': '012345'}, {'': '3080', 'name': 'Smith, John', 'date': '12-04-92', 'location': 'San Francisco, CA', 'action': 'Singing', 'id': '543210'}, etc.....

I'm trying to iterate using the 'id', put it into the form where it says 'licenseNumber', and then either append the result to another dictionary or write it to a csv.

I know this is probably easy (and basic), but I've been stuck on it for two days (at 10 hours a day). Any help would be greatly appreciated.

Answer 1

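Getting an item out of a dictionary in Python is very easy. Just call the get method on the dictionary and pass it the key you want. For example: dictionary.get(key). In your case, the key would be 'id'.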
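As a small illustration of get on one of the records from the question (the variable names here are only for this example):

```python
# One record shaped like the dictionaries shown in the question.
person = {'': '3008', 'name': 'Doe, John', 'date': '05-09-89',
          'location': 'New York, NY', 'action': 'Dance', 'id': '012345'}

license_number = person.get('id')          # the value stored under 'id'
missing = person.get('no_such_key')        # None: get() does not raise on a missing key
fallback = person.get('no_such_key', '')   # an optional second argument is used as the default
```

Using person['id'] would also work, but it raises a KeyError if a row has no 'id' key, whereas get returns None.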

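Since you showed a list of dictionaries and mentioned iterating, here is a quick line of code that pulls all the ids out of your list of dictionaries.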

list_of_ids = [_dict.get("id") for _dict in list_of_dicts]

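That's it. Now you can iterate over the list and feed each id into your form. This probably means nesting your current for loop inside the new one, but your code isn't entirely clear on that point, so I won't say for sure.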
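A minimal sketch of that loop, assuming the form submission and parsing from the question are moved into a function of their own. The names scrape_all and fake_scrape are hypothetical, and the stub returns placeholder text instead of contacting the site, so the iteration and the CSV writing can be tried in isolation. The sketch is written for Python 3 (the question's code is Python 2), and it uses csv.writer in place of the hand-built quoting, which is a substitution on my part rather than something the question does.

```python
import csv
import io

def scrape_all(list_of_dicts, scrape_license, out_file):
    """Call scrape_license(id) for every record and write one CSV row per id."""
    writer = csv.writer(out_file)  # takes care of quoting commas and newlines
    for record in list_of_dicts:
        license_number = record.get("id")
        if not license_number:
            continue  # skip records that have no id
        text1, text2 = scrape_license(license_number)
        writer.writerow([license_number, text2, text1])

def fake_scrape(license_number):
    # Hypothetical stand-in for the mechanize / BeautifulSoup steps in the
    # question; it only returns placeholder strings.
    return ("summary for " + license_number, "details for " + license_number)

list_of_dicts = [
    {"name": "Doe, John", "id": "012345"},
    {"name": "Smith, John", "id": "543210"},
]

buffer = io.StringIO()  # in-memory file, used here instead of output.csv
scrape_all(list_of_dicts, fake_scrape, buffer)
output = buffer.getvalue()
```

In the real scraper, the function passed as scrape_license would contain the br.select_form(nr=0) through soup.find(...) steps from the question, with br.form['licenseNumber'] set to the id it receives, and out_file would be a file opened once before the loop rather than once per id.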

I hope this helps, and I apologize if I've completely misunderstood your question.

Python Mechanize / BeautifulSoup Scraping (iterating over a dictionary)
