Concatenating scraped data in Scrapy

Time: 2013-08-28 07:23:30

Tags: python scrapy

How can I concatenate the data so that the resulting code looks like US-NJ-Camden?

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    loc_Con = hxs.select('/tr/td[2]/span/span/span')  # for country
    loc_Reg = hxs.select('/tr/td[2]/span/span')  # for region
    loc_Loc = hxs.select('//tr[3]/td[2]/span/span')  # for local
    items = []
    for titles in titles:
        item = somethingItem()
        item["country"] = loc_Con.select('text()').extract()
        item["region"] = loc_Reg.select('text()').extract()
        item["location"] = loc_Loc.select('text()').extract()
        item["code"] =  # item["country"]-item["region"]-item["location"], like the example above

1 answer:

Answer 0: (score: 0)

Use format() like this:

item["code"] = "{}-{}-{}".format(item["country"], item["region"], item["location"])

Or like this, with named fields:

item["code"] = "{country}-{region}-{location}".format(country=item["country"], 
                                                      region=item["region"],
                                                      location=item["location"])

Or with old-style % formatting:

item["code"] = "%s-%s-%s" % (item["country"], item["region"], item["location"])
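
To check that these forms agree, here is a minimal sketch that uses a plain dict with sample values in place of a real Scrapy item (the dict and its values are assumptions for illustration only). It also shows "-".join(), which is not part of the answer above but gives the same result when every field is already a string.

```python
# A plain dict standing in for the Scrapy item; the values are sample data.
item = {"country": "US", "region": "NJ", "location": "Camden"}

# Positional placeholders
positional = "{}-{}-{}".format(item["country"], item["region"], item["location"])

# Named placeholders
named = "{country}-{region}-{location}".format(country=item["country"],
                                               region=item["region"],
                                               location=item["location"])

# Old-style % formatting
old_style = "%s-%s-%s" % (item["country"], item["region"], item["location"])

# Alternative not in the original answer: join the parts with a separator.
# This only works if every part is already a string.
joined = "-".join([item["country"], item["region"], item["location"]])

print(positional)  # US-NJ-Camden
print(positional == named == old_style == joined)  # True
```

Which form to use is mostly a matter of taste; the named version is easier to read once the template grows beyond a few fields.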

UPD:

Also, don't forget to take the first element (index 0) of each extracted list, since extract() returns a list:

item["country"] = loc_Con.select('text()').extract()
item["region"] = loc_Reg.select('text()').extract()
item["location"] = loc_Loc.select('text()').extract()

item["country"] = item["country"][0] if item["country"] else ""
item["region"] = item["region"][0] if item["region"] else ""
item["location"] = item["location"][0] if item["location"] else ""
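
The repeated "first element or empty string" logic can be pulled into a small helper. The sketch below is only an illustration: first_or_empty is a hypothetical name, and the sample lists stand in for what extract() would return. It also shows what happens under Python 3 if the lists are formatted directly without indexing.

```python
def first_or_empty(values):
    """Return the first element of a list, or "" if the list is empty."""
    return values[0] if values else ""


# Sample lists standing in for the results of .extract() (assumed data).
extracted = {"country": ["US"], "region": ["NJ"], "location": ["Camden"]}

# Formatting the lists directly embeds their repr, which is not what we want.
unindexed = "{}-{}-{}".format(extracted["country"],
                              extracted["region"],
                              extracted["location"])
print(unindexed)  # ['US']-['NJ']-['Camden']

# Taking the first element of each list gives the intended code.
item = {key: first_or_empty(values) for key, values in extracted.items()}
item["code"] = "{}-{}-{}".format(item["country"], item["region"], item["location"])
print(item["code"])  # US-NJ-Camden

# An empty extraction falls back to "" instead of raising IndexError.
print(repr(first_or_empty([])))  # ''
```

Note that an empty field still leaves its separator in the result (for example "US-NJ-"), so you may want to skip empty parts if that matters for your data.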