Question

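I'm hoping someone can help me with a small project. I'm trying to scrape a site's XML sitemap: search one tag for a keyword, then output the link from the tag that belongs to the same entry.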

<url>
<loc>
http://kith.com/products/nike-air-max-97-prm-yellow
</loc>
<lastmod>2017-03-04T15:05:25Z</lastmod>
<changefreq>daily</changefreq>
<image:image>
<image:loc>
https://cdn.shopify.com/s/files/1/0094/2252/products/DSC_3291_a582add6-ba97-40ad-a284-2cb1be7b31c6.jpg?v=1488575270
</image:loc>
<image:title>
Nike Air Max 1 OG Anniversary - White / University Red
</image:title>
</image:image>
</url>

This is what I've tested so far... it only writes the matching titles to the output file.

import requests
from bs4 import BeautifulSoup

# Put the sitemap XML link here
url = "https://kithnyc.com/sitemap_products_1.xml"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("loc")
#images = soup.find_all("image:image")

#for url in soup.find_all('loc'):
    #print(url.text, file=open("links.txt", "a"))

# Open the output file once instead of re-opening it on every match
with open("images.txt", "a") as out:
    for title in soup.find_all("image:title"):
        if "Naked" in title.text:
            print(title.text, file=out)

By keyword
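One possible way to tie each matching title back to its link is to walk the sitemap one <url> entry at a time, so the <loc> and the <image:title> being compared always come from the same entry. The sketch below is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, it assumes the sitemap declares the usual sitemap and Google image-sitemap namespaces (the excerpt above does not show the root element), the function name find_links is made up, the second <url> entry in SAMPLE is hypothetical test data, and the match is case-insensitive, unlike the original `in` check.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (standard sitemap + Google image sitemap schemas).
# The excerpt in the question does not show the root element, so verify
# these against the real file.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Small inline sample so the sketch runs without network access.
# The first entry mirrors the excerpt above; the second is hypothetical.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>
http://kith.com/products/nike-air-max-97-prm-yellow
    </loc>
    <image:image>
      <image:title>
Nike Air Max 1 OG Anniversary - White / University Red
      </image:title>
    </image:image>
  </url>
  <url>
    <loc>http://example.com/products/hypothetical-naked-tee</loc>
    <image:image>
      <image:title>Hypothetical Naked Tee</image:title>
    </image:image>
  </url>
</urlset>"""


def find_links(xml_text, keyword):
    """Return the <loc> of every <url> whose <image:title> contains keyword."""
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    matches = []
    for url in root.findall("sm:url", NS):
        # Text inside <loc> may be surrounded by newlines, so strip it.
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        titles = url.findall("image:image/image:title", NS)
        if any(needle in (t.text or "").lower() for t in titles):
            matches.append(loc)
    return matches


if __name__ == "__main__":
    for link in find_links(SAMPLE, "Naked"):
        print(link)
```

To try this against the live sitemap, the body of the response from requests.get(url) could be passed in place of SAMPLE, and the returned links written to a file opened once with a with-block, as in the script above.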

0 answers: