Removing a specific string from Python web scraping results

Time: 2018-06-07 04:24:14

Tags: python web-scraping

I am new to web scraping and I am trying this code:

import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

names = soup.find_all('h2') #name of food
rest = soup.find_all('span', {'class' : 'amount'}) # price of food

for div, a in zip(names, rest):
    print(div.text, a.text) # print name / price in same line

It works well, apart from one problem, which I show in the link below:

printing result of 2 for loops in same line

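The problem is that "HONEY GLAZED CHICKEN WING" is paired with $0.00. That price is an outlier returned by the shopping-cart widget on the site (it shares the span class='amount').

How can I remove this value and "shift up" the other prices so that they line up with the corresponding food names?

Edit:

Sample output below: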
 Line1: HONEY GLAZED CHICKEN WING $0.00
 Line2: CRISPY CHICKEN LUNCH BOX
 Line3:                                                    $5.00
 Line4: BREADED FISH LUNCH BOX
 Line5:                                                    $5.00

The output I want looks like this:

 Line1: HONEY GLAZED CHICKEN WING                          $5.00
 Line2: CRISPY CHICKEN LUNCH BOX                           $5.00

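I am looking for a solution that removes the $0.00 outlier price and shifts the remaining prices up.

3 Answers:

Answer 0 (score: 1)

I think you may be asking the wrong question. You can remove the $0.00 outlier, but your prices still will not line up with the names.

To make sure the prices and names come back in the same order, so that they match, it is probably easier to first search for the divs that contain both of them: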

import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

# all the divs that held the foods had this same style
divs = soup.find_all('div', {'style': 'max-height:580px;'})
names_and_prices = {
    # name: price
    div.find('h2').text: div.find('span', {'class': 'amount'}).text
    for div in divs
}
for name, price in names_and_prices.items():
    print(name, price)
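
One caveat with looking up both elements inside each container: `find()` returns `None` when nothing matches, so a container without an `h2` or a price would raise `AttributeError` on `.text`. Below is a minimal sketch of the same per-container idea with that guard added. It runs against a small inline HTML sample rather than the live site; the sample markup, the `product` and `cart` class names, and the `extract_menu` helper are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; class names are assumptions.
SAMPLE_HTML = """
<div class="cart"><span class="amount">$0.00</span></div>
<div class="product"><h2>HONEY GLAZED CHICKEN WING LUNCH BOX</h2>
  <span class="amount">$5.00</span></div>
<div class="product"><h2>CRISPY CHICKEN LUNCH BOX</h2>
  <span class="amount">$4.50</span></div>
<div class="product"><h2>ITEM WITHOUT A PRICE</h2></div>
"""


def extract_menu(html):
    """Return (name, price) pairs, reading both from the same container."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for box in soup.find_all("div", class_="product"):
        name = box.find("h2")
        price = box.find("span", class_="amount")
        if name is None or price is None:
            continue  # skip containers that lack a name or a price
        pairs.append((name.get_text(strip=True), price.get_text(strip=True)))
    return pairs


menu = extract_menu(SAMPLE_HTML)
for name, price in menu:
    print(name, price)
```

Because the cart widget's `$0.00` span sits outside every product container, it never enters the results, and a list of pairs (unlike a dict) keeps two items that happen to share a name.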

Answer 1 (score: 1)

To get the output in the form you described above, you can try the following:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")

for items in soup.find_all(class_='product-cat-lunch-boxes'):
    name = items.find("h2").get_text(strip=True)
    price = items.find(class_="amount").get_text(strip=True)
    print(name,price)

Result:

HONEY GLAZED CHICKEN WING LUNCH BOX $5.00
CRISPY CHICKEN LUNCH BOX $4.50
BREADED FISH LUNCH BOX $4.50
EGG OMELETTE LUNCH BOX $4.50
FRIED TWO-JOINT WING LUNCH BOX $4.50

Answer 2 (score: 0)

Try this:

for div, a in zip(names, rest):
    if a.text.strip() and '$0.00' not in a.text: # empty strings are False
        print(div.text, a.text) # print name / price in same line
    else:                       # optional
        print('Outlier')        # optional

Note that this only works for outliers whose a.text contains "$0.00".
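
The loop above skips the whole (name, price) pair whenever the price is an outlier, so the name paired with it is dropped as well. If the goal is to "shift up" the remaining prices, one option is to filter the price list before calling `zip`. The sketch below assumes the names and prices have already been extracted as plain strings; the sample values and the `drop_outliers` helper are hypothetical.

```python
# Hypothetical text values, in the order find_all() might return them.
names = [
    "HONEY GLAZED CHICKEN WING LUNCH BOX",
    "CRISPY CHICKEN LUNCH BOX",
    "BREADED FISH LUNCH BOX",
]
prices = ["$0.00", "$5.00", "$4.50", "$4.50"]  # first entry is the cart widget


def drop_outliers(values, outlier="$0.00"):
    """Strip whitespace, then drop empty strings and the outlier value."""
    cleaned = (v.strip() for v in values)
    return [v for v in cleaned if v and v != outlier]


# Filtering first means the remaining prices move up to meet the names.
paired = list(zip(names, drop_outliers(prices)))
for name, price in paired:
    print(f"{name:<40}{price}")  # left-align names in a 40-character column
```

This still relies on names and prices appearing in the same order on the page, and it would also discard a product that is genuinely priced at $0.00, so reading both values from the same container is likely the more robust route.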