Removing NoneType from BeautifulSoup

Date: 2018-06-05 16:01:08

Tags: python pandas beautifulsoup nonetype

I am trying to remove the commas from the numbers I extracted, using the following code:

with requests.Session() as s:
    url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
    r = s.get(url, headers=req_headers)
    soup = BeautifulSoup(r.content, 'lxml')
    prices = []
    for price in soup.find_all('a', {"class":"listing-results-price text-price"}):
        prices.append(price.text)
        if price is None:
            print('none')
    df['price'] = prices
    df['price'] = df['price'].str.extract('(\d+([\d,]?\d)*(\.\d+)?)', expand=True) #remove extract numbers with commas
    df['price'] = df['price'].replace(',','', inplace = True)

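This returns a column in which every value is None. Is there any way to get rid of this NoneType error?

Before I run the last line, the dataframe looks like this: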

         price
0          NaN
1    1,875,000
2    4,950,000
3      500,000
4      675,000
5      980,000
6      475,000
7      849,950
8    1,050,000
9    1,050,000
10     650,000
11   1,100,000
12   1,300,000
13     895,000
14   1,000,000
15  26,800,000
16   1,600,000
17     695,000
18   2,100,000
19     510,000
20   1,200,000
21   3,000,000
22     599,000
23  26,800,000
24   1,550,000
25     750,000
26   1,600,000
27   1,025,000

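2 answers:

Answer 0: (score: 2)

With df['price'].replace(',','', inplace = True) you are replacing in place, so the call returns nothing. Assigning that result back to df['price'] is what fills the column with None.

You need: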

df['price'] = df['price'].str.replace(',','')

Output:

0        NaN
1    1875000
2    4950000
3     500000
4     675000
5     980000
6     475000
7     849950
8    1050000
9    1050000

For reference, see the docs.
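A minimal sketch of the fix on a small frame, assuming a few hypothetical sample values in place of the scraped prices. It also shows that a plain Series.replace(',', '') without regex=True only matches cells whose whole value is ',', which is why the .str accessor is used. The final numeric conversion is an optional extra step that the answer above does not include.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped prices.
df = pd.DataFrame({"price": [None, "1,875,000", "4,950,000", "500,000"]})

# Series.replace without regex=True compares whole cell values,
# so strings that merely contain a comma are left untouched.
unchanged = df["price"].replace(",", "")

# .str.replace works on substrings inside each string and returns a new Series.
df["price"] = df["price"].str.replace(",", "", regex=False)

# Optional extra step (not in the original answer): convert to nullable integers.
df["price_int"] = pd.to_numeric(df["price"], errors="coerce").astype("Int64")

print(df)
```

Assigning the returned Series back, rather than passing inplace=True, keeps the missing first row as a missing value instead of turning the whole column into None.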

Answer 1: (score: 1)

I suggest handling this on the data-extraction side, before you build the dataframe. You can build the list as follows:

from bs4 import BeautifulSoup
import requests
url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
res_lis = [int(price.text.strip().split('\n')[0].replace('£', '').replace(',', '')) for price in soup.find_all('a', {"class":"listing-results-price text-price"}) if price]
print(res_lis)

Result:

[2000000, 549950, 1050000, 500000, 675000, 980000, 475000, 849950, 1050000, 1050000, 650000, 1100000, 1300000, 895000, 1000000, 26800000, 1600000, 695000, 2100000, 510000, 3000000, 1200000, 599000, 26800000, 1550000, 750000, 1600000, 1025000]
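Note that int() raises ValueError if a listing's text contains no number, which is presumably what produced the NaN row in the question. A hedged sketch of a more tolerant variant follows. The helper name parse_price and the sample strings are hypothetical, and it works on plain strings so it can be tried without fetching the page.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Return the first comma-grouped number in text as an int, or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        # No digits at all, e.g. a listing that shows "POA" instead of a price.
        return None
    return int(match.group().replace(",", ""))


# Hypothetical strings shaped like the text of the price links.
samples = ["£1,875,000\nOffers over", "POA", "  £500,000 "]
prices = [parse_price(s) for s in samples]
print(prices)
```

In the scraping code this would be applied as parse_price(price.text) for each matching link, so that a listing without a numeric price yields None instead of stopping the whole list comprehension.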

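If you structure and manipulate all of your data as much as needed before storing it, which is what your data-extraction stage would be, then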
