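I want to extract member data from the given link

Time: 2019-04-19 07:51:06

Tags: python beautifulsoup

I am trying to extract data from the link below, but I am not getting anything and the code raises an error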

from bs4 import BeautifulSoup
import requests
r =requests.get('http://www.smcasurat.org/Member/DirectorySearch#')
soup = BeautifulSoup(r.text,'lxml')

data = soup.find('section',class_='part_one')
name = data.find('h4')
print name.text
qual = data.find('h5')
print qual.text
contact = data.find('div',class_='media')
contact1 = contact.find('p')
print contact1.text
email = data.find('div',class_='media-body')
email1 = email.find('p')
print email1.text

ERROR:

Traceback (most recent call last):
  File "C:\Python27\smcasurat.py", line 19, in <module>
    name = data.find('h4')
AttributeError: 'NoneType' object has no attribute 'find'

1 answer:

Answer 0 (score: 0)

It is much easier to access the JSON response that the page uses to render that data.

import requests

url = 'http://www.smcasurat.org/Member/DirecotrySerach'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

payload = {
'type': 'ALL',
'value': '',
'pageindex': '1',
'pagesize': '9999'}

jsonData = requests.post(url, headers=headers, params=payload).json()


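for member in jsonData['Data']:
    name = member['FirstName'] + ' ' + member['LastName']
    qual = member['MemberDegree'].strip()
    email = member['Email1']
    try:
        # Join every non-empty field of the first clinic entry, one per line
        contact = '\n'.join([v.strip() for k, v in member['Clinicinfo'][0].items() if v != ''])
    except (KeyError, IndexError, TypeError, AttributeError):
        # No usable clinic entry for this member
        contact = '-'

    print('%s\n%s\n%s\n%s\n' %(name, qual, contact, email))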

To view the output:

for member in jsonData['Data']:
    name = member['FirstName'] + ' ' + member['LastName']
    qual = member['MemberDegree'].strip()

    try:
        # Only the first phone number of the first clinic entry
        contact = member['Clinicinfo'][0]['Phone1']
    except (KeyError, IndexError, TypeError):
        contact = '-'

    email = member['Email1']

    print('%s\n%s\n%s\n%s\n' %(name, qual, contact, email))
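The contact handling in the first loop can be tried without contacting the site. The following is a minimal sketch that applies the same kind of join to a few made-up member records. `format_contact`, the `Address` and `Phone2` keys, and all sample values are hypothetical; only the `Clinicinfo` and `Phone1` names come from the code above. Unlike the loop above, this version also skips non-string and whitespace-only values.

```python
def format_contact(member):
    """Join the non-empty string values of the first clinic entry, or return '-'."""
    try:
        clinic = member['Clinicinfo'][0]
        parts = [v.strip() for v in clinic.values()
                 if isinstance(v, str) and v.strip()]
    except (KeyError, IndexError, TypeError, AttributeError):
        # Missing key, empty list, None, or an entry that is not a dict
        return '-'
    return '\n'.join(parts) if parts else '-'


# Invented records standing in for jsonData['Data']
sample_members = [
    {'Clinicinfo': [{'Address': ' 12 Example Road ', 'Phone1': '0000000000', 'Phone2': ''}]},
    {'Clinicinfo': []},   # member with no clinic entries
    {},                   # member without the key at all
]

results = [format_contact(m) for m in sample_members]
for result in results:
    print(repr(result))
```

Keeping the fallback in one small function means both loops could share it instead of repeating the try/except.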

Or you can use json_normalize and convert it into a DataFrame

import pandas as pd

# pandas 1.0+ exposes json_normalize at the top level;
# the older pandas.io.json import path is deprecated
df = pd.json_normalize(jsonData['Data'])
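To get a feel for what a flattened record looks like, here is a small standard-library sketch. It is not pandas' implementation: `flatten` is a hypothetical helper that mimics the dot-separated column names `json_normalize` produces for nested dictionaries by default, and it leaves lists untouched. The sample record, including the `Office` key, is invented for illustration.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        column = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the column name
            flat.update(flatten(value, column, sep))
        else:
            # Scalars and lists are kept as they are
            flat[column] = value
    return flat


# Invented record; the real response may be shaped differently
sample_record = {
    'FirstName': 'Jane',
    'Office': {'City': 'Surat', 'Pin': '000000'},
    'Clinicinfo': [{'Phone1': '0000000000'}],
}

flat_record = flatten(sample_record)
for column, value in flat_record.items():
    print(column, '=', value)
```

Each flattened dictionary corresponds to one row, with its keys as the column names.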

If you want to browse the data as a file, just use this and open the result in Notepad++

import json
with open('C:/data.json', 'w') as outfile:
    json.dump(jsonData, outfile, indent=4)
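Writing directly to the root of `C:` can fail on Windows setups where that location needs elevated rights. One variation, sketched below, writes to a temporary directory and reads the file back to confirm it round-trips. The sample payload is invented; only the top-level `Data` key and the field names mirror the code above.

```python
import json
import os
import tempfile

# Invented stand-in for the real jsonData
sample = {
    'Data': [
        {'FirstName': 'Jane', 'LastName': 'Doe', 'Email1': 'jane.doe@example.com'},
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'data.json')

    # ensure_ascii=False keeps non-ASCII names readable in a text editor
    with open(out_path, 'w', encoding='utf-8') as outfile:
        json.dump(sample, outfile, indent=4, ensure_ascii=False)

    # Read the file back to confirm nothing was lost
    with open(out_path, encoding='utf-8') as infile:
        loaded = json.load(infile)

print(loaded == sample)
```

For a file that should persist, replace the temporary directory with any folder you can write to.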