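I am trying to extract Google search results using the Google API in Python. I can extract the URL, link, title, and snippet, but I also want to extract the rating shown in the Google search results. Below is the code I am using: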
$.ajax({
    url: 'http://example/test/profileForm.php',
    data: form,
    processData: false,
    contentType: false,
    type: 'POST',
    success: function (data) {
        $("#loadingIMG").hide();
        $(imgEdit).attr('src', data);
    }
});
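When I search Google for "swiggy company review", the first search result shows a rating of 3.7, but I don't know how to extract that information. Can anyone suggest a solution? Thanks in advance.
Answer 0 (score: 0)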
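Since the Google API is deprecated, this can be done easily with BeautifulSoup CSS selectors, using the select() method (for multiple elements) or the select_one() method (for a single element), among other scraping techniques.
Code and full example: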
from bs4 import BeautifulSoup
import requests, lxml, json

# Send a browser-like User-Agent with the request
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
    'https://www.google.com/search?q=swiggy company review',
    headers=headers).text

soup = BeautifulSoup(response, 'lxml')

# Selects just one Review element (using converted xPath to CSS selector):
# review = soup.select_one('#rso > div:nth-of-type(1) > div > div > div:nth-of-type(2) > div > span:nth-of-type(1)').text
# print(review)

# Selects just one Vote element (using converted xPath to CSS selector):
# votes = soup.select_one('#rso > div:nth-of-type(1) > div > div > div:nth-of-type(2) > div > span:nth-of-type(2)').text
# print(votes)

data = []

# Selects multiple Rating and Vote elements, one pair per matching result:
for something in soup.select('.uo4vr'):
    # Text after the colon in the rating label, e.g. "3.8"
    rating = something.select_one('.uo4vr g-review-stars+ span').text.split(':')[1].strip()
    # First token of the votes/reviews label, e.g. "1,219"
    votes_reviews = something.select_one('.uo4vr span+ span').text.split(' ')[0]

    data.append({
        "Rating": rating,
        "Votes/Reviews": votes_reviews,
    })

print(json.dumps(data, indent=2))
Output:
[
  {
    "Rating": "4",
    "Votes/Reviews": "1,219"
  },
  {
    "Rating": "4",
    "Votes/Reviews": "1,090"
  },
  {
    "Rating": "3.8",
    "Votes/Reviews": "46"
  },
  {
    "Rating": "3.8",
    "Votes/Reviews": "260"
  },
  {
    "Rating": "4.1",
    "Votes/Reviews": "1,047"
  },
  {
    "Rating": "3.3",
    "Votes/Reviews": "47"
  },
  {
    "Rating": "1.5",
    "Votes/Reviews": "114"
  }
]
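The scraped values are strings ("3.8", "1,219"), so sorting or averaging them needs a conversion step first. The following is a minimal sketch of that step, not part of the original answer: parse_rating and parse_count are hypothetical helper names, and the sample rows are copied from the output above.

```python
import re


def parse_rating(text):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_count(text):
    """Return a count such as '1,219' as an int, or None if it has no digits."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


# Sample rows copied from the scraper output; not fetched live.
scraped = [
    {"Rating": "4", "Votes/Reviews": "1,219"},
    {"Rating": "3.8", "Votes/Reviews": "46"},
]

cleaned = [
    {
        "rating": parse_rating(row["Rating"]),
        "count": parse_count(row["Votes/Reviews"]),
    }
    for row in scraped
]
print(cleaned)
```

Returning None instead of raising keeps one malformed row from aborting the whole loop, at the cost of having to filter out None values later.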
Alternatively, you can use the Google Organic Results API from SerpApi. It is a paid API with a free trial.
Code to integrate:
from serpapi import GoogleSearch
import os, json

params = {
    "engine": "google",
    "q": "swiggy company review",
    "api_key": os.getenv("API_KEY"),  # API key read from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

# For extracting single elements:
# rating = results['organic_results'][0]['rich_snippet']['top']['detected_extensions']['rating']
# print(f"Rating: {rating}")
# votes = results['organic_results'][0]['rich_snippet']['top']['detected_extensions']['votes']
# print(f"Votes: {votes}")

# For extracting multiple elements:
data = []

for organic_result in results['organic_results']:
    title = organic_result['title']

    # Not every result carries a rich snippet, so fall back to None
    try:
        rating = organic_result['rich_snippet']['top']['detected_extensions']['rating']
    except KeyError:
        rating = None

    try:
        votes = organic_result['rich_snippet']['top']['detected_extensions']['votes']
    except KeyError:
        votes = None

    try:
        reviews = organic_result['rich_snippet']['top']['detected_extensions']['reviews']
    except KeyError:
        reviews = None

    data.append({
        "Title": title,
        "Rating": rating,
        "Votes": votes,
        "Reviews": reviews,
    })

print(json.dumps(data, indent=2))
Output:
[
  {
    "Title": "Swiggy Reviews | Glassdoor",
    "Rating": 4,
    "Votes": 1219,
    "Reviews": null
  },
  {
    "Title": "Ride.Swiggy: 254 Employee Reviews | Indeed.com",
    "Rating": null,
    "Votes": null,
    "Reviews": null
  },
  {
    "Title": "Working at Swiggy | Glassdoor",
    "Rating": 4,
    "Votes": 1090,
    "Reviews": null
  }
]
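The three try/except blocks in the SerpApi code all walk the same nested path, so they could be folded into one lookup helper. This is only a sketch: dig is a hypothetical helper, not part of the serpapi package, and sample_results is a hand-written dictionary shaped like the fields used above, not real API output.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written sample mirroring the response fields used above (an assumption).
sample_results = {
    "organic_results": [
        {
            "title": "Swiggy Reviews | Glassdoor",
            "rich_snippet": {
                "top": {"detected_extensions": {"rating": 4, "votes": 1219}}
            },
        },
        {"title": "Ride.Swiggy: 254 Employee Reviews | Indeed.com"},
    ]
}

path = ("rich_snippet", "top", "detected_extensions")

data = [
    {
        "Title": result["title"],
        "Rating": dig(result, *path, "rating"),
        "Votes": dig(result, *path, "votes"),
        "Reviews": dig(result, *path, "reviews"),
    }
    for result in sample_results["organic_results"]
]
print(data)
```

In real use, sample_results would be replaced by the dictionary returned from search.get_dict().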
Disclaimer: I work for SerpApi.