This is the HTML source code I am working with:
<a href="/people/charles-adams" class="gridlist__link">
What I want to do is extract the href attribute with the beautifulsoup module, which in this case is "/people/charles-adams". I need this because I want to fetch that particular page's HTML source and use soup.findAll on it, but I am struggling to extract this attribute from the page. Could someone help me with this?
P.S. This is how I am getting the HTML source with the Python module beautifulSoup:
import requests
from bs4 import BeautifulSoup

request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')
Answer 0 (score: 0)
Try something like this:
refs = soup.find_all('a')
for i in refs:
    if i.has_attr('href'):
        print(i['href'])
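Since the stated goal is to fetch the HTML of the linked page afterwards, note that the extracted href is a relative path and must be joined with the site's base URL before passing it to requests.get. A minimal sketch, assuming a hypothetical base URL (the question does not name the site):

```python
from urllib.parse import urljoin

# Hypothetical base URL; substitute the real site's address.
base = 'https://example.com'
href = '/people/charles-adams'

# urljoin resolves the relative href against the base URL.
full_url = urljoin(base, href)
print(full_url)  # https://example.com/people/charles-adams
```

The resulting full_url can then be fetched with requests.get and parsed with BeautifulSoup just like the original page.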
Answer 1 (score: 0)
You can tell beautifulsoup to find all anchor tags with soup.find_all('a'). Then you can filter them with a list comprehension and collect the links.
request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all('a')
tags = [tag for tag in tags if tag.has_attr('href')]
links = [tag['href'] for tag in tags]
links will be ['/people/charles-adams']
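Because the anchor in the question carries the class gridlist__link, you can also target it directly with a CSS selector instead of filtering every anchor on the page. A self-contained sketch, using an inline stand-in for the real page's HTML:

```python
from bs4 import BeautifulSoup

# Stand-in HTML built from the snippet in the question;
# in practice this would be request.text from the fetched page.
html = '<a href="/people/charles-adams" class="gridlist__link">Charles Adams</a>'
soup = BeautifulSoup(html, 'html.parser')

# select() with a CSS selector matches only anchors that have the
# gridlist__link class, so unrelated links are skipped up front.
links = [tag['href'] for tag in soup.select('a.gridlist__link')]
print(links)  # ['/people/charles-adams']
```

This avoids the has_attr check entirely, since the selector only matches tags that already fit the pattern.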