How to remove HTML tags from the output text?

Posted: 2018-12-21 16:19:44

Tags: python html

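Sorry if this has been asked before, but none of the solutions I've tried seem to work.

I've written a program in which the user enters a word, and the program pulls example sentences for that word from the Dictionary.com website.

I want to remove the HTML tags that always surround the keyword. How would I go about doing that?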

import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")

webContent = requests.get('https://www.dictionary.com/browse/' + word)

soup = BeautifulSoup(webContent.text, 'html.parser')

results = soup.find_all('p', attrs={'class':'one-click-content css-it69we e15kc6du7'})

firstResult = results[0]
print(firstResult.contents[0:3])
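A likely reason the tags show up: .contents returns the direct children of the <p> element, which mixes plain strings (NavigableString) with Tag objects, and printing a Tag (or a list containing one) shows its markup. The minimal sketch below uses a made-up HTML snippet so it runs without a network request; the <b> tag and the "example" class are assumptions, not Dictionary.com's real markup, and it assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical snippet standing in for one example sentence on the page;
# the real markup and class names may differ.
html = '<p class="example">The <b>cat</b> sat on the mat.</p>'

soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

# .contents holds the direct children: plain strings AND Tag objects.
for child in paragraph.contents:
    kind = "Tag" if isinstance(child, Tag) else "NavigableString"
    # str() of a Tag is its markup (e.g. <b>cat</b>), which is what leaks into the output.
    print(kind, repr(str(child)))
```

Running it prints one line per child; the middle child is a Tag whose string form is '<b>cat</b>', which is the kind of wrapper the question is seeing around the keyword.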

Result: [screenshot of the printed output]

2 answers:

Answer 0 (score: 1)

import requests
import re
from bs4 import BeautifulSoup

word = input("Enter a word: ")

webContent = requests.get('https://www.dictionary.com/browse/' + word)

soup = BeautifulSoup(webContent.text, 'html.parser')

results = soup.find_all('p', attrs={'class':'one-click-content css-it69we e15kc6du7'})

firstResult = results[0]
firstResult.contents=[re.sub('<[^<]+?>', '', str(x)) for x in firstResult.contents]
print(firstResult.contents[0:3])
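The regex in this answer deletes anything shaped like <...> from the string form of each child. A standalone sketch of the same pattern, applied to hypothetical child strings (as str(x) would produce them) so it needs neither the network nor BeautifulSoup:

```python
import re

# Hypothetical children of the <p> element after str(x); not real page data.
children = ["The ", "<b>cat</b>", " sat on the mat."]

# Same pattern as the answer: '<', then one or more non-'<' characters
# (matched lazily), then the first '>' that follows.
TAG_RE = re.compile(r"<[^<]+?>")

cleaned = [TAG_RE.sub("", child) for child in children]
print(cleaned)           # ['The ', 'cat', ' sat on the mat.']
print("".join(cleaned))  # The cat sat on the mat.
```

Regex stripping is a heuristic: a ">" inside an attribute value breaks it, e.g. TAG_RE.sub("", '<a title="x > y">cat</a>') leaves ' y">cat' behind, so a parser-based approach is generally more robust for real pages.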

Result: [screenshot of the printed output]

Answer 1 (score: 0)

Try this: you only need to use the .getText() function.

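A minimal sketch of the .getText() suggestion, run on a hypothetical snippet rather than the live page (the <b> tag and "example" class are assumptions). In BeautifulSoup 4, getText() is an alias of get_text(); it concatenates every text node inside the element and drops the tags. The sketch assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; Dictionary.com's real structure may differ.
html = '<p class="example">The <b>cat</b> sat on the mat.</p>'

soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

# Joins all descendant strings, so the <b> wrapper disappears
# while the original spacing around it is kept.
text = paragraph.get_text()
print(text)  # The cat sat on the mat.

# Alternative: strip each piece, then join with a single space.
print(paragraph.get_text(" ", strip=True))  # The cat sat on the mat.
```

Applied to the question's code, that would mean print(firstResult.get_text()) in place of print(firstResult.contents[0:3]). The class string passed to find_all is copied from the question; generated class names like these may have changed on the site since.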