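Extracting text with Beautiful Soup

Time: 2014-11-06 18:17:12

Tags: python beautifulsoup

I want to use bs4 to extract text from this HTML. I'm new to this and can't seem to get it working; any help would be greatly appreciated.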

<div class="results">
            <span class="toggle" ng-click="display.toggleConfig()">{{display.configText}}</span>


            <p ng-hide="insecure">It would take <span ng-show="config.calculationsOriginal">a desktop PC</span> about <span class="main">{{time}}</span> to crack your password</p>
            <a class="tweet-me" ng-hide="insecure" href="http://twitter.com/home/?status=It would take a desktop PC about {{time}} to crack my password!%0d%0dhttp://hsim.pw">[Tweet Result]</a>

            <p ng-show="insecure">Your password would be cracked almost <span class="main">Instantly</span></p>
            <a class="tweet-me" ng-show="insecure" href="http://twitter.com/home/?status=My password would be cracked almost instantly!%0d%0dhttp://hsim.pw">[Tweet Result]</a>

            <span class="toggle" ng-click="display.toggleDetails()">{{display.detailsText}}</span>
        </div>

        <ul ng-show="display.details">
            <li><strong>Length:</strong> {{length}} characters</li>
            <li><strong>Character Combinations:</strong> {{characters}}</li>
            <li><strong>Calculations Per Second:</strong> {{calcsPerSecond}}</li>
            <li><strong>Possible Combinations:</strong> {{possibleCombinations}}</li>
        </ul>

        <ul ng-show="checks">
            <li ng-repeat="check in checks" class="{{check.type}}">
                <h2 ng-bind-html-unsafe="check.title"></h2>
                <p ng-bind-html-unsafe="check.wording"></p>
            </li>
        </ul>

What I tried:

soup = BeautifulSoup(browser.page_source) #Example extract crack time with CSS selector
crack_time = soup.select('results') 
print crack_time[0].text 

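1 answer:

Answer 0: (score: 0)

It is a bit unclear where the actual time sits in the HTML, but it appears to be in the <span> elements with class="main". There are two of them, and both can be extracted easily: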

for x in soup.find_all("span", {"class": "main"}):
    print(x.text)

This gives:

{{time}}
Instantly
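
The same result can also be reached with a CSS selector, which is what the attempt in the question was going for. The sketch below is only an illustration: it uses a trimmed copy of the question's markup as a string instead of browser.page_source, and the variable names are made up for the example. It shows why select('results') finds nothing (a bare word is treated as a tag name) while '.results' matches the div by class. Note that {{time}} is an AngularJS template placeholder, so it presumably only appears because this sample is the unrendered template.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used here as sample input.
html = """
<div class="results">
  <p ng-hide="insecure">It would take about <span class="main">{{time}}</span> to crack your password</p>
  <p ng-show="insecure">Your password would be cracked almost <span class="main">Instantly</span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "results" is a tag-name selector: it looks for <results> elements and matches nothing.
by_tag = soup.select("results")

# ".results" is a class selector and matches <div class="results">.
by_class = soup.select(".results")

# "span.main" selects the same elements as find_all("span", {"class": "main"}).
main_texts = [span.get_text() for span in soup.select("span.main")]

print(len(by_tag), len(by_class))  # 0 1
print(main_texts)                  # ['{{time}}', 'Instantly']
```

Because select() returns an empty list when nothing matches, indexing it with [0] as in the question would raise an IndexError rather than print any text.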

If you want all of the text in an object, try:

soup.get_text()

This recursively extracts all text from the object and its children.
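
As a rough sketch of how get_text() behaves on nested tags, the example below runs it on two of the list items from the question's markup (again as a plain string, with illustrative variable names). Passing a separator and strip=True is optional; here it is used to join the text fragments with single spaces and drop the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Two list items copied from the question's markup, placeholders left as-is.
html = """
<ul>
  <li><strong>Length:</strong> {{length}} characters</li>
  <li><strong>Character Combinations:</strong> {{characters}}</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of each <li>, including the nested <strong>, joined with single spaces.
items = [li.get_text(" ", strip=True) for li in soup.find_all("li")]

# Text of the whole document in one string; whitespace-only fragments are dropped.
all_text = soup.get_text(" ", strip=True)

for item in items:
    print(item)
print(all_text)
```

Calling get_text() on a single tag instead of on soup limits the extraction to that tag and its descendants.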