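Saving link text from an HTML page to a txt file in Python 2

Time: 2018-11-10 11:00:21

Tags: python-2.7

I only need help writing a Python 2 script that extracts the headlines from the following page: https://lite.cnn.com/en, and saves them line by line in a text file, like this: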

"Trump, Macron gloss over differences in France after rough start 
Trump spars with Macron as Air Force One lands in France
Opinion: Which President Trump will show up in Paris?
Two leaders holding bilateral talks"
...

Any suggestions are welcome. Thank you.

2 answers:

Answer 0: (score: 0)

You can use BeautifulSoup to do the job:

import io

import requests
from bs4 import BeautifulSoup

url = "https://lite.cnn.com/en"
r = requests.get(url)

data = r.text
# available parsers: "lxml", "html5lib", "xml" and "html.parser"
soup = BeautifulSoup(data, "html.parser")
# io.open with an explicit encoding, so non-ASCII headlines
# do not raise UnicodeEncodeError on Python 2
with io.open('testfile.txt', 'a', encoding='utf-8') as out:
    # loop through the links inside list items
    for link in soup.select('li a'):
        out.write(link.text + u"\n")

Resulting testfile.txt:

Whitaker's controversial prosecution of a gay Democrat
Sessions realized too late that Whitaker was auditioning for his job
Opinion: The other potential threat to Mueller's investigation
How Kellyanne Conway's husband became an issue for President Trump
Trump, Macron gloss over differences in France after rough start
Trump spars with Macron as Air Force One lands in France
Opinion: Which President Trump will show up in Paris?
Trump's new aggression is forcing the world to change once again
WSJ: Draft indictment detailed Trump's role in hush money scheme
Raging infernos spread on both ends of California, killing 9 people
Why the California fires are spreading so quickly
Authorities believe gunman posted on Facebook around time of shooting, official says
This California shooting victim's mom doesn't want your prayers
What we know about the people killed in the Thousand Oaks shooting
Will Thousand Oaks be the mass shooting that spurs change? Maybe not
Must-watch videos of the week
Settle in with these weekend reads
How a night out turned into a night of horror at a bar in California
When the dreaded 'other' is an angry white man
How Democrats fought their way back to power in Washington
Opinion: What we learned from WWI, the first "total war"
How an eight-year-old American boy became a viral sensation in China
Turkey gives recordings on Khashoggi's death to Saudis, US, Britain - Erdogan
Democrats are in. Sessions is out. Here's what that means for immigration
Why what's happening in Florida is a 'count' not a 'recount'
Bill Nelson's campaign sues Florida secretary of state as vote count fight continues
Scott's lawyer expects recount in FL Senate race
No allegations of criminal activity in Florida election, law enforcement says
Analysis: The question now facing Democrats: How to wake up the 'too woke to vote' crowd
Washington Post: Michelle Obama says in memoir she'll 'never forgive' Trump for endangering her family
How a century-old war affects you
Toobin says 'racial dimension' to Trump's attacks on black female journalists
Sri Lanka's President dissolves parliament and calls snap election amid political crisis
Triple car bombings in Mogadishu kill at least 18 people, police say
Snoop Dogg smokes a blunt in front of the White House
New York parishioners are using the collection basket to ask embattled Catholic bishop to resign
Trump trade adviser warns Wall Street 'globalists' over China 
Doctors share gun stories, demand action after NRA tells them to 'stay in their lane'
Judge: 'We're approaching the end of reunification'
Family apprehensions at southern border hit record monthly high
Opinion: The President says he is keeping us safe. But at what cost?
What happened this week (in anything but politics)
5 tips for booking Thanksgiving flights
Gobble up these Turkey Day destinations
Thanksgiving in New York: Parade, dining and more
Musician Lydia Lunch's fast friendship with Anthony Bourdain
Mother sues facility after 10 children died in adenovirus outbreak
Flash floods in Jordan kill at least 11
US banks prepare for Iranian cyberattacks as retaliation for sanctions
We need stronger cybersecurity laws for the Internet of Things
The 'Year of the Woman' goes global 
How Hong Kong plans to replace 100,000 trees
Ex-Goldman Sachs banker tied to 1MDB scandal blames bank's 'culture' in guilty plea
Progressive backlash against Amazon HQ2 is growing. Here's why
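
A few of the lines in the output above end in trailing whitespace, and because the file is opened in append mode ('a'), every run of the script adds the whole list again. One way to handle both is a small helper that trims each title and overwrites the file. This is only a sketch: `save_headlines` and its parameters are hypothetical names, not part of the answer, and it is written to run on both Python 2.7 and Python 3.

```python
import io


def save_headlines(titles, path):
    """Write one trimmed, non-empty title per line to `path` as UTF-8.

    Hypothetical helper; returns the number of lines written.
    """
    written = 0
    # mode 'w' overwrites the file, so re-running does not duplicate headlines
    with io.open(path, 'w', encoding='utf-8') as out:
        for title in titles:
            title = title.strip()
            if not title:
                continue  # skip links with no visible text
            out.write(title + u"\n")
            written += 1
    return written
```

With the `soup` object from the answer, it could be called as `save_headlines([link.text for link in soup.select('li a')], 'testfile.txt')`.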

Answer 1: (score: 0)

There is a simpler way to read the page, but note that it saves the page's raw HTML source rather than just the headlines:

import urllib2

# open the output file once, instead of reopening it for every line
with open('testfile.txt', 'a') as out:
    for line in urllib2.urlopen("https://lite.cnn.com/en"):
        out.write(line)
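
Since this approach saves the raw HTML source, the headlines still have to be pulled out of the markup. If installing BeautifulSoup is not an option, the standard-library HTML parser can do a rough version of the same job. The sketch below is not from either answer: `LinkTextParser` and `extract_titles` are hypothetical names, and it assumes the headlines are `<a>` elements nested inside `<li>` elements, which is taken from the `li a` selector in the other answer rather than from inspecting the page. One known limitation: on Python 2, character references such as `&amp;` or `&#39;` are reported through separate callbacks that this sketch does not handle, so those characters would be dropped there.

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class LinkTextParser(HTMLParser):
    """Collect the text of every <a> nested inside an <li> (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.li_depth = 0     # number of <li> elements currently open
        self.in_link = False  # True while inside an <a> that sits in an <li>
        self.parts = []       # text fragments of the current link
        self.titles = []      # finished link texts, in page order

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            self.in_link = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == 'a' and self.in_link:
            self.in_link = False
            text = ''.join(self.parts).strip()
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


def extract_titles(html):
    """Return the list of link texts found inside list items of `html`."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

On Python 2, something like `extract_titles(urllib2.urlopen("https://lite.cnn.com/en").read())` would feed it the downloaded page; the resulting list can then be written to the file one title per line.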