Going through a list one item at a time

Time: 2015-12-08 04:45:51

Tags: python arrays list iteration

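Apart from a little Ruby, my coding background is very limited, so if there is a better way to do this, please let me know!

Basically, I have a .txt file full of words. I want to import the .txt file and turn it into a list. Then I want to take the first item in the list, assign it to a variable, and use that variable in an external request that fetches the word's definition. The definition is returned and stored in another .txt file. Once that is done, I want the code to grab the next item in the list and do it all again, until the list is exhausted.

Below is the code I have so far, to give an idea of where I am. I am still trying to figure out how to iterate through the list properly, and I am having a hard time interpreting the documentation.

Apologies in advance if this has already been asked! I searched but could not find anything that specifically answered my question.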

from __future__ import print_function
import requests
import urllib2, urllib
from bs4 import BeautifulSoup

lines = []

with open('words.txt') as f:
    lines = f.readlines()

for each in lines


wordlist = open('test.txt', 'a')

word = ##figure out how to get items from list and assign them here

url = 'http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query=%s' % word

# print url and make sure it's correct

html = urllib.urlopen(url).read()
# print html (deprecated)
soup = BeautifulSoup(html)
visible_text = soup.find('pre')(text=True)[0]

print(visible_text, file=wordlist)

2 Answers:

Answer 0: (score: 1)

Keep everything inside the loop, like this:

with open('test.txt', 'a') as wordlist:
    for word in lines:
        url = 'http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query=%s' % word
        print(url)
        # print url and make sure it's correct
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        visible_text = soup.find('pre')(text=True)[0]
        wordlist.write("{0}\n".format(visible_text))
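
One thing to watch in the loop above is that each word goes into the query string exactly as it was read from the file. The following is a minimal sketch, not part of the original answer, of stripping and percent-encoding the word first. The names BASE_URL and build_url are made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

BASE_URL = ('http://services.aonaware.com/DictService/Default.aspx'
            '?action=define&dict=wn&query=%s')


def build_url(word):
    # Strip surrounding whitespace (including the trailing newline left by
    # readlines()) and percent-encode anything that is not URL-safe.
    return BASE_URL % quote(word.strip())


print(build_url('hot dog\n'))
# The query part of the printed URL ends with: query=hot%20dog
```

Inside the loop, url = build_url(word) would then replace the plain string formatting.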

Also, a few suggestions:

  1. f.readlines() does not discard the trailing \n, so I would use f.read().splitlines() instead:

    lines = f.read().splitlines()
    
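  2. You don't need to initialize lines as an empty list, because you build the whole list in one step and assign it to lines. Initializing a list is only necessary when you intend to append() to it. So the following line is not needed: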

    lines = []
    
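  3. You can handle a page with no definition like this. A missing pre tag raises TypeError (find() returns None) and an empty one raises IndexError, not KeyError:

    def first_pre_text(soup):
        try:
            return soup.find('pre')(text=True)[0]
        except (TypeError, IndexError):
            # No <pre> tag was found, or it contained no text
            return None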
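
The difference described in suggestion 1 can be seen with a small throwaway file. This is only an illustrative sketch; the file contents are made up and the temporary file stands in for words.txt.

```python
import os
import tempfile

# Write a small throwaway word file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('cat\ndog\ncar\n')

with open(path) as f:
    with_newlines = f.readlines()

with open(path) as f:
    without_newlines = f.read().splitlines()

os.remove(path)

print(with_newlines)     # ['cat\n', 'dog\n', 'car\n']
print(without_newlines)  # ['cat', 'dog', 'car']
```

With readlines(), the trailing newline would end up inside every query URL unless each word is stripped first.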
    

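Answer 1: (score: 0)

I also show how to use the Python requests library to retrieve the raw HTML page. This makes it easy to check the status code and confirm that the page was retrieved successfully. If you prefer, you can replace the relevant lines with the urllib version.

You can install requests with pip by running the following on the command line:

    pip install requests
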
#!/usr/bin/env python
# -*- coding: utf-8 -*-


import sys
import re
import requests
import urllib2, urllib
from bs4 import BeautifulSoup


def get_html_with_urllib(word):
    url = "http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query={word}".format(word=word)
    html = urllib.urlopen(url).read()
    return html


def get_html(word):
    url = "http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query={word}".format(word=word)
    response = requests.get(url)

    # Something bad happened
    if response.status_code != 200:
        return ""

    # Did not get back html
    if not response.headers["Content-Type"].startswith("text/html"):
        return ""

    html = response.content
    return html


def format_definitions(raw_definitions_text):
    # Get individual lines in definitions text
    parts = raw_definitions_text.split('\n')

    # Convert to str
    # Remove extra spaces on the left.
    # Add one space at the end for later joining with next line
    parts = map(lambda x: str(x).lstrip() + ' ', parts)

    result = []
    current = ""
    for p in parts:
        if re.search(r"\w*[0-9]+:", p):
            # Start of new line. Contains some char followed by <number>:

            # Save previous lines
            result.append(current.replace('\n', ' '))

            # Set start of current line
            current = p
        else:
            # Continue line
            current += p

    result.append(current)
    return '\n'.join(result)


def get_definitions(word):
    # Uncomment this to use requests
    # html = get_html(word)
    # Could not get definition
    # if not html:
        # return None

    html = get_html_with_urllib(word)

    soup = BeautifulSoup(html, "html.parser")
    # Get block containing definition
    definitions = soup.find("pre").get_text()

    definitions = format_definitions(definitions)
    return definitions


def batch_query(input_filepath):
    with open(input_filepath) as infile:
        for word in infile:
            word = word.strip()  # Remove spaces from both ends
            definitions = get_definitions(word)
            if not definitions:
                print("Could not retrieve definitions for {word}".format(word=word))
                continue  # Nothing to print for this word, move on to the next

            print("Definition for {word} is: ".format(word=word))
            print(definitions)


def main():
    input_filepath = sys.argv[1]  # Alternatively, change this to file containing words
    batch_query(input_filepath)


if __name__ == "__main__":
    main()

Output:

Definition for cat is: 
cat 
n 1: feline mammal usually having thick soft fur and being unable to roar; domestic cats; wildcats [syn: true cat] 
2: an informal term for a youth or man; "a nice guy"; "the guy's only doing it for some doll" [syn: guy, hombre, bozo] 
3: a spiteful woman gossip; "what a cat she is!" 
4: the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant; "in Yemen kat is used daily by 85% of adults" [syn: kat, khat, qat, quat, Arabian tea, African tea] 
5: a whip with nine knotted cords; "British sailors feared the cat" [syn: cat-o'-nine-tails] 
6: a large vehicle that is driven by caterpillar tracks; frequently used for moving earth in construction and farm work [syn: Caterpillar] 
7: any of several large cats typically able to roar and living in the wild [syn: big cat] 
8: a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis [syn: computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography] 
v 1: beat with a cat-o'-nine-tails 
2: eject the contents of the stomach through the mouth; "After drinking too much, the students vomited"; "He purged continuously"; "The patient regurgitated the food we gave him last night" [syn: vomit, vomit up, purge, cast, sick, be sick, disgorge, regorge, retch, puke, barf, spew, spue, chuck, upchuck, honk, regurgitate, throw up] [ant: keep down] [also: catting, catted]  
Definition for dog is: 
dog 
n 1: a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; "the dog barked all night" [syn: domestic dog, Canis familiaris] 
2: a dull unattractive unpleasant girl or woman; "she got a reputation as a frump"; "she's a real dog" [syn: frump] 
3: informal term for a man; "you lucky dog" 
4: someone who is morally reprehensible; "you dirty dog" [syn: cad, bounder, blackguard, hound, heel] 
5: a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll [syn: frank, frankfurter, hotdog, hot dog, wiener, wienerwurst, weenie] 
6: a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward [syn: pawl, detent, click] 
7: metal supports for logs in a fireplace; "the andirons were too hot to touch" [syn: andiron, firedog, dog-iron] v : go after with the intent to catch; "The policeman chased the mugger down the alley"; "the dog chased the rabbit" [syn: chase, chase after, trail, tail, tag, give chase, go after, track] [also: dogging, dogged]  
Definition for car is: 
car 
n 1: 4-wheeled motor vehicle; usually propelled by an internal combustion engine; "he needs a car to get to work" [syn: auto, automobile, machine, motorcar] 
2: a wheeled vehicle adapted to the rails of railroad; "three cars had jumped the rails" [syn: railcar, railway car, railroad car] 
3: a conveyance for passengers or freight on a cable railway; "they took a cable car to the top of the mountain" [syn: cable car] 
4: car suspended from an airship and carrying personnel and cargo and power plant [syn: gondola] 
5: where passengers ride up and down; "the car was on the top floor" [syn: elevator car]
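
The script above prints each definition, while the question asks for the definitions to be stored in a second .txt file. Below is a minimal sketch of that last step, not part of the original answer. The function save_definitions and its lookup parameter are hypothetical names; lookup is assumed to be any callable that takes a word and returns its definition text, or an empty value when nothing was found. A plain dictionary stands in for the web service so the example runs without network access.

```python
import os
import tempfile


def save_definitions(words, lookup, output_path):
    """Append one definition block per word to output_path.

    Returns the list of words for which lookup() returned nothing.
    """
    missing = []
    with open(output_path, 'a') as outfile:
        for word in words:
            word = word.strip()
            if not word:
                continue  # Skip blank lines in the word list
            definition = lookup(word)
            if not definition:
                missing.append(word)
                continue
            outfile.write('{0}\n{1}\n\n'.format(word, definition))
    return missing


# Demo with a stand-in lookup (made-up definitions, no network needed).
fake_dictionary = {
    'cat': 'n 1: feline mammal',
    'dog': 'n 1: a member of the genus Canis',
}
out_path = os.path.join(tempfile.mkdtemp(), 'definitions.txt')
missing = save_definitions(['cat\n', 'dog\n', 'xyzzy\n'],
                           fake_dictionary.get, out_path)
print(missing)  # ['xyzzy']
```

In the script above, get_definitions could be passed as lookup, provided it is adjusted to return an empty value rather than raise when the page contains no pre block.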