Question

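I hoped this title would turn up well in search results. Anyway, given the following code: it takes each word yielded by text_file_reader_gen() and iterates in a while loop until StopIteration is raised (is there a better way to do that than try/except?), and the interlock function just mixes the words up.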

def wordparser():
    #word_freq={}
    word=text_file_reader_gen()
    word.next()
    wordlist=[]
    index=0
    while True: #for word in ftext:
        try:
            #print 'entered try'
            current=next(word)
            wordlist.append(current) #Keep adding new words
            #word_freq[current]=1
            if len(wordlist)>2:
                while index < len(wordlist)-1:
                    #print 'Before: len(wordlist)-1: %s || index: %s' %(len(wordlist)-1, index)
                    new_word=interlock_2(wordlist[index],wordlist[index+1]) #this can be any do_something() function, irrelevant and working fine
                    new_word2=interlock_2(wordlist[index+1],wordlist[index])
                    print new_word,new_word2
                    '''if new_word in word_freq:
                        correct_interlocked_words.append(new_word)
                    if new_word2 in word_freq:
                        correct_interlocked_words.append(new_word2)'''
                    index+=1
                    #print 'After: len(wordlist)-1: %s || index: %s' %(len(wordlist)-1, index)
                '''if w not in word_freq:
                    word_freq[w]=1
                else:
                    word_freq[w]=+1'''
        except StopIteration,e:
            #print 'entered except'
            #print word_freq
            break
    #return word_freq

Code for text_file_reader_gen():

def text_file_reader_gen():
    path=str(raw_input('enter full file path \t:'))
    fin=open(path,'r')
    ftext=(x.strip() for x in fin)
    for word in ftext:
        yield word

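Q1. Is it possible to iterate over the words while appending them to the dictionary word_freq, and at the same time enumerate the keys of word_freq (the keys are the words, which are still being added) while the for loop runs and new words are mixed with the interlock function, so that most of these iterations happen in one go? Something like: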

while word.next() is not StopIteration:
    word_freq[ftext.next()]+=1 if ftext not in word_freq #and
    for i,j in word_freq.keys():
        new_word=interlock_2(j,wordlist[i+1])

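I just want something really simple with a hash-based dict lookup that is really fast, because the txt file the words come from is an a-z list and very long, and it may also contain duplicates.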

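Q2. How can I improve this existing code?

Q3. Is there a way to write 'for i, j in enumerate(dict.items())' so that I can reach dict[key] & dict[next_key] at the same time? They are unordered, but that doesn't matter here.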

UPDATE: After reviewing the answers here, this is what I came up with. It works, but I have a question about the following code:

def text_file_reader_gen():
    path=str(raw_input('enter full file path \t:'))
    fin=open(path,'r')
    ftext=(x.strip() for x in fin)
    return ftext #yield?


def wordparser():
    wordlist=[]
    index=0
    for word in text_file_reader_gen():

This works, but it does not if I use yield ftext instead.

Q4. What is the basic difference, and why does this happen?

Answer 1

As far as I understand your example code, you are simply counting words. Take the following examples as ideas you can build on.

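Q1. Yes and no. Running things in parallel is not trivial. You could use threads (the GIL won't give you true parallelism) or multiprocessing, but I don't see why you would need to do that.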
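A single sequential pass is often enough here, because membership tests on a set or dict are hash lookups and therefore cheap. The sketch below is written for Python 3 (the thread itself uses Python 2). The asker's interlock_2 is not shown, so `interlock` is a hypothetical stand-in that alternates the letters of two words, and `interlocked_pairs` is likewise an illustrative name.

```python
from itertools import zip_longest


def interlock(a, b):
    """Hypothetical stand-in for interlock_2: alternate the letters of a and b."""
    return "".join(x + y for x, y in zip_longest(a, b, fillvalue=""))


def interlocked_pairs(words):
    """Return interlocked forms of adjacent words that also occur in the word list."""
    words = list(words)
    known = set(words)  # hash-based membership test
    found = []
    for first, second in zip(words, words[1:]):
        for candidate in (interlock(first, second), interlock(second, first)):
            if candidate in known:
                found.append(candidate)
    return found
```

Duplicates in the input are harmless for the lookup, since the set keeps each word only once.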

Q2. I don't see the need for the text_file_reader_gen() function. Generators are iterators, and you get the same behaviour by iterating with for line in file.

def word_parser():

    path = raw_input("enter full file path\t: ")
    words = {}
    with open(path, "r") as f:
        for line in f:
            for word in line.split():
                try:
                    words[word] += 1
                except KeyError:
                    words[word] = 1

    return words

The above goes through the file line by line, splits each line on whitespace and counts the words. It does not handle punctuation.

If the input file is natural language, you may want to look at the NLTK library. Here is another example that uses the collections library.

import collections
import string

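def count_words(your_input):
    # A Counter (not a plain dict) so that update() adds counts instead of overwriting them
    result = collections.Counter()
    # Python 2: identity table; the characters to delete are passed to translate()
    translate_tab = string.maketrans("","")
    with open(your_input, "r") as f:
        for line in f:
            result.update(collections.Counter(x.translate(translate_tab, string.punctuation) for x in line.split()))

    return dict(result)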

 # Test.txt contains 5 paragraphs of Lorem Ipsum from some online generator
 In [61]: count_words("test.txt")
 Out[61]: 
 {'Aenean': 1,
  'Aliquam': 1,
  'Class': 1,
  'Cras': 1,
  'Cum': 1,
  'Curabitur': 2,
  'Donec': 1,
  'Duis': 1,
  'Etiam': 2,
  'Fusce': 1,
  'In': 1,
  'Integer': 1,
  'Lorem': 1,
  ......
  }

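The function goes through the file line by line, creates a collections.Counter object (basically a subclass of dict), splits each line on anything whitespace-like, strips punctuation with string.translate, and finally updates the result dictionary with the Counter. The Counter does all the... counting.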
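The snippet above relies on Python 2's string.maketrans and the two-argument form of translate, neither of which exists in Python 3. A minimal Python 3 sketch of the same idea follows; the function name `count_words_from_lines` and the choice to take an iterable of lines rather than a path are assumptions made for illustration.

```python
import collections
import string


def count_words_from_lines(lines):
    """Count words in an iterable of text lines, ignoring ASCII punctuation."""
    # In Python 3, str.maketrans takes the characters to delete as its third argument.
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = collections.Counter()
    for line in lines:
        # Counter.update adds to existing counts rather than replacing them.
        counts.update(w.translate(strip_punct) for w in line.split())
    counts.pop("", None)  # drop tokens that consisted only of punctuation
    return counts
```

An open file object is itself an iterable of lines, so it can be passed in directly, for example inside a with open(path) as f: block.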

Q3. I don't know why you would want that or how to achieve it.

Answer 2

Q3. Is there a way to write 'for i, j in enumerate(dict.items())' so that I can reach dict[key] & dict[next_key] at the same time

You can get the next item in an iterable, so you can write a function that pairs the current item with the next one, like this:

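def with_next(thing):
    thing = iter(thing)  # accept any iterable, not only iterators
    try:
        prev = next(thing)
    except StopIteration:
        return  # empty input: nothing to pair
    while True:
        try:
            cur = next(thing)
        except StopIteration:
            # There's no sane next item at the end of the iterable, so
            # use None.
            yield (prev, None)
            return  # end the generator; re-raising StopIteration fails under PEP 479
        yield (prev, cur)
        prev = cur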

As the comment says, it is not obvious what to do at the end of the list (where there is no "next key"), so it just yields None.

For example:

for curitem, nextitem in with_next(iter(['mouse', 'cat', 'dog', 'yay'])):
    print "%s (next: %s)" % (curitem, nextitem)

Output:

mouse (next: cat)
cat (next: dog)
dog (next: yay)
yay (next: None)

It works with any iterator (e.g. dict.iteritems(), dict.iterkeys(), enumerate, etc.):

mydict = {'mouse': 'squeek', 'cat': 'meow', 'dog': 'woof'}
for cur_key, next_key in with_next(mydict.iterkeys()):
    print "%s (next: %s)" % (cur_key, next_key)
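The same current/next pairing can also be sketched with the standard itertools module. This is an alternative written for Python 3, not the answer's own method, and the name `with_next_pairs` is made up for illustration.

```python
from itertools import tee, zip_longest


def with_next_pairs(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    current, following = tee(iterable)
    next(following, None)  # advance the second copy by one item
    return zip_longest(current, following)
```

One caveat of this version: if the data itself can contain None, the final pair cannot be told apart from a genuine (item, None) pair.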

Regarding your update:

def text_file_reader_gen():
    path=str(raw_input('enter full file path \t:'))
    fin=open(path,'r')
    ftext=(x.strip() for x in fin)
    return ftext #yield?

Q4. What is the basic difference [between yield and return], and why does this happen?

yield and return are completely different things.

return returns a value from a function, and the function then terminates.

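yield turns a function into a "generator function". Instead of returning a single object and ending, a generator function produces a series of objects, one each time a yield is reached.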
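Applied to the update in the question, this probably explains the behaviour: with return ftext the caller loops over the generator expression itself, whereas with yield ftext the function becomes a generator that produces exactly one item, namely the whole generator expression. A small Python 3 sketch, using made-up function names and an in-memory list in place of the file:

```python
def words_returned(lines):
    """Plain function: hands back the generator expression itself."""
    stripped = (x.strip() for x in lines)
    return stripped


def words_yielded_whole(lines):
    """Generator function that yields the generator expression as ONE item."""
    stripped = (x.strip() for x in lines)
    yield stripped


def words_yielded_each(lines):
    """Generator function that yields each word separately."""
    for x in lines:
        yield x.strip()


lines = ["apple\n", "banana\n"]
returned = list(words_returned(lines))            # the individual words
yielded_whole = list(words_yielded_whole(lines))  # a single generator object
yielded_each = list(words_yielded_each(lines))    # the individual words
```

So a for loop over the yield ftext version runs once, and its loop variable is a generator rather than a word.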

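Here are some good pages that explain generators:

The return statement works as it does in many other programming languages. Something like the official tutorial should explain it.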

Python: parallel fast dictionary search + list enumeration + do_something() for a given list of words
