Python concordance program - sorting alphabetically

Time: 2014-03-25 01:35:32

Tags: python alphabetical-sort

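I'm trying to write a program that displays a concordance of a file. It should print the unique words and their frequencies in alphabetical order. This is what I have, but it doesn't work. Any hints?

FYI - I know very little about computer programming! I'm taking this class to satisfy a high school math endorsement requirement.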

f = open(raw_input("Enter a filename: "), "r")
myDict = {}
linenum = 0

for line in f:
  line = line.strip()
  line = line.lower()
  line = line.split()
  linenum += 1

for word in line:
    word = word.strip()
    word = word.lower()

    if not word in myDict:
        myDict[word] = []

    myDict[word].append(linenum)


print "%-15s %-15s" %("Word", "Line Number")
for key in sorted(myDict):
    print '%-15s: %-15d' % (key, myDict(key))

5 answers:

Answer 0 (score: 1)

You need to use myDict[key] to look up the value in the dictionary. Since that value is a list, use len(myDict[key]) to get the frequency (count).

f = "HELLO HELLO HELLO WHAT ARE YOU DOING"
myDict = {}
linenum = 0

for word in f.split():
    if not word in myDict:
        myDict[word] = []

    myDict[word].append(linenum)


print "%-15s %-15s" %("Word", "Frequency")
for key in sorted(myDict):
    print '%-15s: %-15d' % (key, len(myDict[key]))

Result:

Word            Frequency
ARE            : 1
DOING          : 1
HELLO          : 3
WHAT           : 1
YOU            : 1
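
As a minimal Python 3 sketch of the same idea (the sample text and the `line_numbers` name are made up for illustration), the frequency of a word is the length of its list of line numbers. Summing that list would add the line numbers together instead of counting them:

```python
# Hypothetical sample input: three short "lines" of text
text = ["hello world", "hello again", "hello"]

# Map each word to the list of line numbers it appears on
line_numbers = {}
for linenum, line in enumerate(text, start=1):
    for word in line.split():
        line_numbers.setdefault(word, []).append(linenum)

hello_lines = line_numbers["hello"]   # the lines containing "hello"
frequency = len(hello_lines)          # number of occurrences
not_frequency = sum(hello_lines)      # line numbers added up, not a count

print("%-15s %-15s" % ("Word", "Frequency"))
for key in sorted(line_numbers):
    print("%-15s: %-15d" % (key, len(line_numbers[key])))
```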

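Answer 1 (score: 1)

Your indentation is wrong. The second loop is outside the first one, so it only processes the last line. (You should consider using 4 spaces per indent level to see this more clearly.) Your print call is also wrong (myDict(key) should be myDict[key]), and you are printing line numbers rather than word counts.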

myDict = {}
linenum = 0

for line in f:
    line = line.strip()
    line = line.lower()
    line = line.split()
    linenum += 1
    for word in line:
        word = word.strip()
        word = word.lower()

        if not word in myDict:
            myDict[word] = []
        myDict[word].append(linenum)
print "%-15s %5s  %s" %("Word", 'Count', "Line Numbers")
for key in sorted(myDict):
    print '%-15s %5d: %s' % (key, len(myDict[key]), myDict[key])

Sample output:

Word            Count  Line Numbers
-                   1: [6]
a                   4: [2, 2, 3, 7]
about               1: [6]
alphabetical        1: [4]

Edit: fixed errors in the code

Answer 2 (score: 0)
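
For readers on Python 3, the corrected loop structure could be sketched roughly as follows. The function names `build_concordance` and `format_concordance` and the sample input are assumptions made for illustration, not part of the answer above:

```python
def build_concordance(lines):
    """Map each lowercase word to the list of line numbers it occurs on."""
    concordance = {}
    # enumerate(..., start=1) replaces the manual linenum counter
    for linenum, line in enumerate(lines, start=1):
        # The inner loop is nested under the outer one, so every line is processed
        for word in line.lower().split():
            concordance.setdefault(word, []).append(linenum)
    return concordance


def format_concordance(concordance):
    """Return the report rows, sorted alphabetically by word."""
    rows = ["%-15s %5s  %s" % ("Word", "Count", "Line Numbers")]
    for word in sorted(concordance):
        numbers = concordance[word]
        rows.append("%-15s %5d: %s" % (word, len(numbers), numbers))
    return rows


sample = ["A cat and a dog", "The dog barked"]   # made-up input
result = build_concordance(sample)
print("\n".join(format_concordance(result)))
```

Because an open file iterates line by line, the same function should also accept a file object, for example `with open(name) as f: build_concordance(f)`.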

Here is my concordance solution...

https://github.com/jrgosalia/Python/blob/master/problem2_concordance.py

$ python --version
Python 3.5.1

library.py

import os

def getLines(fileName):
    """ getLines validates the given fileName.
        Returns all lines present in a valid file. """
    lines = ""
    if fileName and os.path.exists(fileName):
        if os.path.isfile(fileName):
            # Use a context manager so the file is always closed
            with open(fileName, 'r') as file:
                lines = file.read()
            if len(lines) > 0:
                return lines
            else:
                print("<" + fileName + "> is an empty file!", end="\n\n")
        else:
            print("<" + fileName + "> is not a file!", end="\n\n")
    else:
        # str() keeps the message from failing when fileName is None
        print("<" + str(fileName) + "> doesn't exist, try again!", end="\n\n")
    return lines

problem2_concordance.py

from library import getLines

# List of English Punctuation Symbols
# Reference : Took maximum punctuation symbols possible from https://en.wikipedia.org/wiki/Punctuation_of_English
# NOTE: The straight apostrophe (') is in the list, so a contraction such as "don't" is split into "don" and "t".
punctuations = ["[", "]", "(", ")", "{", "}", "<", ">",
         ":", ";", ",", "`", "'", "\"", "-", ".",
         "|", "\\", "?", "/", "!", "_", "@",
         "#", "$", "%", "^", "&", "*", "+", "~", "=" ]

def stripPunctuation(data):
    """ Strip Punctuations from the given string. """
    for punctuation in punctuations:
        data = data.replace(punctuation, " ")
    return data

def display(wordsDictionary):
    """ Display sorted dictionary of words and their frequencies. """
    noOfWords = 0
    print("-" * 42)
    print("| %20s | %15s |" % ("WORDS".center(20), "FREQUENCY".center(15)))
    print("-" * 42)
    for word in list(sorted(wordsDictionary.keys())):
        noOfWords += 1
        print("| %-20s | %15s |" % (word, str(wordsDictionary.get(word)).center(15)))
        # Halt every 20 words (configurable)
        if (noOfWords != 0 and noOfWords % 20 == 0):
            print("\n" * 2)
            input("PRESS ENTER TO CONTINUE ... ")
            print("\n" * 5)
            print("-" * 42)
            print("| %20s | %15s |" % ("WORDS".center(20), "FREQUENCY".center(15)))
            print("-" * 42)
    print("-" * 42)
    print("\n" * 2)

def prepareDictionary(words):
    """ Prepare dictionary of words and count their occurrences. """
    wordsDictionary = {}
    for word in words:
        # Count case-insensitively by keying on the lowercase version;
        # get() returns 0 for the first occurrence of a word
        key = word.lower()
        wordsDictionary[key] = wordsDictionary.get(key, 0) + 1
    return wordsDictionary

def main():
    """ Main method """
    print("\n" * 10)
    print("Given a file name, program will find unique words and their occurrences!", end="\n\n")
    input("Press ENTER to start execution ... \n")

    # To store all the words and their frequencies
    wordsDictionary = {}
    lines = ""
    # Get valid input file
    while (len(lines) == 0):
        fileName = input("Enter the file name (RELATIVE ONLY and NOT ABSOLUTE): ")
        print("\n\n" * 1)
        lines = getLines(fileName)
    # Get all words by removing all punctuation
    words = stripPunctuation(lines).split()
    # Prepare the words dictionary
    wordsDictionary = prepareDictionary(words)
    # Display words dictionary
    display(wordsDictionary)

# Starting point: run main() only when executed as a script
if __name__ == "__main__":
    main()

Note: you also need library.py to run the code above; it is in the same GitHub repository.

Answer 3 (score: 0)

Why not use Counter? That's what it's for:
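
As an alternative to calling replace() once per symbol, punctuation can be mapped to spaces in a single pass with str.translate. This is only a sketch: it swaps the hand-written list for the standard library's string.punctuation (ASCII only, apostrophe included), and the name `strip_punctuation` is made up here:

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
# string.punctuation includes the apostrophe, so "don't" becomes "don t".
_TO_SPACES = str.maketrans(string.punctuation, " " * len(string.punctuation))


def strip_punctuation(data):
    """Replace ASCII punctuation in data with spaces in one pass."""
    return data.translate(_TO_SPACES)


# Made-up sample sentence
words = strip_punctuation("Hello, world! (Hello again.)").lower().split()
print(words)
```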

In [7]: from collections import Counter

In [8]: s = 'How many times does each word show up in this sentence word word show up up'

In [9]: words = s.split()

In [10]: Counter(words)
Out[10]: Counter({'up': 3, 'word': 3, 'show': 2, 'times': 1, 'sentence': 1, 'many': 1, 'does': 1, 'How': 1, 'each': 1, 'in': 1, 'this': 1})

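Note: I can't take credit for this particular solution. It comes straight from "Collections Module counter" in Python Bootcamp.

Answer 4 (score: 0)

Concordance of a text file, in alphabetical order: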
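
Counter by itself does not print its entries alphabetically, which is what the question asks for. A minimal Python 3 sketch, assuming the same sample sentence and that words should be compared case-insensitively, sorts the (word, count) pairs before printing:

```python
from collections import Counter

s = 'How many times does each word show up in this sentence word word show up up'

# Lowercase first so that "How" and "how" would count as the same word
counts = Counter(s.lower().split())

# sorted() on the (word, count) pairs orders them alphabetically by word
report = sorted(counts.items())
for word, count in report:
    print("%-15s %d" % (word, count))
```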

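f = input('Enter the input file name: ')
# Map each word to the number of times it occurs
counts = {}
# The with block closes the file once it has been read
with open(f, "r") as inputFile:
    for word in inputFile.read().split():
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
# Print the words in alphabetical order with their counts
for i in sorted(counts):
    print("{0} {1} ".format(i, counts[i]))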