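Binary search tree frequency counter

Time: 2018-11-09 16:30:43

Tags: python python-3.x search tree binary

I need to read a text file, strip unnecessary punctuation, lowercase the words, and use a binary search tree to build a word tree made up of the words in the file.

We are asked to count the frequency of words that occur more than once, and to report the total word count and the total number of unique words.

So far I have handled the punctuation, the file reading, and the lowercasing, and the binary search tree is basically done; I just need to figure out how to implement the "frequency" counter in my code.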

My code is below:

class BSearchTree :
    class _Node :
        def __init__(self, word, left = None, right = None) :
            self._word = word
            self._freq = 0
            self._left = left
            self._right = right

    def __init__(self) :
        self._root = None
        self._wordc = 0
        self._each = 0

    def isEmpty(self) :
        return self._root == None

    def search(self, word) :
        probe = self._root
        while (probe != None) :
            if word == probe._word :
                return probe
            if word < probe._word :
                probe = probe._left
            else :
                probe = probe._right
        return None

    def insert(self, word) :
        if self.isEmpty() :
            self._root = self._Node(word)
            self._root._freq += 1   # <- is this correct?
            return

        parent = None               # to keep track of parent
                                    # we need above information to adjust
                                    # link of parent of new node later

        probe = self._root
        while (probe != None) :
            if word < probe._word :     # go to left tree
                parent = probe          # before we go to child, save parent
                probe = probe._left
            elif word > probe._word :   # go to right tree
                parent = probe          # before we go to child, save parent
                probe = probe._right

        if (word < parent._word) :      # new value will be new left child
            parent._left = self._Node(word)
        else :                          # new value will be new right child
            parent._right = self._Node(word)

Because the formatting was killing me, here is the second half of it.

class NotPresent(Exception) :
    pass

def main():
    t = BSearchTree()

    file = open("sample.txt")
    line = file.readline()
    file.close()

    #for word in line:
    #   t.insert(word)
    # Line above crashes program because there are too many
    # words to add. Lines on bottom tests BST class
    t.insert('all')
    t.insert('high')
    t.insert('fly')
    t.insert('can')
    t.insert('boars')
    #t.insert('all')   <- how do i handle duplicates by making
    #                     extras add to the nodes frequency?
    t.inOrder()
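
Note that file.readline() returns a single line as one string, and iterating over a string yields characters rather than words. A minimal sketch of turning text into a list of lowercase words is shown here, assuming only ASCII punctuation needs to be removed; tokenize is a hypothetical helper name, not part of the posted code.

```python
import string

def tokenize(text):
    """Lowercase text, delete ASCII punctuation, and split on whitespace.

    Hypothetical helper for illustration; not part of the posted code.
    """
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

# For a real file, the text could come from open("sample.txt").read().
words = tokenize("All boars can fly high; all boars!")
print(words)
```

Each element of the resulting list could then be passed to insert() one at a time.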

Thanks for the help/trying to help!

1 answer:

Answer 0: (score: 0)

First, it is better to initialize a Node's _freq to 1 than to do that in BST's insert().

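(One more thing: Python coding conventions (PEP 8) recommend against putting spaces around the = sign when writing default parameter values.)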

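A minimal sketch of this change, assuming the _Node class from the question (shown standalone here so that it runs on its own). Starting _freq at 1 means a node counts its own word as soon as it is created, and the defaults are written without spaces around the = sign.

```python
class _Node:
    def __init__(self, word, left=None, right=None):
        self._word = word
        self._freq = 1      # the word has been seen once when the node is created
        self._left = left
        self._right = right

node = _Node('all')
print(node._word, node._freq)
```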

Then add the last 3 lines:

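A sketch of what insert() might look like with a duplicate branch added to the search loop, assuming the three lines in question are the else branch that increments _freq and returns. The items() helper and the total/unique counts at the end are illustrative additions, not taken from the answer.

```python
class BSearchTree:
    class _Node:
        def __init__(self, word, left=None, right=None):
            self._word = word
            self._freq = 1          # counts the insertion that created the node
            self._left = left
            self._right = right

    def __init__(self):
        self._root = None

    def insert(self, word):
        if self._root is None:
            self._root = self._Node(word)
            return

        parent = None
        probe = self._root
        while probe is not None:
            if word < probe._word:      # go to left subtree
                parent = probe
                probe = probe._left
            elif word > probe._word:    # go to right subtree
                parent = probe
                probe = probe._right
            else:                       # duplicate: bump the count and stop
                probe._freq += 1
                return

        if word < parent._word:
            parent._left = self._Node(word)
        else:
            parent._right = self._Node(word)

    def items(self):
        """Return (word, freq) pairs in sorted order (hypothetical helper)."""
        result, stack, probe = [], [], self._root
        while stack or probe is not None:
            while probe is not None:    # walk down the left spine
                stack.append(probe)
                probe = probe._left
            probe = stack.pop()
            result.append((probe._word, probe._freq))
            probe = probe._right
        return result

t = BSearchTree()
for w in ['all', 'high', 'fly', 'can', 'boars', 'all']:
    t.insert(w)

pairs = t.items()
total_words = sum(freq for _, freq in pairs)   # every insertion, duplicates included
unique_words = len(pairs)                      # one node per distinct word
print(pairs)
print("total words:", total_words)
print("unique words:", unique_words)
```

Without the else branch, inserting a word that is already in the tree never moves probe, so the while loop would not terminate; returning from that branch avoids this and keeps one node per distinct word.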