stanford-nlp - Stanford text classifier feature selection

I can't find any information about the feature selection mechanism in the Stanford NLP text classifier.

Does ColumnDataClassifier perform any feature selection by default? I ask because of the output I got when running it on the 20 Newsgroups data.


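With 245343 features, I would not expect this tool to train so quickly while using less than 2 GB of memory. When I tried to train a model in WEKA on the same dataset with fewer features (45000), WEKA used 8 GB of memory and took forever to train.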

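Reads a set of training examples from a file and returns the data in a featurized form. If feature selection is asked for, the returned featurized form is the one after feature selection has been applied.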
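To make "featurized form after feature selection" concrete, here is a minimal sketch of one common kind of feature selection: dropping features seen in fewer than a minimum number of training examples. It is only an illustration and is not taken from the Stanford classifier's source. The names `featurize`, `select_features` and `min_support`, the `W-` feature prefix, and the toy training data are all made up for this example.

```python
from collections import Counter


def featurize(text):
    """Hypothetical featurizer: one binary bag-of-words feature per distinct token."""
    return {"W-" + tok for tok in text.lower().split()}


def select_features(examples, min_support=2):
    """Featurize (label, text) pairs, then keep only the features that occur
    in at least `min_support` training examples (assumed selection rule)."""
    counts = Counter()
    featurized = []
    for label, text in examples:
        feats = featurize(text)
        counts.update(feats)  # each feature counted once per example
        featurized.append((label, feats))
    kept = {f for f, c in counts.items() if c >= min_support}
    # Return the featurized data *after* selection, plus the surviving features.
    return [(label, feats & kept) for label, feats in featurized], kept


# Toy data, invented for illustration only.
train = [
    ("sci.space", "the shuttle launch was delayed"),
    ("sci.space", "orbit of the shuttle"),
    ("rec.autos", "the engine was rebuilt"),
]
data, kept = select_features(train, min_support=2)
print(sorted(kept))
```

If a filter like this were applied, the reported feature count would be the number of surviving features rather than every feature ever observed. Whether ColumnDataClassifier applies anything like it by default is exactly the open question here.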


0 answers: