Stanford text classifier feature selection

Time: 2016-11-08 21:19:51

Tags: stanford-nlp text-classification

I cannot find any information about the feature selection mechanism in the Stanford NLP text classifier.

Does ColumnDataClassifier perform any feature selection by default? The following lines are output from the 20 newsgroups data.

    q1Answer = "c"


    def questionOne():
        print("Here is a quiz to test your knowledge of computer science...")
        print()
        print("Question 1")
        print("What type of algorithm is insertion?")
        print()
        print("a....searching algorithm")
        print("b....decomposition ")
        print("c....sorting algorithm ")
        print()


    def checkAnswer1(q1Answer):
        # q1Answer is a global variable and is needed for this function,
        # so it goes here as a parameter
        attempt = 0  # These are local variables
        score = 0
        answer = input("Make your choice >>>> ")
        while attempt < 1:
            if answer == q1Answer:
                attempt = attempt + 1
                print("Correct!")
                score = score + 2
                break
            elif answer != q1Answer:
                answer = input("Incorrect response – 1 attempt remaining, please try again: ")
                if answer == q1Answer:
                    attempt = attempt + 1
                    print("Correct! On the second attempt")
                    score = score + 1
                    break
                else:
                    print("That is not correct\nThe answer is " + q1Answer)
                    score = 0
                    break  # stop after the second wrong answer instead of looping again
        return score  # This is returned so that it can be used in other parts of the program


    ##def questionTwo():
    ##    print("Question 2\nWhat is abstraction\n\na....looking for problems\nb....removing irrelevant data\nc....solving the problem\n")


    def main():
        questionOne()
        score = checkAnswer1(q1Answer)
        print("Your final score is ", score)


    main()

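With 245343 features, I would not expect this tool to run so quickly while using less than 2G of memory. When I tried to train a model in WEKA on the same dataset with fewer features (45000), WEKA used 8G of memory and took forever to train the model.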
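To put the memory figures in perspective, here is a rough back-of-the-envelope sketch comparing a dense feature matrix with a sparse one. Only the feature count (245343) comes from the question; the document count, the average number of active features per document, and the per-entry byte sizes are assumptions chosen for illustration. Whether either tool actually stores its data this way is not established here.

```python
def dense_bytes(n_docs, n_features, bytes_per_value=8):
    """Memory for a dense matrix: one value per (document, feature) cell."""
    return n_docs * n_features * bytes_per_value


def sparse_bytes(n_docs, avg_active_features, bytes_per_entry=12):
    """Memory for a sparse layout: only the features present in each document.

    bytes_per_entry assumes a 4-byte index plus an 8-byte value (an assumption).
    """
    return n_docs * avg_active_features * bytes_per_entry


N_FEATURES = 245343  # from the question
N_DOCS = 11314       # assumption: size of a typical 20 newsgroups training split
AVG_ACTIVE = 300     # assumption: average distinct features per document

GIB = 1024 ** 3
print(f"dense:  {dense_bytes(N_DOCS, N_FEATURES) / GIB:.1f} GiB")
print(f"sparse: {sparse_bytes(N_DOCS, AVG_ACTIVE) / GIB:.3f} GiB")
```

Under these assumptions the dense layout is several orders of magnitude larger than the sparse one, so a large feature count alone does not show that feature selection took place.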

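Reads a set of training examples from a file, and returns the data in a featurized form. If feature selection is asked for, the returned featurized form is after feature selection has been applied.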
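The quoted description says that feature selection, when requested, is applied before the featurized data is returned. As a rough illustration of what such a step could look like, here is a minimal sketch of count-threshold selection. It is a generic technique, not the mechanism ColumnDataClassifier is documented to use, and `select_features`, `min_support`, and the sample feature strings are all hypothetical.

```python
from collections import Counter


def select_features(examples, min_support=2):
    """Keep only features that occur in at least min_support examples.

    examples is a list of feature lists, one list per training example.
    Returns the filtered examples and the set of retained features.
    """
    support = Counter()
    for features in examples:
        # Count each feature once per example, however often it repeats.
        support.update(set(features))
    kept = {f for f, n in support.items() if n >= min_support}
    filtered = [[f for f in features if f in kept] for features in examples]
    return filtered, kept


# Hypothetical featurized examples (made-up feature names).
examples = [
    ["w=stanford", "w=nlp"],
    ["w=nlp", "w=weka"],
    ["w=nlp", "w=stanford"],
]
filtered, kept = select_features(examples, min_support=2)
print(sorted(kept))
print(filtered)
```

In this sketch the returned examples contain only the retained features, which mirrors the idea of data being returned "after feature selection has been applied".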

Stanford NLP

0 answers:

No answers