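I can't find any information about the feature selection mechanism in the Stanford NLP text classifier.
Does ColumnDataClassifier perform any feature selection by default? The following lines are the output from the 20 Newsgroups data.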
q1Answer = "c"

def questionOne():
    print("Here is a quiz to test your knowledge of computer science...")
    print()
    print("Question 1")
    print("What type of algorithm is insertion?")
    print()
    print("a....searching algorithm")
    print("b....decomposition ")
    print("c....sorting algorithm ")
    print()

def checkAnswer1(q1Answer):  # q1Answer is a global variable, passed in here as a parameter
    score = 0  # local variable
    answer = input("Make your choice >>>> ")
    if answer == q1Answer:
        print("Correct!")
        score = 2
    else:
        # Exactly one retry; the original loop never ended after a second wrong answer
        answer = input("Incorrect response – 1 attempt remaining, please try again: ")
        if answer == q1Answer:
            print("Correct! On the second attempt")
            score = 1
        else:
            print("That is not correct\nThe answer is " + q1Answer)
    return score  # returned so that it can be used in other parts of the program

##def questionTwo():
##    print("Question 2\nWhat is abstraction\n\na....looking for problems\nb....removing irrelevant data\nc....solving the problem\n")

def main():
    questionOne()  # only prints the question, so there is no return value to keep
    score = checkAnswer1(q1Answer)
    print("Your final score is", score)

main()
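With 245,343 features, I don't see how this tool can train so quickly while using less than 2 GB of memory. When I tried to train a model in WEKA on the same dataset but with fewer features (45,000), WEKA used 8 GB of memory and took forever to train the model.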
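Reads a set of training examples from a file and returns the data in featurized form. If feature selection is asked for, the returned featurized form is after feature selection has been applied.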
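To make that description concrete, here is a minimal Python sketch of the general pattern it describes: featurize the training examples, then return them pruned only if selection was requested. This is not ColumnDataClassifier's code. The function names, the `min_support` parameter, the word-feature scheme, and the count-threshold rule are all assumptions chosen for illustration, and the three training rows are made-up toy data.

```python
from collections import Counter

def featurize(text):
    """Turn a document into a set of word features (hypothetical scheme)."""
    return {"word=" + tok for tok in text.lower().split()}

def read_training_examples(rows, min_support=None):
    """Return (label, feature set) pairs for an iterable of (label, text) rows.

    If min_support is given (an assumed, illustrative selection rule),
    features seen in fewer than min_support documents are dropped
    before the data is returned. Otherwise every feature is kept.
    """
    data = [(label, featurize(text)) for label, text in rows]
    if min_support is None:
        return data  # no feature selection requested
    counts = Counter(f for _, feats in data for f in feats)
    keep = {f for f, c in counts.items() if c >= min_support}
    return [(label, feats & keep) for label, feats in data]

# Toy rows, invented for this sketch
rows = [
    ("sci.space", "the shuttle launch was delayed"),
    ("sci.space", "the launch window opens friday"),
    ("rec.autos", "the engine was rebuilt"),
]

full = read_training_examples(rows)
pruned = read_training_examples(rows, min_support=2)

print(len({f for _, feats in full for f in feats}))    # 10 distinct features
print(len({f for _, feats in pruned for f in feats}))  # 3 distinct features
```

The point of the sketch is only that the caller gets the same data structure either way. Whether any pruning happens depends on whether selection was requested, which is what the question above is asking about for the default configuration.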
Stanford NLP