I have a data frame like this:
Task Response
1 NA
2 NA
3 EFFICACY
4 I was sent to external vendor for solution (PDA parts), but at PDA parts they identified within few minites that new battery would not solve the issue. I wonder why this diagnosis part could no have been done at the locla IS service in the Amgen office. Now I spent time to visit PDA parts at their place, while this finally did not bring any solution.
5 Issue could not be resolved
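The two columns are Task and Response, and Response contains some NA values.
I want to POS-tag each record and extract only the nouns.
For the 5 records above, the POS-tagged output should look like this:
Task POSTagged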
1 NA/NNP
2 NA/NNP
3 EFFICACY/NNP
4 vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN
5 Issue/NN
So the result should be a matrix with 2 columns and 5 records.
I am trying to use this function:
library(NLP)
library(openNLP)

tagPOS <- function(x) {
  s <- as.String(x)
  # Sentence and word annotations must exist before POS tagging
  sent_token_annotator <- Maxent_Sent_Token_Annotator()
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
  pos_tag_annotator <- Maxent_POS_Tag_Annotator()
  a3 <- annotate(s, pos_tag_annotator, a2)
  # Keep the word-level annotations and pull out each POS tag
  a3w <- subset(a3, type == "word")
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  gc()
  return(paste(POStags, collapse = " "))
}
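I tried lapply and also tried looping over the records, but every approach gives each record the combined POS tags of all 5 records.
That is, for every record I get the POS tags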
NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
What I get is:
Task Response
1 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
2 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
3 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
4 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
5 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
This is not what I want. The code I have tried:
lapply(df2$Task, tagPOS (df2$Response), data = df2)
resultset <- group_by(df2, Task) %>% do(tagPOS (df2$Response))
df2[,c("Keywords"):= tagPOS(strip(df2$Response)),by = Task]
Responsedf<-lapply(Response, extractPOS, "NN")
df2$noun <- with(df2, extractPOS(df2$Response, "NN"))
But nothing has worked so far. I hope this makes sense.
Any suggestions would be much appreciated.
Answer 0 (score: 0)
Found the solution:
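df2$noun <- NA_character_
for (i in seq_len(nrow(df2))) {  # R indexes from 1, so do not start at 0
  # Tag one record at a time so the records are not merged
  df2$noun[i] <- extractPOS(df2$short_description[i], "NN")
  gc()
}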
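The thread never shows extractPOS, so here is a minimal sketch of what such a helper might look like, assuming the NLP and openNLP packages (with the English models and Java) are installed. The body of extractPOS, the annotator variable names and the small df2 are all assumptions made for illustration. The sketch also shows a likely reason the earlier attempts merged all records: as.String() joins a character vector into a single string, so passing the whole column tags everything at once, while passing one element at a time keeps the records separate.

```r
# Sketch only: extractPOS is not defined in the thread; this body is an assumption.
library(NLP)
library(openNLP)

# Build the annotators once instead of once per record.
sent_ann <- Maxent_Sent_Token_Annotator()
word_ann <- Maxent_Word_Token_Annotator()
pos_ann  <- Maxent_POS_Tag_Annotator()

extractPOS <- function(x, pos_regex) {
  # x is expected to be ONE record; a longer vector would be joined into one string.
  s <- as.String(x)
  a2 <- annotate(s, list(sent_ann, word_ann))
  a3 <- annotate(s, pos_ann, a2)
  words <- subset(a3, type == "word")
  tags <- vapply(words$features, function(f) f$POS, character(1))
  # Keep only the tokens whose tag matches, e.g. "NN" matches NN, NNS, NNP, NNPS.
  keep <- grepl(pos_regex, tags)
  paste(sprintf("%s/%s", s[words][keep], tags[keep]), collapse = " ")
}

# Hypothetical sample data using the column names from the question.
df2 <- data.frame(
  Task = 1:3,
  Response = c(NA, "EFFICACY", "Issue could not be resolved"),
  stringsAsFactors = FALSE
)

# Apply the helper to each record separately; vapply guarantees one string per row.
df2$noun <- vapply(as.character(df2$Response), extractPOS, character(1),
                   pos_regex = "NN", USE.NAMES = FALSE)
print(df2[, c("Task", "noun")])
```

Note that an NA record is coerced to the text "NA" before tagging, which is consistent with the NA/NNP entries in the question's expected output. If that is not wanted, filter or skip NA rows before calling the helper.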
Thanks.