Extracting nouns and verbs from text

Asked: 2010-06-04 00:52:07

Tags: r

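Is it possible to extract nouns and verbs separately with the R package openNLP? I use the tagPOS function to tag sentences, but how do I pull out only the verbs, or only the nouns?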

1 answer:

Answer 0: (score: 9)

Example usage (this extracts the words tagged /VBx, where x is any single character or nothing, i.e. the verbs):

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

acqTag <- tagPOS(acq)

sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))

     [,1]                           
[1,] "said"                         
[2,] "sold"                         
[3,] "engaged"                      
[4,] "said"                         
[5,] "is"                           
[6,] "did"                          
[7,] " not/RB explain./NN Reuter./."

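OK, my regular expression needs some work to get rid of the last row of the result. That row is the text left over after the final /VB tag, which strsplit returns as a trailing piece.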

Edit

Another approach is to drop any result that still contains a space character, since only the leftover piece does:

sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) {res = sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s",res)]} )
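
To get nouns and verbs separately, as the question asks, one option is to split the tagged string into word/TAG tokens and filter on the tag rather than splitting on the tag itself. The following is only a sketch in base R: `extract_by_tag` is a hypothetical helper name, the sample string is hand-tagged for illustration (it is not actual tagPOS output), and it assumes the tagger emits space-separated word/TAG tokens with noun tags starting with NN and verb tags starting with VB, as the output above suggests.

```r
# Hypothetical helper: return the words whose POS tag matches tag_pattern.
# Assumes `tagged` is a single string of space-separated "word/TAG" tokens.
extract_by_tag <- function(tagged, tag_pattern) {
  tokens <- unlist(strsplit(tagged, "\\s+"))
  tokens <- tokens[nzchar(tokens)]        # drop empty pieces from leading spaces
  words  <- sub("/[^/]*$", "", tokens)    # everything before the last slash
  tags   <- sub("^.*/", "", tokens)       # everything after the last slash
  words[grepl(tag_pattern, tags)]
}

# Hand-tagged sample for illustration only (not real tagPOS output).
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ ./."

verbs <- extract_by_tag(tagged, "^VB")    # "said" "is"
nouns <- extract_by_tag(tagged, "^NN")    # "company" "sale"
```

Because every token is inspected on its own, nothing is left over after the last match, so no post-filtering on whitespace is needed. With the example above, the same call would be `extract_by_tag(acqTag, "^VB")`, provided acqTag has the assumed format.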