Question

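I am new to R and am exploring text mining. With the steps below I can get as far as stemming, but I also need to do POS tagging and extract text/theme patterns. The data I am using is customer verbatims. Please help me with how to proceed. Most of the articles I have checked do not explain how to POS-tag the data in a Corpus, and I could not find any details on pattern detection. Any help would be much appreciated. Thanks in advance.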

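library(qdap)       # sent_detect_nlp()
library(tm)         # Corpus(), tm_map(), TermDocumentMatrix()
library(SnowballC)  # stemmer used by stemDocument()

CSVfile = read.csv("Testfortextcsv.csv", stringsAsFactors = FALSE)
# Split the comments into sentences, one row per sentence
TestSplit = data.frame(Comment = sent_detect_nlp(CSVfile$Comment), stringsAsFactors = FALSE)
TestCorpus = Corpus(VectorSource(TestSplit$Comment))
# Wrap base functions in content_transformer() so the corpus structure is kept
# (this replaces the tolower + PlainTextDocument workaround)
TestCorpus = tm_map(TestCorpus, content_transformer(tolower))
TestCorpus = tm_map(TestCorpus, removePunctuation)
# The text is already lowercase here, so the custom stopword must be lowercase too
TestCorpus = tm_map(TestCorpus, removeWords, c("test", stopwords("SMART"), stopwords("english")))
TestCorpus = tm_map(TestCorpus, stripWhitespace)
TestCorpus = tm_map(TestCorpus, stemDocument)
# Note: despite its name, dtm holds a term-document matrix (terms in rows)
dtm <- TermDocumentMatrix(TestCorpus)
m <- as.matrix(dtm)
# Total frequency of each term across all documents, highest first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)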

This is what I use to get the word cloud, associations, and bar plot.

WordCloud
----------
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 200,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))

Find Frequent Terms
-----------------
findFreqTerms(dtm, lowfreq = 15)

Find Association:
-----------------------
findAssocs(dtm, terms = "account", corlimit = 0.3)

Bar Plot for frequencies
--------------------------
barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,col ="lightblue", main ="Most frequent words",ylab = "Word frequencies")

Answer 1

The qdap package lets you identify the part of speech of each word in a string:

library(qdap)
s1<-c("Hello World")  
pos(s1)

You may also find other resources helpful: openNLP, RTextTools, and another possibility.
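
Since openNLP is mentioned above, here is a minimal sketch of how POS tagging with it might look. It assumes the NLP and openNLP packages (and a working Java installation) are available, and the example sentence is made up for illustration. Note that a tagger relies on the original word forms and sentence context, so it should probably be run on the raw sentences (for example the output of sent_detect_nlp) rather than on the lowercased, stopword-stripped, stemmed corpus.

```r
library(NLP)
library(openNLP)

# Hypothetical example sentence; in practice this would be one raw comment
s <- as.String("The customer could not access the account.")

# Sentence and word annotations must exist before POS tags can be assigned
sent_ann <- Maxent_Sent_Token_Annotator()
word_ann <- Maxent_Word_Token_Annotator()
pos_ann  <- Maxent_POS_Tag_Annotator()

# NLP::annotate is called explicitly to avoid clashes with other packages
a <- NLP::annotate(s, list(sent_ann, word_ann))
a <- NLP::annotate(s, pos_ann, a)

# Keep only the word-level annotations and pull out their POS feature
words <- subset(a, type == "word")
tags  <- sapply(words$features, `[[`, "POS")

# One row per token with its Penn Treebank tag
tagged <- data.frame(token = s[words], tag = tags, stringsAsFactors = FALSE)
print(tagged)
```

From a table like this, one possible starting point for pattern detection is to keep only certain tags (say, nouns and adjectives) or to count frequent tag sequences, though which patterns are useful depends on the data.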
