Applying adist to groups of words

Time: 2015-07-11 13:45:44

Tags: r distance

I have a table:

df<-data.frame(palabra=c('ani', 'anib', 'alop', 'alope','ber', 'beren'))

I need to create a distance matrix for each group of words, where the words are grouped by their first character.

To do this, I add a column holding the first letter:

df$letra<-substring(df$palabra,1,1)

Now I need to apply the adist function to each group. Here is an example adist call on the whole column:

adist(df$palabra, costs=list(insertions=1, deletions=1, substitutions=2))

How can I create a distance table for each group?

1 Answer:

Answer 0: (score: 3)

A simple combination of lapply and split will give you what you want:

#split is used to create two data frames: one for group a and one
#for group b
#lapply will apply the adist function to each of the groups
lapply(split(df, df$letra), function(x) {
  adist(x$palabra, costs=list(insertions=1, deletions=1, substitutions=2))
})

Output:

$a
     [,1] [,2] [,3] [,4]
[1,]    0    1    5    6
[2,]    1    0    6    7
[3,]    5    6    0    1
[4,]    6    7    1    0

$b
     [,1] [,2]
[1,]    0    2
[2,]    2    0
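
The matrices above are indexed only by position, so it can be hard to tell which word each row refers to. As a minimal sketch (not part of the original answer), the same idea can split the word vector directly and label each matrix with its words. The names grupos and distancias are hypothetical, and stringsAsFactors = FALSE is an assumption added so that palabra stays a character column on older R versions:

```r
# Same data as in the question; stringsAsFactors = FALSE (an assumption)
# keeps palabra as character on R versions before 4.0
df <- data.frame(palabra = c('ani', 'anib', 'alop', 'alope', 'ber', 'beren'),
                 stringsAsFactors = FALSE)
df$letra <- substring(df$palabra, 1, 1)

# Split the word vector itself (rather than the whole data frame) by first letter
grupos <- split(df$palabra, df$letra)

# Compute one distance matrix per group and label rows and columns with the words
distancias <- lapply(grupos, function(w) {
  m <- adist(w, costs = list(insertions = 1, deletions = 1, substitutions = 2))
  dimnames(m) <- list(w, w)
  m
})

distancias$b
```

With the labels in place, a single distance can be looked up by word, for example distancias$b["ber", "beren"], which corresponds to the value 2 in the $b matrix above (two insertions at cost 1 each).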