Question

I have some large network-analysis data that looks like this ("friendship" = the friendship score the student gives to the alter):

studentid  alterid   friendship
 30401001 30401002  1.308245549
 30401001 30401003 -0.671986154
 30401001 30401004  0.039891905
 30401002 30401001  0.669867969
 30401002 30401003  0.967399033
 30401002 30401004 -0.902678435
 30401003 30401001  0.002150519
 30401003 30401002 -0.272702372
 30401003 30401004 -0.441293873
 30401004 30401001 -0.902678435
 30401004 30401002 -0.902678435
 30401004 30401003 -0.902678435

I want to create an "alter_friendship" variable that holds the friendship score the alter gives to the student. The result should look like this:

 studentid  alterid  friendship  alter_friendship   
 30401001 30401002  1.308245549  0.669867969
 30401001 30401003 -0.671986154  0.002150519
 30401001 30401004  0.039891905 -0.902678435
 30401002 30401001  0.669867969  1.308245549 
 30401002 30401003  0.967399033 -0.272702372
 30401002 30401004 -0.902678435 -0.902678435
 30401003 30401001  0.002150519 -0.671986154
 30401003 30401002 -0.272702372  0.967399033
 30401003 30401004 -0.441293873 -0.902678435
 30401004 30401001 -0.902678435  0.039891905
 30401004 30401002 -0.902678435 -0.902678435
 30401004 30401003 -0.902678435 -0.441293873

I tried combining match with adply:

net$alter_friendship<-adply(.margins=1,net$friendship[match(net$alterid,net$studentid)])

This gives the correct answer only for student 1 (30401001); the values for the remaining students are wrong.

If anyone has a better idea, that would be great.

Answer 1

merge(d,d,by.x=c('studentid','alterid'),by.y=c('alterid','studentid'))

will produce:

   studentid  alterid friendship.x friendship.y
1   30401001 30401002  1.308245549  0.669867969
2   30401001 30401003 -0.671986154  0.002150519
3   30401001 30401004  0.039891905 -0.902678435
4   30401002 30401001  0.669867969  1.308245549
5   30401002 30401003  0.967399033 -0.272702372
6   30401002 30401004 -0.902678435 -0.902678435
7   30401003 30401001  0.002150519 -0.671986154
8   30401003 30401002 -0.272702372  0.967399033
9   30401003 30401004 -0.441293873 -0.902678435
10  30401004 30401001 -0.902678435  0.039891905
11  30401004 30401002 -0.902678435 -0.902678435
12  30401004 30401003 -0.902678435 -0.441293873

where d is your input data set:

d <- structure(list(studentid = c(30401001L, 30401001L, 30401001L, 
30401002L, 30401002L, 30401002L, 30401003L, 30401003L, 30401003L, 
30401004L, 30401004L, 30401004L), alterid = c(30401002L, 30401003L, 
30401004L, 30401001L, 30401003L, 30401004L, 30401001L, 30401002L, 
30401004L, 30401001L, 30401002L, 30401003L), friendship = c(1.308245549, 
-0.671986154, 0.039891905, 0.669867969, 0.967399033, -0.902678435, 
0.002150519, -0.272702372, -0.441293873, -0.902678435, -0.902678435, 
-0.902678435)), .Names = c("studentid", "alterid", "friendship"
), class = "data.frame", row.names = c(NA, -12L))
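The merge above names the two score columns friendship.x and friendship.y and, being an inner join by default, drops any row whose reverse pair is missing. Below is a minimal sketch of one way to get the column names asked for in the question and keep unreciprocated rows as NA. It assumes each (studentid, alterid) pair occurs at most once; d is rebuilt here only so the snippet runs on its own, and res is just an illustrative name.

```r
# Same data as d above, rebuilt compactly so this snippet is self-contained.
d <- data.frame(
  studentid = rep(c(30401001L, 30401002L, 30401003L, 30401004L), each = 3),
  alterid = c(30401002L, 30401003L, 30401004L,
              30401001L, 30401003L, 30401004L,
              30401001L, 30401002L, 30401004L,
              30401001L, 30401002L, 30401003L),
  friendship = c(1.308245549, -0.671986154, 0.039891905,
                 0.669867969, 0.967399033, -0.902678435,
                 0.002150519, -0.272702372, -0.441293873,
                 -0.902678435, -0.902678435, -0.902678435)
)

# Self-merge with the key columns swapped on the right-hand side.
# all.x = TRUE keeps rows without a reverse pair (their score becomes NA).
res <- merge(d, d,
             by.x = c("studentid", "alterid"),
             by.y = c("alterid", "studentid"),
             all.x = TRUE)

# Rename the automatically suffixed columns to the names from the question.
names(res)[names(res) == "friendship.x"] <- "friendship"
names(res)[names(res) == "friendship.y"] <- "alter_friendship"
```

Note that merge sorts the result by the key columns, so the row order may differ from the input if the input was not already sorted.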

Answer 2

You can do this with sapply, for example:

df$alter_friendship <- sapply(seq_len(nrow(df)), function(i) {
  with(df, friendship[studentid == alterid[i] & alterid == studentid[i]])
})

Result:

df
#   studentid  alterid   friendship alter_friendship
#1   30401001 30401002  1.308245549      0.669867969
#2   30401001 30401003 -0.671986154      0.002150519
#3   30401001 30401004  0.039891905     -0.902678435
#4   30401002 30401001  0.669867969      1.308245549
#5   30401002 30401003  0.967399033     -0.272702372
#6   30401002 30401004 -0.902678435     -0.902678435
#7   30401003 30401001  0.002150519     -0.671986154
#8   30401003 30401002 -0.272702372      0.967399033
#9   30401003 30401004 -0.441293873     -0.902678435
#10  30401004 30401001 -0.902678435      0.039891905
#11  30401004 30401002 -0.902678435     -0.902678435
#12  30401004 30401003 -0.902678435     -0.441293873
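One caveat with the sapply version: if some row has no reverse pair, the inner expression returns a zero-length vector and sapply returns a list instead of a numeric vector. Below is a hedged sketch of a variant that returns NA in that case. The helper name alter_score is hypothetical, and taking the first hit when a pair is duplicated is an assumption rather than something the question specifies.

```r
# Same data as in the question, rebuilt so this snippet is self-contained.
df <- data.frame(
  studentid = rep(c(30401001L, 30401002L, 30401003L, 30401004L), each = 3),
  alterid = c(30401002L, 30401003L, 30401004L,
              30401001L, 30401003L, 30401004L,
              30401001L, 30401002L, 30401004L,
              30401001L, 30401002L, 30401003L),
  friendship = c(1.308245549, -0.671986154, 0.039891905,
                 0.669867969, 0.967399033, -0.902678435,
                 0.002150519, -0.272702372, -0.441293873,
                 -0.902678435, -0.902678435, -0.902678435)
)

# Hypothetical helper: score that row i's alter gives back to row i's student.
alter_score <- function(i, data) {
  hit <- data$friendship[data$studentid == data$alterid[i] &
                         data$alterid == data$studentid[i]]
  # No reverse pair: return NA. Duplicated pair: keep the first (assumption).
  if (length(hit) == 0L) NA_real_ else hit[1]
}

# vapply guarantees a numeric vector with exactly one value per row.
df$alter_friendship <- vapply(seq_len(nrow(df)), alter_score,
                              numeric(1), data = df)
```

Like the sapply version, this scans the whole data frame once per row, so it is unlikely to scale well to large networks.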

Answer 3

dplyr can do this with a self-join (using Marat's data):

library(dplyr)
inner_join(d, d, by = c("studentid" = "alterid", "alterid" = "studentid"))

But why does the code in your question fail? The code is (with net changed to d for clarity):

adply(.margins=1, d$friendship[match(d$alterid, d$studentid)])

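R interprets the second (unnamed) argument as the .data argument. So adply does nothing except number the rows, because it is given no function to apply and the default function is NULL.

All the code then does is index friendship with the result of match. When there are multiple matches, match returns the position of the first one, and because it compares only alterid against studentid, that is always the first row belonging to the alter. Hence the unexpected result.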

> cbind(d[, -3], match = match(d$alterid, d$studentid))
   studentid  alterid match
1   30401001 30401002     4
2   30401001 30401003     7
3   30401001 30401004    10
4   30401002 30401001     1
5   30401002 30401003     7
6   30401002 30401004    10
7   30401003 30401001     1
8   30401003 30401002     4
9   30401003 30401004    10
10  30401004 30401001     1
11  30401004 30401002     4
12  30401004 30401003     7

I suspect you intended adply to iterate over each row and find the exact match for the condition studentid == alterid & alterid == studentid, like this:

cbind(d, V1 = adply(d, 1, function(x) {
  d[d$alterid == x$studentid & d$studentid == x$alterid, "friendship"]
  })$V1)

Compared with the other answers, this is very inefficient.
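If you would rather stay close to the original match idea, one option is to match on both IDs at once by building a composite key. This is only a sketch: key_forward and key_reverse are illustrative names, and it assumes each (studentid, alterid) pair appears at most once, since match would otherwise still return only the first hit.

```r
# Same data as d above, rebuilt so this snippet is self-contained.
d <- data.frame(
  studentid = rep(c(30401001L, 30401002L, 30401003L, 30401004L), each = 3),
  alterid = c(30401002L, 30401003L, 30401004L,
              30401001L, 30401003L, 30401004L,
              30401001L, 30401002L, 30401004L,
              30401001L, 30401002L, 30401003L),
  friendship = c(1.308245549, -0.671986154, 0.039891905,
                 0.669867969, 0.967399033, -0.902678435,
                 0.002150519, -0.272702372, -0.441293873,
                 -0.902678435, -0.902678435, -0.902678435)
)

# Key identifying each row as "student alter".
key_forward <- paste(d$studentid, d$alterid)
# Key of the row we want to look up: the same pair in reverse order.
key_reverse <- paste(d$alterid, d$studentid)

# match now finds the row where the alter rates the student;
# a missing reverse pair yields NA.
d$alter_friendship <- d$friendship[match(key_reverse, key_forward)]
```

This is vectorized, so it avoids the per-row scan of the adply approach.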

Matching with non-unique IDs
