USER TITLE NAME
aaa 1 Alex
aaa 2 Alex
aaa 3 Alex
aaa 4 Alex
aaa 5 Alex
bbb 6 Alex
bbb 7 Alex
ddd 8 Alex
aaa 1 Bob
aaa 2 Bob
bbb 3 Bob
bbb 4 Bob
bbb 5 Bob
ddd 6 Bob
ddd 7 Bob
ddd 8 Bob
ddd 9 Bob
I would then like the result to look like this:
NAME USER (most frequent) USER (2nd frequent)
Alex aaa bbb
Bob ddd bbb
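Maybe it's the late hour, but I can't recall any code I've written before that sorts data this way.
Do I need to subset by each value that occurs in NAME and then just run table() on USER?
If it helps, USER is static: there are only 5 or 6 "users" in the column (aaa to eee, say). As a bonus, maybe I could have a third column with the second most frequent user? Any help is much appreciated. Thanks!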
Answer 0 (score: 2)
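One way with data.table is to get the number of rows (.N) by group ('USER', 'NAME'), order by 'N' in decreasing order, select the first two rows (.SD[1:2]) by 'NAME', create a sequence variable ('ind') by 'NAME', and then dcast from 'long' to 'wide' format.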
library(data.table)
dcast(setDT(df1)[,list(N=.N), .(USER, NAME)][order(-N),.SD[1:2] ,
NAME][, ind:= paste0('USER', 1:.N), NAME], NAME~ind, value.var='USER')
# NAME USER1 USER2
#1: Alex aaa bbb
#2: Bob ddd bbb
NOTE: We can change the column names to 'USER FREQUENT1', 'USER FREQUENT2', etc., either by changing the paste0 call or by using setnames on the output.
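As a minimal sketch of the setnames route mentioned in the note: the table `res` below is a hand-built stand-in for the wide output shown above (an assumption, so that the snippet runs on its own), and the new labels are only illustrative.

```r
library(data.table)

# Hypothetical stand-in for the wide result printed above
res <- data.table(NAME  = c("Alex", "Bob"),
                  USER1 = c("aaa", "ddd"),
                  USER2 = c("bbb", "bbb"))

# setnames() renames columns by reference (no copy is made)
setnames(res, old = c("USER1", "USER2"),
         new = c("USER FREQUENT1", "USER FREQUENT2"))

names(res)
```

Because the new names contain a space, they would need backticks when referenced later (for example res$`USER FREQUENT1`).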
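Or, using base R, we can get the table of the 'NAME' and 'USER' columns of the dataset, order the resulting dataset ('d1'), create a sequence column ('ind') for the grouping variable 'NAME', subset, and then use reshape to change the format from 'long' to 'wide'.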
# assumes df1 is a plain data.frame; on a data.table, df1[c(3,1)] would select rows instead
d1 <- as.data.frame(table(df1[c(3,1)]))
d2 <- d1[with(d1, order(NAME, -Freq)),]
d2$ind <- with(d2, ave(Freq, NAME, FUN=seq_along))
reshape(subset(d2, ind <3, -Freq), idvar='NAME',
timevar='ind', direction='wide')
# NAME USER.1 USER.2
#1 Alex aaa bbb
#6 Bob ddd bbb
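The idea raised in the question (tabulate USER within each NAME) can also be written directly in base R. This is only a sketch under assumptions: `top_users` is a hypothetical helper, not part of either approach above, and the small `df1` here is rebuilt without the TITLE column so the snippet is self-contained.

```r
# Same USER/NAME values as the example data (TITLE omitted)
df1 <- data.frame(
  USER = c(rep("aaa", 5), rep("bbb", 2), "ddd",
           rep("aaa", 2), rep("bbb", 3), rep("ddd", 4)),
  NAME = rep(c("Alex", "Bob"), c(8, 9)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most frequent values of x, most frequent first
# (yields NA when a group has fewer than n distinct values)
top_users <- function(x, n = 2) {
  counts <- sort(table(x), decreasing = TRUE)
  names(counts)[seq_len(n)]
}

# Split USER by NAME, take the top two per group, one row per NAME
res <- t(sapply(split(df1$USER, df1$NAME), top_users))
colnames(res) <- c("USER1", "USER2")
res
```

Passing a larger n to top_users (and extending colnames accordingly) would add further columns; how ties in frequency are ordered is not something this sketch controls.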
df1 <- structure(list(USER = c("aaa", "aaa", "aaa", "aaa", "aaa",
"bbb",
"bbb", "ddd", "aaa", "aaa", "bbb", "bbb", "bbb", "ddd", "ddd",
"ddd", "ddd"), TITLE = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), NAME = c("Alex", "Alex", "Alex",
"Alex", "Alex", "Alex", "Alex", "Alex", "Bob", "Bob", "Bob",
"Bob", "Bob", "Bob", "Bob", "Bob", "Bob")), .Names = c("USER",
"TITLE", "NAME"), class = "data.frame", row.names = c(NA, -17L))