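Sort data by column and add an index within each group

Time: 2016-10-21 00:29:37

Tags: r sorting dataframe

This question describes the setup of my problem quite well.

However, instead of a second value I have a factor named algorithm. My data frame looks like this (note that duplicate values can occur even within a group):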

algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df
    algorithm  v
1      global  5
2 distributed  2
3 distributed  6
4        none  7
5      global  3
6      global  1
7 distributed 10
8        none  2
9        none  2

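I want to sort the data frame by v, but get the sorted position of each entry relative to its group (algorithm). That position should then be added to the original data frame (so I do not need to reorder it), because I want to use ggplot to plot the computed position as x and the value as y, grouped by algorithm (i.e. each algorithm is one set of points).

So the result should look like this: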

    algorithm  v  groupIndex
1      global  5  3
2 distributed  2  1
3 distributed  6  2
4        none  7  3
5      global  3  2
6      global  1  1
7 distributed 10  3
8        none  2  1
9        none  2  2

So far I know that I can sort the data by algorithm first and then by value, or the other way around. I assume that in a second step I would have to compute the index within each group? Is there a simple way to do that?

df[order(df$algorithm, df$v), ]
    algorithm  v
2 distributed  2
3 distributed  6
7 distributed 10
6      global  1
5      global  3
1      global  5
8        none  2
9        none  2
4        none  7

Edit: there is no guarantee that every group has the same number of entries!

2 answers:

Answer 0: (score: 3)

Applying order twice within each group should cover it:

ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2

This is equivalent to:

ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
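To see why the double order works, here is a small sketch on a single group (the values of the "none" group from the question, chosen here only for illustration). The first order gives the indices that would sort the vector; ordering those indices again gives each element's position in the sorted result, which is its rank with ties broken by first appearance.

```r
# One group from the question ("none"), including a tie.
x <- c(7, 2, 2)

o <- order(x)  # 2 3 1: indices of x in ascending order
r <- order(o)  # 3 1 2: position of each element within the sorted vector

# The same positions come out of rank() when ties go to the earlier element.
same <- all(r == rank(x, ties.method = "first"))
```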

That equivalence in turn means that, if you are concerned about speed, you can use frank from data.table:

library(data.table)
setDT(df)[, grpidx := frank(v, ties.method="first"), by=algorithm]
df
#     algorithm  v grpidx
#1:      global  5      3
#2: distributed  2      1
#3: distributed  6      2
#4:        none  7      3
#5:      global  3      2
#6:      global  1      1
#7: distributed 10      3
#8:        none  2      1
#9:        none  2      2
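If the goal is to keep a plain data frame in its original row order and plot from it, as the question describes, the ave() result can be stored as a column directly. The following is only a sketch: the column name groupIndex is taken from the question's desired output, and the plot aesthetics (one colour per algorithm, points only) are assumptions rather than something specified in the thread.

```r
# Rebuild the example data from the question.
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# Within-group rank of v, ties broken by order of appearance.
# The rows are not reordered; only a new column is added.
df$groupIndex <- ave(df$v, df$algorithm,
                     FUN = function(x) rank(x, ties.method = "first"))

# Build the plot only if ggplot2 is installed (assumed layout: position on x,
# value on y, one colour per algorithm). Call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = groupIndex, y = v,
                                        colour = algorithm)) +
    ggplot2::geom_point()
}
```

Because ave() writes its result back into a copy of df$v, the new column is numeric rather than integer; wrap it in as.integer() if that matters.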

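Answer 1: (score: 2)

One way would be the following. I think you can use with_order() to order the values by v within each group, and pass row_number() as the function that assigns the rank. This way you can skip the step of arranging the data for each group, which you would need if you tried to use order().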

library(dplyr)
group_by(df, algorithm) %>%
  mutate(groupInd = with_order(order_by = v, fun = row_number, x = v))

#    algorithm     v groupInd
#       <fctr> <int>    <int>
#1      global     5        3
#2 distributed     2        1
#3 distributed     6        2
#4        none     7        3
#5      global     3        2
#6      global     1        1
#7 distributed    10        3
#8        none     2        1
#9        none     2        2