Ordering a "mixed" vector (numbers with letters)

Time: 2013-12-05 09:49:24

Tags: r sorting

How can I sort a vector like

c("7","10a","10b","10c","8","9","11c","11b","11a","12") -> alph

so that it comes out in this order:

alph
[1] "7","8","9","10a","10b","10c","11a","11b","11c","12"

and then use that ordering to sort a data.frame, for example

V1 <- c("A","A","B","B","C","C","D","D","E","E")
V2 <- 2:1 
V3 <- alph
df <- data.frame(V1,V2,V3)

so that the rows come out like this (ordered by V2, then by V3)?

 V1 V2  V3
C  1   9
A  1 10a
B  1 10c
D  1 11b
E  1  12
A  2   7
C  2   8
B  2 10b
E  2 11a
D  2 11c

1 answer:

Answer 0 (score: 25)

> library(gtools)
> mixedsort(alph)

[1] "7"   "8"   "9"   "10a" "10b" "10c" "11a" "11b" "11c" "12" 
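If installing gtools is not an option, the same order can be reproduced in base R for this particular pattern. This is only a sketch, assuming every element is a run of digits optionally followed by letters. The names num, suffix and sorted are introduced here for illustration, and this is not how mixedsort itself is implemented.

```r
alph <- c("7","10a","10b","10c","8","9","11c","11b","11a","12")

# Numeric part: drop any trailing non-digit characters, then convert
num <- as.numeric(sub("[^0-9]+$", "", alph))

# Letter part: drop the leading digits ("" when there is no suffix)
suffix <- sub("^[0-9]+", "", alph)

# Order by number first, then by suffix to break ties
sorted <- alph[order(num, suffix)]
sorted
```

mixedsort handles more general mixes of digits and letters, so it remains the safer choice when the format is not this regular.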

To sort a data.frame, use mixedorder instead:

> mydf <- data.frame(alph, USArrests[seq_along(alph),])
> mydf[mixedorder(mydf$alph),]

            alph Murder Assault UrbanPop Rape
Alabama        7   13.2     236       58 21.2
California     8    9.0     276       91 40.6
Colorado       9    7.9     204       78 38.7
Alaska       10a   10.0     263       48 44.5
Arizona      10b    8.1     294       80 31.0
Arkansas     10c    8.8     190       50 19.5
Florida      11a   15.4     335       80 31.9
Delaware     11b    5.9     238       72 15.8
Connecticut  11c    3.3     110       77 11.1
Georgia       12   17.4     211       60 25.8

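mixedorder on multiple vectors (columns)

Apparently mixedorder cannot handle multiple vectors. I have written a function that works around this by converting every character vector to a factor whose levels are in mixedsort order, and then passing all the vectors to the standard order function.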

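multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    # Requires gtools (for mixedsort)
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                # Levels are in mixed order, so order() sorts by level position
                factor(l, levels=mixedsort(unique(l)))
            } else {
                # Non-character vectors are passed to order() unchanged
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}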

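However, in the example below multi.mixedorder gives the same result as the standard order, because V2 is numeric and all of its values are distinct, so V3 never has to break a tie.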

df <- data.frame(
    V1 = c("A","A","B","B","C","C","D","D","E","E"),
    V2 = 19:10,
    V3 = alph,
    stringsAsFactors = FALSE)

df[multi.mixedorder(df$V2, df$V3),]

   V1 V2  V3
10  E 10  12
9   E 11 11a
8   D 12 11b
7   D 13 11c
6   C 14   9
5   C 15   8
4   B 16 10c
3   B 17 10b
2   A 18 10a
1   A 19   7
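In the example above every V2 value is distinct, so V3 is never consulted. To see the mixed ordering matter, here is a minimal sketch using the question's original V2 (2:1, recycled explicitly with rep). It assumes gtools is installed and inlines the same factor trick as multi.mixedorder; V3.f and res are names introduced here for illustration.

```r
library(gtools)  # provides mixedsort()

alph <- c("7","10a","10b","10c","8","9","11c","11b","11a","12")
df <- data.frame(
    V1 = c("A","A","B","B","C","C","D","D","E","E"),
    V2 = rep(2:1, 5),          # the question's V2, with ties
    V3 = alph,
    stringsAsFactors = FALSE)

# A factor whose levels are in mixed order sorts by level position,
# so plain order() can use it as the tie-breaker after V2
V3.f <- factor(df$V3, levels = mixedsort(unique(df$V3)))
res <- df[order(df$V2, V3.f), ]
res
```

Within each V2 group the rows now follow the mixed order of V3, which is the layout the question asks for.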

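Note that

  • 19:10 is equivalent to c(19:10). c stands for concatenate, i.e. it builds one long vector out of several short ones. Here you have only one vector (19:10), so there is nothing to concatenate. For V1, however, you have 10 vectors of length 1, so you do need c, just as you already wrote it.
  • You need stringsAsFactors = FALSE to prevent V1 and V3 from being converted to (incorrectly ordered) factors, which data.frame did by default before R 4.0.0.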
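Both points can be checked directly. The following is a small sketch, with f introduced here as an illustrative name. The exact level order depends on the locale's collation, but string comparison places "7" after "12" in the usual ones.

```r
# A range is already a vector, so wrapping it in c() changes nothing
identical(19:10, c(19:10))

alph <- c("7","10a","10b","10c","8","9","11c","11b","11a","12")

# factor() sorts its levels as strings, not as numbers
f <- factor(alph)
levels(f)

# Ordering by such a factor follows the string order of its levels
alph[order(f)]
```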