Multi-way frequency table in R

Posted: 2015-11-11 18:10:25

Tags: r

My data.frame contains the following fields:

State    County    Race
FL       Broward   Black
FL       Broward   White
GA       DeKalb    White
GA       Fulton    Hispanic

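And so on. What I need is the count of each unique Race within every State-County combination (since each county is its own variable). I want to keep the zeros and also get a total. So for the example above, I want to get: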

State    County   White    Black    Hispanic    Total
FL       Broward    1        1        0           2
GA       DeKalb     1        0        0           1
GA       Fulton     0        0        1           1

I can get the State-County totals using the {plyr} package:

library(plyr)
count(deaths, c("State", "County"))

But when I add Race as an additional level, each race ends up on its own row instead of in its own column. The output looks like this:

State     County      Race      Freq
TX         Bee       Unknown     1
TX         Bee       White       1
TX         Bell      Black       1
TX         Bell      Unknown     3
TX         Bell      White       3

How can I get this into the format I need?
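For reference, the requested table can also be sketched in base R without any packages. This assumes the sample data is in a data frame named `mydf`, and that county names never contain `|`, which is used here (as an assumption) to build a combined State/County key. `table()` keeps zero cells, so races that do not occur in a county come out as 0.

```r
# Sample data from the question
mydf <- data.frame(
  State  = c("FL", "FL", "GA", "GA"),
  County = c("Broward", "Broward", "DeKalb", "Fulton"),
  Race   = c("Black", "White", "White", "Hispanic"),
  stringsAsFactors = FALSE
)

# Cross-tabulate a combined State/County key against Race;
# cells with no observations are kept as 0
key <- paste(mydf$State, mydf$County, sep = "|")
tab <- table(key, mydf$Race)

# Turn the contingency table into a data frame with one column per race
wide <- as.data.frame.matrix(tab)

# Split the key back into State and County, then append the row total
parts <- do.call(rbind, strsplit(rownames(wide), "|", fixed = TRUE))
result <- data.frame(State = parts[, 1], County = parts[, 2], wide,
                     Total = rowSums(wide), row.names = NULL)
result
```

The race columns come out in alphabetical order (Black, Hispanic, White); reorder them with `result[c("State", "County", "White", "Black", "Hispanic", "Total")]` if the exact order shown above matters.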

3 answers:

Answer 0 (score: 4)

Using "data.table", you could try:

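library(data.table)
# Collapse to one row per State/County/Race with its count (.N), then cast
# Race into columns; combinations that never occur are filled with 0
dcast(as.data.table(mydf)[, .(count = .N), by = names(mydf)],
      State + County ~ Race, value.var = "count", fill = 0)[
        , Total := rowSums(.SD), by = .(State, County)][]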
#    State  County Black Hispanic White Total
# 1:    FL Broward     1        0     1     2
# 2:    GA  DeKalb     0        0     1     1
# 3:    GA  Fulton     0        1     0     1

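I couldn't find a way to make this shorter without first creating a "count" column. Here is what I tried for handling the counting directly inside dcast: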

# length() counts the rows falling into each cell; empty cells become 0
dcast(as.data.table(mydf), State + County ~ Race,
      fun.aggregate = length, value.var = "Race")[
        , Total := rowSums(.SD), by = .(State, County)][]
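The choice of aggregation function matters once a State/County/Race combination repeats, as it does in the Texas counts from the question. The small check below rebuilds raw rows from those counts (the rows themselves are reconstructed, not real data) and casts them with `fun.aggregate = length`, assuming data.table is installed. Bell should come out with Black 1, Unknown 3, White 3 and a total of 7.

```r
library(data.table)

# Rows reconstructed from the question's long-format Texas counts:
# Bee: Unknown 1, White 1; Bell: Black 1, Unknown 3, White 3
tx <- data.table(
  State  = "TX",
  County = rep(c("Bee", "Bell"), times = c(2, 7)),
  Race   = c("Unknown", "White",
             "Black", rep("Unknown", 3), rep("White", 3))
)

# length() counts rows per cell; missing cells get length(<empty>) = 0
wide <- dcast(tx, State + County ~ Race, fun.aggregate = length,
              value.var = "Race")

# Row total over every column except the two id columns
wide[, Total := rowSums(.SD), .SDcols = !c("State", "County")]
wide
```

An aggregator that returns more than one value per cell (such as `c` on a column with repeated counts) makes dcast stop with an error here, which is why a counting function like `length` is the safer default.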

Answer 1 (score: 2)

We can use count from dplyr and then spread from tidyr to reshape the data into wide format:

library(dplyr)
library(tidyr)

dat %>% count(State, County, Race) %>%
        spread(Race, n, fill = 0) %>%
        mutate(total = rowSums(.[sapply(., is.numeric)]))

Source: local data frame [3 x 6]

   State  County Black Hispanic White total
  (fctr)  (fctr) (dbl)    (dbl) (dbl) (dbl)
1     FL Broward     1        0     1     2
2     GA  DeKalb     0        0     1     1
3     GA  Fulton     0        1     0     1
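In later tidyr releases `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same pipeline with the newer verbs, assuming tidyr 1.1 or later (for a scalar `values_fill` and `names_sort`) and dplyr 1.0 or later (for `across()` and `where()`):

```r
library(dplyr)
library(tidyr)

# Sample data from the question
dat <- data.frame(
  State  = c("FL", "FL", "GA", "GA"),
  County = c("Broward", "Broward", "DeKalb", "Fulton"),
  Race   = c("Black", "White", "White", "Hispanic")
)

res <- dat %>%
  count(State, County, Race) %>%                  # one row per observed combination
  pivot_wider(names_from = Race, values_from = n,
              values_fill = 0,                    # absent combinations become 0
              names_sort = TRUE) %>%              # race columns in sorted order
  mutate(Total = rowSums(across(where(is.numeric))))
res
```

`names_sort = TRUE` is used so the race columns come out alphabetically, as in the `spread()` version; without it `pivot_wider()` orders them by first appearance.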

Answer 2 (score: 2)

dt = read.table(text="State    County    Race
                FL       Broward   Black
                FL       Broward   White
                GA       DeKalb    White
                GA       Fulton    Hispanic", header = TRUE)

library(dplyr)
library(tidyr)

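dt %>%
  group_by(State, County) %>%
  mutate(Total = n()) %>%                 # rows per State/County, repeated on each row
  count(State, County, Race, Total) %>%   # n = rows per State/County/Race
  ungroup() %>%
  spread(Race, n, fill = 0) %>%           # one column per race, missing combos -> 0
  select(-matches("Total"), Total)        # move Total to the last column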

#     State  County Black Hispanic White Total
#    (fctr)  (fctr) (dbl)    (dbl) (dbl) (int)
# 1     FL Broward     1        0     1     2
# 2     GA  DeKalb     0        0     1     1
# 3     GA  Fulton     0        1     0     1
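The `group_by()` / `mutate(Total = n())` / `ungroup()` steps compute a per-county total and attach it to every row. dplyr's `add_count()` does the same in a single call. A sketch of that variant, assuming dplyr 0.8 or later (for the `name` argument of `add_count()`) and the same `dt` sample data:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
dt <- data.frame(
  State  = c("FL", "FL", "GA", "GA"),
  County = c("Broward", "Broward", "DeKalb", "Fulton"),
  Race   = c("Black", "White", "White", "Hispanic")
)

res2 <- dt %>%
  add_count(State, County, name = "Total") %>%  # per-county row count on every row
  count(State, County, Race, Total) %>%         # Total is constant within a county
  spread(Race, n, fill = 0) %>%                 # one column per race, missing -> 0
  select(-Total, Total)                         # move Total to the last column
res2
```

Because `Total` is constant within each county, keeping it as an extra count key does not split any group; it simply rides along as an id column through the reshape.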