dplyr/tidyr: spread errors on column names and input

Time: 2016-06-13 14:49:18

Tags: r layout dplyr tidyr spread

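I am trying to run tidyr's spread function on a dataset that contains origin and destination names for flights, together with their passenger counts. I want to build a table that can eventually be used for a heat map, so I would like the Origin variable in the rows and the Destination variable as columns.

I have tried running the code with different combinations of arguments, and with spread_, but I always get an error.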

If I use spread_ with key_col and val_col, I get:

Error in match(x, table, nomatch = 0L) : object 'DestinationRegion' not found

On my larger dataset it produces a different kind of error:

Error in `colnames<-`(`*tmp*`, value = c("ASIA SUB-CONTINENT", "AUSTRALIA", : length of 'dimnames' [2] not equal to array extent

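This is my first time using tidyr and I am only starting to learn these packages. This does not look too complicated, but I have been working on it for hours and cannot find an answer on any forum.

Thanks for your help,

Here is a sample of the data: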

data2<-matrix(NA, nrow = 7, ncol=3)  
colnames(data2)<-c("Origin.Destination", "Total.Passengers", "Destination.Region")
data2[,1] <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")
data2[,2] <- c(100, 5000, 200, 10000, 200, 20, 4000)
data2[,3] <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")

data2 <- data.frame(data2)

Here is my code:

DF<- 
  data2 %>%
  spread_(key_ = "Destination.Region",
     value_ = "Total.Passengers", 
     convert = TRUE,
     drop = FALSE)

1 Answer:

Answer 0 (score: 0)

Here are a few things to try:

1) I would convert data2 to a data.frame. It makes it easier to work with.

data2<-matrix(NA, nrow = 7, ncol=3)  
colnames(data2)<-c("Origin.Destination", "Total.Passengers", "Destination.Region")
data2[,1] <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")
data2[,2] <- c(100, 5000, 200, 10000, 200, 20, 4000)
data2[,3] <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")

data3<-data.frame(data2)

2) The new data.frame needs a column that uniquely identifies each row (typically an index column) for spread_ to work. Otherwise:

DF<- 
  data3 %>%
  spread_(key_ = "Destination.Region",
          value_ = "Total.Passengers", 
          convert = TRUE,
          drop = FALSE)

Error: Duplicate identifiers for rows (1, 6, 7)

But with:

data3$index<-1:nrow(data3)

DF<- 
  data3 %>%
  spread_(key_ = "Destination.Region",
          value_ = "Total.Passengers", 
          convert = TRUE,
          drop = FALSE)
DF

   Origin.Destination index ASIA SOUTH AMERICA WESTERN EUROPE
1         EAST AFRICA     1   NA            NA            100
2         EAST AFRICA     2   NA            NA             NA
3         EAST AFRICA     3   NA            NA             NA
4         EAST AFRICA     4   NA         10000             NA
5         EAST AFRICA     5   NA            NA             NA
6         EAST AFRICA     6   NA            NA             20
7         EAST AFRICA     7   NA            NA           4000
8     SOUTHERN AFRICA     1   NA            NA             NA
9     SOUTHERN AFRICA     2   NA          5000             NA
10    SOUTHERN AFRICA     3   NA            NA             NA
11    SOUTHERN AFRICA     4   NA            NA             NA
12    SOUTHERN AFRICA     5  200            NA             NA
13    SOUTHERN AFRICA     6   NA            NA             NA
14    SOUTHERN AFRICA     7   NA            NA             NA
15        WEST AFRICA     1   NA            NA             NA
16        WEST AFRICA     2   NA            NA             NA
17        WEST AFRICA     3  200            NA             NA
18        WEST AFRICA     4   NA            NA             NA
19        WEST AFRICA     5   NA            NA             NA
20        WEST AFRICA     6   NA            NA             NA
21        WEST AFRICA     7   NA            NA             NA
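The "Duplicate identifiers" error arises because spread needs each combination of the remaining columns and the key column to occur only once. A minimal base R sketch for locating the offending rows follows; it rebuilds the sample data directly as a data.frame, and the names flights, key_cols, is_dup and dup_rows are only illustrative, not part of the original code:

```r
# Sample data from the question, built directly as a data.frame
# ('flights' is a hypothetical name used only for this sketch)
flights <- data.frame(
  Origin.Destination = c("EAST AFRICA", "SOUTHERN AFRICA", "WEST AFRICA",
                         "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA",
                         "EAST AFRICA"),
  Total.Passengers = c(100, 5000, 200, 10000, 200, 20, 4000),
  Destination.Region = c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA",
                         "SOUTH AMERICA", "ASIA", "WESTERN EUROPE",
                         "WESTERN EUROPE")
)

# Columns that together should identify one cell of the spread table
key_cols <- c("Origin.Destination", "Destination.Region")

# Flag every row whose origin/destination pair appears more than once
is_dup <- duplicated(flights[key_cols]) |
  duplicated(flights[key_cols], fromLast = TRUE)
dup_rows <- which(is_dup)

# Show the rows that collide
print(flights[dup_rows, ])
```

On the sample data the flagged rows are the three EAST AFRICA to WESTERN EUROPE records, the same rows named in the error message.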

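What probably makes more sense here is to sum the total passengers by origin and destination. That avoids the index column and prevents so many NAs:

library(dplyr)
library(tidyr)

Origin <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")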
Passengers <- c(100, 5000, 200, 10000, 200, 20, 4000)
Destination <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")
data3<-data.frame(Origin, Passengers, Destination)

DF<-data3 %>% group_by(Origin, Destination) %>%
  summarise(Total.Passengers = sum(Passengers)) %>%
  spread(Destination, Total.Passengers)

DF

           Origin  ASIA SOUTH AMERICA WESTERN EUROPE
           (fctr) (dbl)         (dbl)          (dbl)
1     EAST AFRICA    NA         10000           4120
2 SOUTHERN AFRICA   200          5000             NA
3     WEST AFRICA   200            NA             NA
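Since the end goal in the question is a heat map, the same origin-by-destination totals can also be produced without dplyr or tidyr. The following is a sketch using base R's xtabs, offered as an alternative technique rather than as part of the answer above; the names flights and passenger_tab are illustrative:

```r
# Same sample data as in the summarised example above
# ('flights' is a hypothetical name used only for this sketch)
flights <- data.frame(
  Origin = c("EAST AFRICA", "SOUTHERN AFRICA", "WEST AFRICA",
             "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA",
             "EAST AFRICA"),
  Passengers = c(100, 5000, 200, 10000, 200, 20, 4000),
  Destination = c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA",
                  "SOUTH AMERICA", "ASIA", "WESTERN EUROPE",
                  "WESTERN EUROPE")
)

# Sum Passengers for every Origin x Destination pair;
# rows are origins, columns are destinations
passenger_tab <- xtabs(Passengers ~ Origin + Destination, data = flights)

print(passenger_tab)
```

Unlike the spread result, origin/destination pairs with no records show up as 0 rather than NA, which may or may not be what a heat map should display.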