Transposing a data.frame turns the first row into a list

Time: 2020-05-21 13:03:07

Tags: r dataframe matrix transpose

My data looks like this:

library(data.table)
DF <- structure(list(toberevised = c("Number of returns", "Number of joint returns", 
"Number with paid preparer's signature"), `SOUTH DAKOTA_All returns` = c(135257620, 
52607676, 80455243), `SOUTH DAKOTA_Under_50000` = c(92150166, 
20743943, 53622647)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

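I would like the entries of the first column to become the variables (column names), with the current column names turned into rows, so I did this: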

DF<- as.data.frame(t(DF))
setnames(DF, DF[1,])

But I get the error:

Passed a vector of type 'list'. Needs to be type 'character'

I have tried everything I could think of to unlist it, but to no avail.

What am I doing wrong?

1 answer:

Answer 0 (score: 0)

Transposing a data.frame is dangerous because t() returns a matrix, in which all elements ("cells") are coerced to the same data type:

t(DF)
                         [,1]                [,2]                      [,3]                                   
toberevised              "Number of returns" "Number of joint returns" "Number with paid preparer's signature"
SOUTH DAKOTA_All returns "135257620"         " 52607676"               " 80455243"                            
SOUTH DAKOTA_Under_50000 "92150166"          "20743943"                "53622647"

Now all numeric values have been coerced to type character, which is probably not what you want.
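The error itself arises because DF[1, ] on a data.frame returns a one-row data.frame, which is a list, whereas setnames() expects a character vector. A minimal sketch of a direct fix, assuming you still want to go through t(), is to unlist() the first row, drop it, and then convert the remaining character cells back to numbers by hand (tDF is just an illustrative name, and the data is rebuilt with data.table() rather than dput()):

```r
library(data.table)

# Example data from the question, rebuilt with data.table() instead of dput()
DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

# t() goes through as.matrix(), so every cell becomes a character string
tDF <- as.data.frame(t(DF), stringsAsFactors = FALSE)

# tDF[1, ] is a one-row data.frame (a list); unlist() turns it into a
# plain character vector that setnames() accepts
setnames(tDF, as.character(unlist(tDF[1, ])))
tDF <- tDF[-1, ]  # drop the row that only held the new names

# The numbers are still (padded) strings, so convert them back explicitly
tDF[] <- lapply(tDF, function(x) as.numeric(trimws(x)))
str(tDF)
```

This removes the error, but the type conversion has to be done manually, which is exactly the fragility of transposing mentioned above.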

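As has been mentioned several times before (e.g. {{3}} and here), I suggest reshaping the data into a tidy format, i.e. long format, to simplify data handling: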

library(data.table)
long <- melt(DF, id.vars = "toberevised")
long
                             toberevised                 variable     value
1:                     Number of returns SOUTH DAKOTA_All returns 135257620
2:               Number of joint returns SOUTH DAKOTA_All returns  52607676
3: Number with paid preparer's signature SOUTH DAKOTA_All returns  80455243
4:                     Number of returns SOUTH DAKOTA_Under_50000  92150166
5:               Number of joint returns SOUTH DAKOTA_Under_50000  20743943
6: Number with paid preparer's signature SOUTH DAKOTA_Under_50000  53622647

Starting from the long format, we can reshape to the desired wide format:

dcast(long, variable ~ toberevised) 
                   variable Number of joint returns Number of returns Number with paid preparer's signature
1: SOUTH DAKOTA_All returns                52607676         135257620                              80455243
2: SOUTH DAKOTA_Under_50000                20743943          92150166                              53622647

Now the numbers are still of numeric type.
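For completeness, here is the whole round trip as one self-contained sketch. It rebuilds the example data with data.table() rather than dput() and, unlike the code above, passes value.var = "value" explicitly so that dcast() does not have to guess which column holds the values; wide is just an illustrative name:

```r
library(data.table)

# Example data from the question, rebuilt with data.table() instead of dput()
DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

# Wide -> long: one row per (toberevised, original column) pair
long <- melt(DF, id.vars = "toberevised")

# Long -> wide with the statistics as columns; value.var is explicit
wide <- dcast(long, variable ~ toberevised, value.var = "value")

# Check the column types: the statistic columns were never coerced
sapply(wide, class)
```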


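As a rule of thumb, whenever column names are treated as attributes, e.g. SOUTH DAKOTA_Under_50000, the data is probably not in tidy format. Attributes should be stored and treated as data items so that they can be used for subsetting, grouping, and aggregation.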

In fact, SOUTH DAKOTA_Under_50000 contains two attributes: a region and a category.
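To make this concrete, one possible sketch splits each original column name at its first underscore into two ordinary columns. The column names region and bracket are hypothetical, chosen for illustration, and splitting at the first "_" assumes the region part never contains an underscore itself:

```r
library(data.table)

DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

long <- melt(DF, id.vars = "toberevised")

# Hypothetical columns: split at the FIRST underscore only,
# so "Under_50000" stays intact as one category
long[, region  := sub("_.*$", "", as.character(variable))]
long[, bracket := sub("^[^_]*_", "", as.character(variable))]
long[, variable := NULL]

# The attributes are now ordinary columns for subsetting and grouping
long[bracket == "Under_50000"]
long[, .(total = sum(value)), by = .(region, toberevised)]

# The wide layout can still be produced on demand
dcast(long, region + bracket ~ toberevised, value.var = "value")
```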
