From R

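Time: 2018-11-27 05:34:50

Tags: r dataframe fread rowname

I am trying to remove a name from the colnames produced by fread(). The first column name only serves as a header for the row names. Later in my workflow this "header" really messes up my data, because it is treated as one of the rows, so somehow I need it to be ignored or to not exist.

A subset of my DGE_file looks like this: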

            GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
1: 0610009B22Rik                  1                  0
2: 0610009E02Rik                  0                  0

I tried to remove the first column name like this:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)

colnames(DGE_file)<-colnames(DGE_file)[-1]
DGE_file<- as.matrix(DGE_file)

Understandably, this produces an error:

> colnames(DGE_file)<-colnames(DGE_file)[-1]
Error in setnames(x, value) : 
  Can't assign 10000 names to a 10001 column data.table
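
The error comes from a length mismatch: dropping the first element leaves one name fewer than there are columns. A minimal sketch that reproduces this on a toy table (the table `dt` and its values are made up for illustration, not taken from DGE.txt):

```r
library(data.table)

# Toy table with the same shape as the question's data: one name column, two count columns
dt <- data.table(GENE = c("0610009B22Rik", "0610009E02Rik"),
                 ATGGCGAACCTACATCCC = c(1L, 0L),
                 ATGGCGAGGACTCAAAGT = c(0L, 0L))

full_names  <- colnames(dt)       # one name per column
short_names <- colnames(dt)[-1]   # first name dropped, so one name too few

# Assigning the shorter vector fails; capture the outcome instead of stopping
outcome <- tryCatch({
  colnames(dt) <- short_names
  "assigned"
}, error = function(e) "error")
```

Here `length(full_names)` is 3 and `length(short_names)` is 2, which is the same mismatch as the 10001 versus 10000 in the message above.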

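I have tried replacing it with NA, but that produces an error in downstream processing that I cannot resolve.

How can I remove the "GENE" header, or make it invisible to downstream processing?

2 Answers:

Answer 0: (score: 1)

The following should work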

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
# Set the first column name to the empty string.
names(DGE_file)[1] <- ""
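
Note that blanking the name does not remove the column itself: a later as.matrix() call would still carry the gene names as data and produce a character matrix. A minimal sketch of an alternative, assuming the downstream step wants a numeric matrix with the genes as row names and that the installed data.table is recent enough for `fread(text = ...)` and the `rownames` argument of `as.matrix()` (the inline `txt` string is a made-up stand-in for DGE.txt):

```r
library(data.table)

# Stand-in for DGE.txt (assumption: tab-separated, first column holds the gene names)
txt <- paste(
  "GENE\tATGGCGAACCTACATCCC\tATGGCGAGGACTCAAAGT",
  "0610009B22Rik\t1\t0",
  "0610009E02Rik\t0\t0",
  sep = "\n"
)
DGE_file <- fread(text = txt)

# Move the GENE column into the row names instead of renaming it;
# the remaining columns stay numeric
DGE_mat <- as.matrix(DGE_file, rownames = "GENE")
```

With this approach the "GENE" header disappears from the column names because the column becomes the matrix row names, so there is no empty or NA name left to trip up later steps.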

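Answer 1: (score: 0)

You can read the file without the header, skipping the first line, and then set the column names yourself. However, in my personal opinion, a column with an empty name or with NA as its name may cause problems.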

require(magrittr) # for piping
require(data.table) #For reading with fread

# Read in the dge file
#Without header and skipping the first line
DGE_file <- fread(file="DGE.txt",
                  skip = 1,
                  header=FALSE,
                  stringsAsFactors = TRUE)

#Set the column names (for "invisible" name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c("", "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

OR

#Set the column names (for NA as the first name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c(NA, "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

A base R solution for setting the names is as follows:

#Read the file with header 
DGE_file <- fread(file="DGE.txt",
                  header=TRUE,
                  stringsAsFactors = TRUE)

#Set an "invisible" (empty) name
names(DGE_file)[1] <- ""

#Or set an NA as a name
names(DGE_file)[1] <- NA