Remove duplicates from a column in R

Time: 2017-07-25 09:25:03

Tags: r unique substr

I have a column of IDs with different lengths, and some of the IDs carry version numbers.

rownames(x)

"ENSP00000424360.1-D4"
"ENSP00000424360.2-D4"
"ENSP00000424360.3-D4"
"ENSP00000437781-D59"
"XP_010974537.1"
"XP_010974538.1"
"XP_010974538.2"

I would like to change these to:

"ENSP00000424360"
"ENSP00000424360.1"
"ENSP00000424360.2"
"ENSP00000437781"
"XP_010974537"
"XP_010974538"
"XP_010974538.1"

I can convert the ENSxx and XPxx IDs separately with

make.unique(substr(rownames(x),1,15))

and

make.unique(substr(rownames(x),1,12))

How can I change the code to get the desired result for all IDs at once?

1 Answer:

Answer 0 (score: 2)

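We use sub to strip the suffixes and then apply make.unique. The inner sub drops everything from the first dot onward, the outer sub drops any remaining "-..." suffix, and make.unique appends .1, .2, ... to IDs that now repeat: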

make.unique(sub("-.*$", "", sub("\\..*", "", rownames(x))))
#[1] "ENSP00000424360"   "ENSP00000424360.1" "ENSP00000424360.2"
#[4] "ENSP00000437781"   "XP_010974537"      "XP_010974538"      "XP_010974538.1"   
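The two nested sub() calls can also be collapsed into one. This is a minimal sketch, assuming the part to drop always starts at the first "." or "-" in the ID (true for the sample IDs here, not checked against other ID formats). Inside the bracket expression [.-] both characters are literal, so the pattern matches whichever of them comes first and removes everything after it. The variable names ids, base_ids and new_ids are only illustrative.

```r
# Sample IDs from the question
ids <- c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4", "ENSP00000424360.3-D4",
         "ENSP00000437781-D59", "XP_010974537.1", "XP_010974538.1",
         "XP_010974538.2")

# Assumption: the suffix to drop begins at the first "." or "-".
# Remove from that character to the end of the string.
base_ids <- sub("[.-].*$", "", ids)

# make.unique() (default sep = ".") appends .1, .2, ... to repeated base IDs
new_ids <- make.unique(base_ids)
print(new_ids)
```

Because underscores are not in the character class, the "XP_" prefix is left untouched.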

Data

x <- structure(list(v1 = 1:7), .Names = "v1", row.names = c("ENSP00000424360.1-D4", 
 "ENSP00000424360.2-D4", "ENSP00000424360.3-D4", "ENSP00000437781-D59", 
 "XP_010974537.1", "XP_010974538.1", "XP_010974538.2"), class = "data.frame")
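For comparison, the question's two substr() calls can also be merged by picking the truncation width per ID and writing the result back to the row names. This is a sketch under a stated assumption: IDs starting with "ENS" always keep 15 characters and all other IDs (the XP_ ones here) keep 12, as in the sample data. Because those fixed widths are an assumption about the ID formats, the sub()-based answer above is the more robust choice. The helper variable width is hypothetical.

```r
# Same data as above, built with data.frame() instead of structure()
x <- data.frame(v1 = 1:7,
                row.names = c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4",
                              "ENSP00000424360.3-D4", "ENSP00000437781-D59",
                              "XP_010974537.1", "XP_010974538.1",
                              "XP_010974538.2"))

# Assumed widths: 15 characters for ENS... IDs, 12 for everything else
width <- ifelse(startsWith(rownames(x), "ENS"), 15L, 12L)

# substr() recycles the stop argument element-wise, so each ID gets its own width;
# make.unique() then disambiguates the repeated base IDs
rownames(x) <- make.unique(substr(rownames(x), 1L, width))
x
```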