Add a factor column based on part of another column

Time: 2015-11-13 16:18:45

Tags: regex r

I have some data that looks like this:

SS <- structure(list(rn = 
c("Exp.618.1.7..ABC.TRE854.HS.2...1.Saline...1...A.", 
"Exp.618.1.7..ABC.TRE854.HS.2...4.Res..Reference...1...A.", "Exp.618.1.7..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..100nM...1...A.", 
"Exp.618.1.7..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.", 
"Exp.618.1.7..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..10.0uM...1...A.", 
"Exp.618.2.5..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.618.2.5..ABC.TRE854.HS.2...4.Res..Reference...1...A.", 
"Exp.618.2.5..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..300nM...1...A.", 
"Exp.618.2.5..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..3.0uM...1...A.", 
"Exp.618.2.5..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..30uM...1...A.", 
"Exp.622.1.2..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.622.1.2..ABC.TRE854.HS.2...4.Res..Reference...1...A.", 
"Exp.622.1.2..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..100nM...1...A.", 
"Exp.622.1.2..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.", 
"Exp.622.1.2..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..10.0uM...1...A.", 
"Exp.622.2.5..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.622.2.5..ABC.TRE854.HS.2...4.Res..Reference...1...A.", 
"Exp.622.2.5..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..300nM...1...A.", 
"Exp.622.2.5..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..3.0uM...1...A.", 
"Exp.622.2.5..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..30uM...1...A."
), V1 = c(6.08174172247795, -273.068131175906, -38.0098754654436, 
-44.1874819464636, -126.058280657819, 28.7111941404515, -326.124708404277, 
-61.0348906065704, -63.7440680070101, -62.8961106505329, 18.9484530926351, 
-607.977222113268, -212.18247673418, -179.193611578799, -230.372071747453, 
11.6278896202125, -258.129269330527, -26.634614887808, -29.8940173506221, 
-63.2992704853608), Exp = c("Exp.618.1.", "Exp.618.1.", "Exp.618.1.", 
"Exp.618.1.", "Exp.618.1.", "Exp.618.2.", "Exp.618.2.", "Exp.618.2.", 
"Exp.618.2.", "Exp.618.2.", "Exp.622.1.", "Exp.622.1.", "Exp.622.1.", 
"Exp.622.1.", "Exp.622.1.", "Exp.622.2.", "Exp.622.2.", "Exp.622.2.", 
"Exp.622.2.", "Exp.622.2."), Value_norm = c(-0.0222718839298028, 
1, 0.139195574751849, 0.16181852402981, 0.461636735546466, -0.0880374697180561, 
1, 0.187151997483457, 0.195459179768711, 0.192859078228946, -0.0311663865083172, 
1, 0.348997411443565, 0.294737376765432, 0.3789156293499, -0.0450467692035472, 
1, 0.103183242089851, 0.115810258279326, 0.245223142069596)), .Names = c("rn", 
"V1", "Exp", "Value_norm"), row.names = c(NA, 20L), class = "data.frame")

The rn column contains names that I need to turn into a factor so that I can plot the data with ggplot2. The names are:

Saline
Reference
100nM
300nM
1uM
3uM
10uM
30uM

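I would like the final data to look like the example above, but with a factor column at the end that holds one of these labels for each row.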

I apologise for only having a picture of my data, but I wanted it to be nicely formatted and I could not manage that in the dialog box here!

Thanks in advance!

1 answer:

Answer 0: (score: 0)

Well, it would be easier if you matched the terms exactly as they appear in the column. If that is OK, you can do

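# Dots are escaped so that "." matches a literal dot rather than any character
rx <- "\\b(Saline|Reference|100nM|300nM|1\\.00uM|3\\.0uM|10\\.0uM|30uM)\\b"
# regmatches() returns one string per matching row; every row of SS$rn matches here
SS$type <- regmatches(SS$rn, regexpr(rx, SS$rn))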
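
One caveat with the assignment above: regmatches() returns only the strings that matched, so if any row of rn contained none of the labels, the result would be shorter than the data frame and the assignment would fail. Here is a minimal sketch of a variant that leaves NA in unmatched rows. The small `demo` data frame is hypothetical: three rn values are copied from the question, and the last row is made up to show the no-match case.

```r
# Hypothetical demo data: three rn values from the question plus one
# made-up row that contains none of the labels.
demo <- data.frame(
  rn = c("Exp.618.1.7..ABC.TRE854.HS.2...1.Saline...1...A.",
         "Exp.618.1.7..ABC.TRE854.HS.2...4.Res..Reference...1...A.",
         "Exp.618.1.7..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.",
         "No.label.here"),
  stringsAsFactors = FALSE
)

rx <- "\\b(Saline|Reference|100nM|300nM|1\\.00uM|3\\.0uM|10\\.0uM|30uM)\\b"

m   <- regexpr(rx, demo$rn)   # position of the first match, -1 if none
hit <- m != -1L               # TRUE for rows that contain a label

demo$type <- NA_character_                  # default for unmatched rows
demo$type[hit] <- regmatches(demo$rn, m)    # fill only the matched rows
```

With the data from the question every row matches, so both versions should give the same column; the NA handling only matters if new rows with other names are added later.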

This should give a character column whose distinct values (the levels, once it is converted to a factor) are
c("1.00uM", "10.0uM", "100nM", "3.0uM", "300nM", "30uM", "Reference", "Saline")

If you want to rename the ones that differ from your labels, you can do

remap <- c("1.00uM"="1uM", "3.0uM"="3uM", "10.0uM"="10uM")
SS$type[SS$type %in% names(remap)] <- remap[SS$type[SS$type %in% names(remap)]]
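
The steps above leave type as a character vector, while the question asks for a factor. A minimal sketch of the last step follows, assuming the desired level order is the order in which the labels are listed in the question. The `demo` data frame and the `dose_levels` name are introduced here for illustration only.

```r
# Hypothetical demo data: type values as the extraction step would return them
# for the labels that occur in the question.
demo <- data.frame(
  type = c("Saline", "Reference", "100nM", "1.00uM", "10.0uM",
           "Saline", "Reference", "300nM", "3.0uM", "30uM"),
  stringsAsFactors = FALSE
)

# Rename the labels that differ from the ones asked for.
remap <- c("1.00uM" = "1uM", "3.0uM" = "3uM", "10.0uM" = "10uM")
idx <- demo$type %in% names(remap)
demo$type[idx] <- remap[demo$type[idx]]

# Assumed level order: the order of the labels in the question. Without
# `levels`, factor() would sort them alphabetically ("100nM" before "10uM").
dose_levels <- c("Saline", "Reference", "100nM", "300nM",
                 "1uM", "3uM", "10uM", "30uM")
demo$type <- factor(demo$type, levels = dose_levels)

levels(demo$type)
```

ggplot2 draws discrete axes in level order, so a call such as ggplot(SS, aes(type, Value_norm)) + geom_point() would then show the groups from Saline up to 30uM, provided the same conversion has been applied to SS$type.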