R: Remove the part of a string before a delimiter

Time: 2017-10-10 07:34:04

Tags: r gsub substr strsplit string-substitution

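I have a column in a data frame, and I want to remove the part of each string that comes before the 5th "." delimiter, as well as the trailing ".txt". I don't know how to do this.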

Input:

jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt
jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt
jhu-usc.edu_LGG.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1483-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1485-05.txt
jhu-usc.edu_BRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1486-05.txt
jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1487-05.txt
jhu-usc.edu_PRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1488-05.txt

Expected output:

TCGA-06-5415-01A-01D-1481-05
TCGA-06-5415-01A-01D-1482-05
TCGA-06-5415-01A-01D-1483-05
TCGA-06-5415-01A-01D-1484-05
TCGA-06-5415-01A-01D-1485-05
TCGA-06-5415-01A-01D-1486-05
TCGA-06-5415-01A-01D-1487-05
TCGA-06-5415-01A-01D-1488-05

I tried:     sapply(strsplit(as.character(df$V1), "."), '[', 1:5)

Please advise. Thanks.

2 answers:

Answer 0: (score: 0)

Assuming the part you want always starts with the fixed text "TCGA":

sub(".*(TCGA[^.]+).*", "\\1", str1)
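
A quick check of this pattern against two of the file names from the question. Here `str1` is assumed to be a character vector holding the file names; the question does not define it, so the assignment below is only illustrative.

```r
# Assumed sample input: two of the file names from the question.
str1 <- c(
  "jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt",
  "jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt"
)

# Capture "TCGA" plus every character up to the next "." and
# replace the whole string with that captured group.
ids <- sub(".*(TCGA[^.]+).*", "\\1", str1)
print(ids)
```

This relies on "TCGA" appearing in every file name; a string without it is returned unchanged, so it may be worth checking for leftovers ending in ".txt".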

Answer 1: (score: 0)

If they all end with .txt, then you can use:

sub(".+\\.([^.]+)\\.txt$", "\\1", as.character(df$V1))
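
The strsplit attempt from the question can also be made to work. In a regular expression an unescaped "." matches any character, so splitting on it leaves only empty strings; passing `fixed = TRUE` treats it as a literal dot. The sketch below assumes the ID is always the sixth dot-separated field, as in the sample input, and `df` is a small stand-in data frame rather than the asker's actual data.

```r
# Stand-in data frame built from two of the file names in the question.
df <- data.frame(
  V1 = c(
    "jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt",
    "jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt"
  ),
  stringsAsFactors = FALSE
)

# Split on a literal "." (fixed = TRUE disables regex interpretation).
parts <- strsplit(as.character(df$V1), ".", fixed = TRUE)

# Five dots precede the ID, so it is the sixth field of each split.
ids <- sapply(parts, `[`, 6)
print(ids)
```

Unlike the regex answers, this depends on the number of dots before the ID staying constant, so it would break if a file name had an extra dot in an earlier field.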