Question

I have a set of file names, such as:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

I want to sort them by the number after the "-".

In Python, for example, I can use the key parameter of the sorted function:

filelist = ["filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt"]
sorted(filelist, key=lambda x: int(x.split("-")[1].split(".")[0]))

> ["filec-1.txt", "fileb-2.txt", "filef-4.txt", "filed-5.txt", "filea-10.txt"]

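In R, I have been playing with strsplit and lapply, so far with no luck.

What is the way to do this in R?

Edit: The file names can be many things and may contain more numbers. The only fixed pattern is that the number I want to sort by comes after the "-". Another (real) example: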

x <- c("boards10017-51.mp4",  "boards10065-66.mp4",  "boards10071-81.mp4",
       "boards10185-91.mp4",  "boards10212-63.mp4",  "boards1025-51.mp4",
       "boards1026-71.mp4",   "boards10309-89.mp4",  "boards10310-68.mp4",
       "boards10384-50.mp4",  "boards10398-77.mp4",  "boards10419-119.mp4",
       "boards10421-85.mp4",  "boards10444-87.mp4",  "boards10451-60.mp4",
       "boards10461-81.mp4",  "boards10463-52.mp4",  "boards10538-83.mp4",
       "boards10575-62.mp4",  "boards10577-249.mp4")

Answer 1

I'm not sure how complex your actual list of file names is, but the following might be enough:

filelist[order(as.numeric(gsub("[^0-9]+", "", filelist)))]
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"

Given your edit, you may want to change the gsub to:

gsub(".*-|\\..*", "", filelist)

Again, without more test cases it is hard to say whether this is enough for your needs.
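To see why the first pattern stops being enough once the names contain more digits, here is a small sketch comparing the keys the two patterns produce for two names from the edited question (the variable name x2 is only for illustration):

```r
# Two names taken from the question's second example
x2 <- c("boards10017-51.mp4", "boards1025-51.mp4")

# Removing every non-digit glues all digit runs together,
# including the leading id and the "4" of "mp4"
gsub("[^0-9]+", "", x2)
# [1] "10017514" "1025514"

# Removing everything up to the last "-" and everything from
# the first remaining "." onwards leaves only the number to sort by
gsub(".*-|\\..*", "", x2)
# [1] "51" "51"
```

The second pattern relies on the number of interest being the only thing between the last "-" and the extension.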

Example:

x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4",
       "boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",
       "boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",
       "boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",
       "boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",
       "boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",
       "boards10575-62.mp4", "boards10577-249.mp4")

x[order(as.numeric(gsub(".*-|\\..*", "", x)))]
##  [1] "boards10384-50.mp4"  "boards10017-51.mp4"  "boards1025-51.mp4"  
##  [4] "boards10463-52.mp4"  "boards10451-60.mp4"  "boards10575-62.mp4" 
##  [7] "boards10212-63.mp4"  "boards10065-66.mp4"  "boards10310-68.mp4" 
## [10] "boards1026-71.mp4"   "boards10398-77.mp4"  "boards10071-81.mp4" 
## [13] "boards10461-81.mp4"  "boards10538-83.mp4"  "boards10421-85.mp4" 
## [16] "boards10444-87.mp4"  "boards10309-89.mp4"  "boards10185-91.mp4" 
## [19] "boards10419-119.mp4" "boards10577-249.mp4"
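Since the question mentions strsplit, the same key can also be built in a way that mirrors the Python lambda. This is a minimal sketch: key_after_dash is a hypothetical helper name, and each name is assumed to contain a "-" followed by digits and then a ".".

```r
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

# Hypothetical helper: take the piece after the first "-",
# then the piece before the first ".", and convert it to a number
key_after_dash <- function(name) {
  after_dash <- strsplit(name, "-", fixed = TRUE)[[1]][2]
  as.numeric(strsplit(after_dash, ".", fixed = TRUE)[[1]][1])
}

# Compute one numeric key per file name, then reorder by those keys
keys <- vapply(filelist, key_after_dash, numeric(1), USE.NAMES = FALSE)
sorted_files <- filelist[order(keys)]
sorted_files
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"
```

The gsub version is shorter and vectorised; this form is mainly useful when the key logic grows beyond what one regular expression expresses comfortably.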

Answer 2

I made a regex sort function:

Function:

Data:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

Calling the function:

reg_sort(filelist,"\\d+")
#[1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"

Other features include:

Sorting in descending order:

reg_sort(filelist,-"\\d+")
#[1] "filea-10.txt" "filed-5.txt" "filef-4.txt" "fileb-2.txt" "filec-1.txt"

Multi-level sorting (not meaningful for this example data):

reg_sort(filelist,-"\\d+","\\w")

Verbose mode (to see/check what the regex pattern extracted for sorting):

reg_sort(filelist,"\\d+",verbose=T)

$\\d+
[1] 1 2 4 5 10

[1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"

$matches[0]
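The definition of reg_sort is not shown above. As a rough idea of how such a helper could work, here is a minimal sketch. It is not the answer's function: reg_sort_sketch is a hypothetical name, it accepts a single pattern, and it uses an ordinary decreasing argument (an assumption) instead of the -"\\d+" syntax, the multi-level sorting, or the verbose mode.

```r
# Hypothetical sketch, NOT the answer's reg_sort:
# sort x by the first match of `pattern` in each element
reg_sort_sketch <- function(x, pattern, decreasing = FALSE) {
  # regexpr locates the first match per string (-1 when there is none)
  m <- regexpr(pattern, x)
  found <- as.integer(m) != -1L
  key <- rep(NA_character_, length(x))
  # regmatches returns the matched text only for elements that matched
  key[found] <- regmatches(x, m)
  # Compare numerically when every extracted key parses as a number
  num <- suppressWarnings(as.numeric(key))
  if (!anyNA(num)) key <- num
  x[order(key, decreasing = decreasing)]
}

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

asc <- reg_sort_sketch(filelist, "\\d+")
desc <- reg_sort_sketch(filelist, "\\d+", decreasing = TRUE)
asc
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"
desc
# [1] "filea-10.txt" "filed-5.txt"  "filef-4.txt"  "fileb-2.txt"  "filec-1.txt"
```

Note that "\\d+" picks the first digit run in each name, so for names like "boards10017-51.mp4" a pattern anchored on the "-" would be needed instead.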

R: sort strings by a substring
