Question

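I have these files in a folder. Assume there are other, similar files in the folder as well.

[3] "/farm/chickens_industrial_meat_location_df.csv" [4] "farm/goats_grassland_meat_location_df.csv"

I am trying to extract the files whose names contain the string location_df while excluding those that contain the string chickens.

I thought I could do this by typing:

list.files(pattern = "location_df(?<!(chickens))")

My understanding is that the negative look-around would drop the strings that contain chickens. What am I not understanding about regular expressions, and how can I solve my problem?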

Answer 1

One option using grepl would be

str1[!grepl('chickens_.*location_df', str1) & grepl('location_df', str1)]
#[1] "farm/goats_grassland_meat_location_df.csv"

or, in a simpler form,

str1[!grepl('chickens_', str1) & grepl('location_df', str1)]
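To apply the same kind of filter to files on disk rather than to a hard-coded vector, one possibility is to list everything first and then subset the result with grepl(). The sketch below is only illustrative: the temporary directory farm_demo and the extra file cows_notes.txt are assumptions added so that the example runs anywhere.

```r
# Create a throwaway directory with sample files (names partly hypothetical)
tmp <- file.path(tempdir(), "farm_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c(
  "chickens_industrial_meat_location_df.csv",
  "goats_grassland_meat_location_df.csv",
  "cows_notes.txt"                      # hypothetical unrelated file
))))

# List all files, then keep those containing "location_df" but not "chickens"
files <- list.files(tmp)
kept <- files[grepl("location_df", files, fixed = TRUE) &
              !grepl("chickens", files, fixed = TRUE)]
print(kept)
```

Because both patterns are plain substrings here, fixed = TRUE avoids any regex interpretation at all.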

Data

str1 <- c("/farm/chickens_industrial_meat_location_df.csv",
        "farm/goats_grassland_meat_location_df.csv" )

Answer 2

> list.files(pattern = "location_df")
[1] "chickens_industrial_meat_location_df.csv" "goats_grassland_meat_location_df.csv"    

> setdiff(list.files(pattern = "location_df"), list.files(pattern = "chickens"))
[1] "goats_grassland_meat_location_df.csv"

> setdiff(list.files(pattern = "location_df"), list.files(pattern = "goats"))
[1] "chickens_industrial_meat_location_df.csv"

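According to the R help file for regex, "...functions which use regular expressions (often via the use of grep) include apropos, browseEnv, help.search, list.files and ls. These will all use extended regular expressions." (ERE).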

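Reading the above suggests that the list.files() and list.dirs() functions do not implement the look-arounds that are normally available with Perl-compatible regular expressions (PCRE). A small hint is that the R help file for list.files() / list.dirs() does not include the option perl=TRUE.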
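If a look-around is still preferred, one workaround is to obtain the file names first and filter the resulting character vector with grepl(..., perl = TRUE), which does support PCRE. Note that a lookbehind placed directly after location_df only inspects the characters just before that position, and those always end in location_df rather than chickens, so it cannot exclude anything. A negative lookahead anchored at the start of the string is one pattern that does. A minimal sketch, using the two file names from the question (the variable names keep and result are illustrative):

```r
str1 <- c("/farm/chickens_industrial_meat_location_df.csv",
          "farm/goats_grassland_meat_location_df.csv")

# (?!.*chickens) makes the match fail if "chickens" occurs anywhere in the string
keep <- grepl("^(?!.*chickens).*location_df", str1, perl = TRUE)
result <- str1[keep]
print(result)
#[1] "farm/goats_grassland_meat_location_df.csv"
```

In practice str1 could be replaced by the output of list.files(), so the PCRE filtering happens after the directory listing rather than inside it.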

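The code shown above therefore uses setdiff() in place of look-arounds to help you query the directory. Of course, with the code above, the two regex "tokens" you search for may appear in any order, but you can help yourself by searching for "location_df.csv" or "location_df.csv$" (since the ".csv" extension occurs at the end of the file name, and the $ zero-width assertion likewise anchors the pattern to the end of the string). You can also try using ^ to anchor "chickens" or "goats" to the beginning of the string. Putting it all together gives the following code: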

> setdiff(list.files(pattern = "location_df.csv$"), list.files(pattern = "^chickens"))
[1] "goats_grassland_meat_location_df.csv"

> setdiff(list.files(pattern = "location_df.csv$"), list.files(pattern = "^goats"))
[1] "chickens_industrial_meat_location_df.csv"
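One detail worth keeping in mind with the anchored patterns: an unescaped "." in a regular expression matches any character, not only a literal period. The sketch below contrasts the two forms in a temporary directory; the directory name setdiff_demo and the decoy file goats_location_dfxcsv are assumptions made up for the illustration.

```r
# Throwaway directory with the two files from the question plus a decoy
tmp <- file.path(tempdir(), "setdiff_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c(
  "chickens_industrial_meat_location_df.csv",
  "goats_grassland_meat_location_df.csv",
  "goats_location_dfxcsv"               # hypothetical decoy without a ".csv" extension
))))

# Unescaped "." matches any character, so the decoy is picked up as well
loose <- setdiff(list.files(tmp, pattern = "location_df.csv$"),
                 list.files(tmp, pattern = "^chickens"))

# Escaping the dot restricts the match to a literal ".csv" extension
strict <- setdiff(list.files(tmp, pattern = "location_df\\.csv$"),
                  list.files(tmp, pattern = "^chickens"))
print(loose)
print(strict)
```

With the question's own file names both patterns give the same result; the escaped form only matters when names like the decoy can occur.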

https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html
https://www.r-project.org/

Using negative lookbehind in R
