A reliable way to tell whether = is used for assignment in R code?

Asked: 2012-06-30 22:24:22

Tags: r

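I'm a stubborn useR who uses = instead of <- all the time, and apparently many R programmers frown on this. I wrote the formatR package, which can replace = with <- based on the parser package. As some of you may know, parser was orphaned on CRAN a few days ago. Although it is back now, this made me hesitant to depend on it. I'm wondering whether there is another way to safely replace = with <-, because not every = means assignment, e.g. fun(a = 1). Regular expressions are unlikely to be reliable (see line 18 of the mask.inline() function in formatR), but I would certainly appreciate it if you can improve mine. Perhaps the codetools package can help?

Some test cases: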

# should replace
a = matrix(1, 1)
a = matrix(
  1, 1)

(a = 1)
a =
  1

function() {
  a = 1
}

# should not replace
c(
  a = 1
  )

c(
  a = c(
  1, 2))

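3 answers:

Answer 0 (score: 4)

This answer uses regular expressions. There are a few edge cases where it will fail, but it should be fine for most code. If you need perfect matching then you need to use a parser, but the regular expressions can always be tweaked if you run into problems.

Watch out for: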

#quoted function names
`my cr*azily*named^function!`(x = 1:10)
#Nested brackets inside functions
mean(x = (3 + 1:10))
#assignments inside if or for blocks
if((x = 10) > 3) cat("foo")
#functions running over multiple lines will currently fail
#maybe fixable with paste(original_code, collapse = "\n")
mean(
  x = 1:10
)

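The code is based on an example on the ?regmatches page. The basic idea is: swap the function contents for a placeholder, do the replacement, then put the function contents back.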

#Sample code.  For real case, use 
#readLines("source_file.R")
original_code <- c("a = 1", "b = mean(x = 1)")

#Function contents are considered to be a function name, 
#an open bracket, some stuff, then a close bracket.
#Here function names are considered to be a letter or
#dot or underscore followed by optional letters, numbers, dots or 
#underscores.  This matches a few non-valid names (see ?make.names
#and warning above).
function_content <- gregexpr(
  "[[:alpha:]._][[:alnum:]._]*\\([^)]*\\)", 
  original_code
)

#Take a copy of the code to modify
copy <- original_code

#Replace all instances of function contents with the word PLACEHOLDER.
#If you have that word inside your code already, things will break.
copy <- mapply(
  function(pattern, replacement, x) 
  {
    if(length(pattern) > 0) 
    {
      gsub(pattern, replacement, x, fixed = TRUE) 
    } else x
  }, 
  pattern = regmatches(copy, function_content), 
  replacement = "PLACEHOLDER", 
  x = copy,
  USE.NAMES = FALSE
)

#Replace = with <-
copy <- gsub("=", "<-", copy)

#Now substitute back your function contents
(fixed_code <- mapply(
  function(pattern, replacement, x) 
  {
      if(length(replacement) > 0) 
      {
          gsub(pattern, replacement, x, fixed = TRUE) 
      } else x
  }, 
  pattern = "PLACEHOLDER", 
  replacement = regmatches(original_code, function_content), 
  x = copy,
  USE.NAMES = FALSE
))

#Write back to your source file
#writeLines(fixed_code, "source_file_fixed.R")
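
For the edge cases listed above, where "you need to use a parser", one possible route is the parse data that base R exposes through utils::getParseData(). This function was added to R after this answer was written, so it is not part of the original approach. The sketch below assumes the parser labels an assignment = as EQ_ASSIGN and an argument = as EQ_SUB or EQ_FORMALS, that the code is plain ASCII without tabs (so reported columns match character positions), and that each element of the input vector is one source line. The name replace_eq_assign is made up for this illustration.

```r
# Hypothetical helper: replace only those `=` tokens that the R parser
# itself classifies as assignment (token type "EQ_ASSIGN").
replace_eq_assign <- function(code) {
  # keep.source = TRUE is required so that parse data is recorded
  exprs <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  hits <- pd[pd$token == "EQ_ASSIGN", , drop = FALSE]

  # Work from right to left so that columns of earlier hits on the
  # same line are not shifted by the extra character in "<-".
  hits <- hits[order(hits$line1, hits$col1, decreasing = TRUE), , drop = FALSE]

  for (i in seq_len(nrow(hits))) {
    ln  <- hits$line1[i]
    col <- hits$col1[i]
    code[ln] <- paste0(
      substr(code[ln], 1, col - 1),   # everything before the `=`
      "<-",
      substring(code[ln], col + 1)    # everything after the `=`
    )
  }
  code
}

original_code <- c("a = 1", "b = mean(x = 1)", "mean(", "  x = 1:10", ")")
fixed_code <- replace_eq_assign(original_code)
print(fixed_code)
```

Because the edit is applied to the original text at the reported positions, comments and layout are left alone. The trade-off is that the input must parse without errors.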

Answer 1 (score: 4)

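Kohske sent a pull request to the formatR package which solved the problem using the codetools package. The basic idea is to set up a code walker that walks through the code; when it detects = as the function symbol of a call, it replaces it with <-. This works because of the "Lisp nature" of R: x = 1 is actually `=`(x, 1) (which we replace with `<-`(x, 1)); the = in fun(x = 1), of course, is treated differently in the parse tree, as an argument name rather than a call.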
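
The idea can be illustrated without codetools by a plain recursive walk over the parsed expression. This is only a minimal sketch of the principle, not the code from the pull request. The name replace_assign_call is made up for this illustration, and calls with empty arguments such as x[1, ] are not handled.

```r
# Hypothetical helper: recursively turn every call to `=` into a call to `<-`.
# Named arguments such as fun(x = 1) are stored as names of the call's
# elements, not as calls to `=`, so they are never touched.
replace_assign_call <- function(e) {
  if (!is.call(e)) return(e)          # symbols, constants, pairlists: keep as is
  if (identical(e[[1]], as.name("="))) {
    e[[1]] <- as.name("<-")           # `=`(x, 1) becomes `<-`(x, 1)
  }
  # Rebuild the call from its (recursively processed) elements;
  # lapply() preserves argument names.
  as.call(lapply(as.list(e), replace_assign_call))
}

e1 <- parse(text = "a = f(x = 1)", keep.source = FALSE)[[1]]
e2 <- parse(text = "if ((x = 10) > 3) cat('foo')", keep.source = FALSE)[[1]]

print(e1[[1]])                        # the function symbol of the top-level call
print(deparse(replace_assign_call(e1)))
print(deparse(replace_assign_call(e2)))
```

Note that deparse() regenerates the source text, so comments and the original layout are lost in this sketch. A formatter needs extra work to preserve them.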

The formatR package (>= 0.5.2) has dropped its dependency on the parser package, and replace.assign should now be robust.

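Answer 2 (score: -3)

The safest (and probably fastest) way to replace = with <- is to type <- directly in the first place, rather than trying to replace it afterwards.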