Question

I wrote a function that converts atomic symbols to atomic numbers:

AtomicNo  <- function(x) {
  y  <- NULL
  for (i in seq(along=x)) {
    if (x[i] == "H") y[i]  <- 1.0 else
      if (x[i] == "C") y[i]  <- 6.0 else
        if (x[i] == "O") y[i]  <- 8.0 else
          if (x[i] == "Fe") y[i]  <- 26.0 else
            if (x[i] == "Br") y[i]  <- 35.0
    y  <- append(y,y[i])
  }
    return(y)
  }

For the vector

a <- c("Fe", "Br", "O", "O", "C", "H", "H", "H", "C", "H", "H", "H", 
        "C", "H", "H", "H", "C", "H", "H", "H")

AtomicNo(a) gives

26 35  8  8  6  1  1  1  6  1  1  1  6  1  1  1  6  1  1  1  1

That is, there is an extra 1 at the end of the vector: there should be only three, not four.

Can anyone see where I went wrong?

Answer 1

Instead of using multiple ifelse calls, you can simply do the following:

elements <- c("H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd", "In", "Sn", "Sb", "Te", "I", "Xe", "Cs", "Ba", "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg", "Tl", "Pb", "Bi", "Po", "At", "Rn", "Fr", "Ra", "Ac", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk", "Cf", "Es", "Fm", "Md", "No", "Lr", "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds", "Rg", "Cn", "Uut", "Fl", "Uup", "Lv", "Uus", "Uuo")

(Chemists will have a list of the elements at hand anyway.)

Then:

> match(a,elements)
 [1] 26 35  8  8  6  1  1  1  6  1  1  1  6  1  1  1  6  1  1  1

Here is a benchmark:

> microbenchmark(f.match(big.a), atomic.recode(big.a), atomic.ifelse(big.a))
Unit: microseconds
                 expr       min        lq       mean    median        uq       max neval cld
       f.match(big.a)   205.090   252.345   280.8174   279.556   305.683   384.358   100 a  
 atomic.recode(big.a)  7689.944  8123.826  8622.3087  8295.475  8583.322 14963.013   100  b 
 atomic.ifelse(big.a) 21804.622 23092.946 24446.9123 24041.193 25475.073 29158.469   100   c

(where f.match <- function(x) match(x, elements))
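One thing to keep in mind is that match returns NA for any symbol that is not in the table. A minimal sketch of a wrapper that fails loudly instead, assuming a shortened table and hypothetical names (elements_short and atomic_no_checked are not from the answer):

```r
# Hypothetical helper, not part of the original answer.
# A short table is used for brevity; the full element vector works the same way.
elements_short <- c("H", "He", "Li", "Be", "B", "C", "N", "O")

atomic_no_checked <- function(x, table = elements_short) {
  idx <- match(x, table)            # position in the table equals the atomic number
  unknown <- unique(x[is.na(idx)])  # symbols that were not found
  if (length(unknown) > 0) {
    stop("unknown element symbol(s): ", paste(unknown, collapse = ", "))
  }
  idx
}

atomic_no_checked(c("O", "C", "H"))
# [1] 8 6 1
```

Whether to stop or to pass the NA through depends on how the result is used downstream; stopping is just one option.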

Answer 2

You might find it easier (or at least less typing) to use the recode function from the car package:

library(car)
recode(a, "'H'=1;'C'=6;'O'=8;'Fe'=26;'Br'=35;")
# [1] 26 35  8  8  6  1  1  1  6  1  1  1  6  1  1  1  6  1  1  1

If you want to stay in base R, the vectorized ifelse function is more efficient than the loop and has very similar syntax:

atomic.ifelse <- function(x) {
  ifelse(x == "H", 1,
    ifelse(x == "C", 6,
      ifelse(x == "O", 8,
        ifelse(x == "Fe", 26,
          ifelse(x == "Br", 35, NA)))))
}

recode, ifelse and match should all be more efficient than a for loop with if statements (the benchmark adds atomic.if from @CactusWoman, atomic.match from @MaratTalipov and atomic.index from @Dason):

big.a <- rep(a, 1000)
all.equal(atomic.if(big.a), atomic.recode(big.a), atomic.ifelse(big.a), atomic.match(big.a), atomic.index(big.a))
# [1] TRUE
library(microbenchmark)
microbenchmark(atomic.if(big.a), atomic.recode(big.a), atomic.ifelse(big.a), atomic.match(big.a), atomic.index(big.a))
# Unit: microseconds
#                  expr        min          lq        mean      median         uq         max neval
#      atomic.if(big.a) 753887.018 823974.2900 887305.3812 876902.6380 924005.505 1836067.802   100
#  atomic.recode(big.a)   8748.951   9129.5230  10694.0044   9299.0145   9617.688  116548.870   100
#  atomic.ifelse(big.a)  26329.875  27568.6540  30005.9327  28635.7760  29652.327  133560.908   100
#   atomic.match(big.a)    210.846    257.7595    370.9925    296.4305    343.732    2434.733   100
#   atomic.index(big.a)    527.043    616.7620   1013.0317    876.6060   1077.634    3371.246   100

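Vectorizing with recode, ifelse, match or vector indexing gives a 30-3000x speedup over the for loop, even on this relatively small vector (length 20,000). match and vector indexing appear to be the winners in terms of efficiency (roughly 10-30x faster than recode by the median times above), so they would be the way to go for very large vectors.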
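A note on the equality check: all.equal compares only two objects (target and current), and any further arguments are passed on as options rather than compared. A minimal sketch of a pairwise check over the base R variants follows. The bodies of atomic.match and atomic.index are not shown in the post, so the definitions here are assumptions, and recode is left out to avoid the car dependency.

```r
a <- c("Fe", "Br", "O", "O", "C", "H", "H", "H", "C", "H", "H", "H",
       "C", "H", "H", "H", "C", "H", "H", "H")
big.a <- rep(a, 1000)
code <- c(H = 1, C = 6, O = 8, Fe = 26, Br = 35)

# As given in the answer.
atomic.ifelse <- function(x) {
  ifelse(x == "H", 1,
    ifelse(x == "C", 6,
      ifelse(x == "O", 8,
        ifelse(x == "Fe", 26,
          ifelse(x == "Br", 35, NA)))))
}

# Assumed definitions; the post does not show these bodies.
atomic.match <- function(x) unname(code)[match(x, names(code))]
atomic.index <- function(x) unname(code[x])

results <- list(
  ifelse = atomic.ifelse(big.a),
  match  = atomic.match(big.a),
  index  = atomic.index(big.a)
)

# Compare every result against the first one, two objects at a time.
same_as_first <- vapply(results[-1],
                        function(r) isTRUE(all.equal(results[[1]], r)),
                        logical(1))
all(same_as_first)
```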

Answer 3

For simple recoding, you can use vector indexing together with a named vector:

code <- c("H" = 1.0, "C" = 6.0, "O" = 8.0, "Fe" = 26.0, "Br" = 35.0)
a <- c("Fe", "Br", "O", "O", "C", "H", "H", "H", "C", "H", "H", "H", "C", "H", "H", "H", "C", "H", "H", "H") 
code[a]
#Fe Br  O  O  C  H  H  H  C  H  H  H  C  H  H  H  C  H  H  H 
#26 35  8  8  6  1  1  1  6  1  1  1  6  1  1  1  6  1  1  1 
## If you don't want the names...
unname(code[a])
# [1] 26 35  8  8  6  1  1  1  6  1  1  1  6  1  1  1  6  1  1  1
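If an ordered list of symbols is already at hand, the named lookup vector can be generated rather than typed out. A minimal sketch, assuming a shortened list (symbols and code_from_list are hypothetical names, and only the first eight elements are included):

```r
# Hypothetical shortened symbol list, ordered by atomic number.
symbols <- c("H", "He", "Li", "Be", "B", "C", "N", "O")

# Name each position (the atomic number) with its symbol.
code_from_list <- setNames(seq_along(symbols), symbols)

unname(code_from_list[c("O", "C", "H")])
# [1] 8 6 1
```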

Edit:

As for why you get the extra 1 at the end: it has to do with your code. Look at the first few iterations, unrolled:

> y <- NULL
> y[1] <- 26
> y <- append(y, y[1])
> y
[1] 26 26
> y[2] <- 35
> y <- append(y, y[2])
> y
[1] 26 35 35

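Basically, you set the last element directly and then also append it to the end. On the next iteration the appended element gets overwritten, but on the final iteration there is nothing left to overwrite it, so you end up with the last value duplicated.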

Answer 4

You don't need to append y[i] to y at the end of each iteration. Just remove that line and it works:

AtomicNo  <- function(x) {
  y  <- NULL
  for (i in seq(along=x)) {
    if (x[i] == "H") y[i]  <- 1.0 else
      if (x[i] == "C") y[i]  <- 6.0 else
        if (x[i] == "O") y[i]  <- 8.0 else
          if (x[i] == "Fe") y[i]  <- 26.0 else
            if (x[i] == "Br") y[i]  <- 35.0
  }
  return(y)
}
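The loop above still grows y one element at a time and leaves y[i] unassigned for a symbol it does not know. A possible variant, sketched under the assumption that unknown symbols should map to NA (AtomicNoPrealloc is a hypothetical name, not from the answers), preallocates the result and uses switch:

```r
# Hypothetical variant: preallocated result, NA for unrecognised symbols.
AtomicNoPrealloc <- function(x) {
  y <- rep(NA_real_, length(x))   # one slot per input, filled with NA up front
  for (i in seq_along(x)) {
    # The final unnamed argument is the default used when no name matches.
    y[i] <- switch(x[i], H = 1, C = 6, O = 8, Fe = 26, Br = 35, NA_real_)
  }
  y
}

AtomicNoPrealloc(c("Fe", "H", "Xx"))
# [1] 26  1 NA
```

The result always has the same length as the input, which makes the off-by-one from the question impossible by construction.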

For loop one too long
