Is there an easy way to work around the sign() function?

Time: 2018-07-14 15:15:03

Tags: r

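I wrote a function that uses sign() to find which numbers in a given vector are positive or negative. I would like to know whether there is an easy way to get a character vector (e.g. "+" and "-") without using sign().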

1 answer:

Answer 0: (score: 1)

What is "hard" about working around the sign() function?

Here are a few options. Most of them look pretty simple, but use whichever one you like.

cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+"))
factor(sign(x), levels = c(-1, 1), labels = c("-", "+"))
ifelse(x < 0, "-", "+")
ifelse(sign(x) == -1, "-", "+")
c("+", "-")[(x < 0) + 1L]
sub("1", "+", sub("-1", "-", sign(x))) # from comments

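You may want to check that the behavior for an input of 0 is what you want/expect, because the options do not all treat 0 the same way.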
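To make the zero case concrete, here is a minimal sketch that runs each option on a small vector containing 0. The test vector `x` and the `res_*` names are my own choices for illustration; all of the calls are base R.

```r
# Small test vector including zero (chosen for illustration only)
x <- c(-2, 0, 3)

# cut() uses right-closed intervals by default, so 0 falls in (-Inf, 0]
res_cut <- as.character(cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")))
print(res_cut)     # "-" "-" "+"

# sign(0) is 0, which is not among the levels, so it becomes NA
res_factor <- as.character(factor(sign(x), levels = c(-1, 1), labels = c("-", "+")))
print(res_factor)  # "-" NA  "+"

# x < 0 is FALSE at zero, so both of these label zero as "+"
res_ifelse <- ifelse(x < 0, "-", "+")
res_index <- c("+", "-")[(x < 0) + 1L]
print(res_ifelse)  # "-" "+" "+"
print(res_index)   # "-" "+" "+"

# sign(0) is coerced to the string "0", which neither sub() pattern matches
res_sub <- sub("1", "+", sub("-1", "-", sign(x)))
print(res_sub)     # "-" "0" "+"
```

So depending on the option, zero comes out as "-", "+", NA, or "0".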


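Optimizing this probably does not make much sense, since it is hard to imagine this being a bottleneck in your code and even the slower methods finish quickly. Still, for general education purposes, we can benchmark the methods: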

n = 1000
x = runif(n, min = -1, max = 1)

print(microbenchmark::microbenchmark(
    cut = cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")),
    factor = factor(sign(x), levels = c(-1, 1), labels = c("-", "+")),
    # note: ifelse_direct returns numeric -1/1 here rather than "-"/"+"
    ifelse_direct = ifelse(x < 0, -1, 1),
    ifelse_sign = ifelse(sign(x) == -1, "-", "+"),
    vector_index = c("+", "-")[(x < 0) + 1L],
    double_sub = sub("1", "+", sub("-1", "-", sign(x))),
    times = 10
), order = "mean")

# Unit: microseconds
#           expr      min       lq      mean    median       uq      max neval  cld
#   vector_index   13.650   14.542   14.9753   15.1135   15.600   16.202    10 a   
#  ifelse_direct   62.070   64.065   83.4343   64.7030   68.473  170.470    10 a   
#    ifelse_sign  193.101  197.737  225.5119  203.9010  209.966  354.551    10  b  
#            cut  189.734  190.560  244.9517  207.7210  240.709  472.329    10  b  
#         factor  514.649  516.468  571.2281  541.8715  553.215  899.395    10   c 
#     double_sub 1295.653 1309.340 1376.3982 1381.7635 1420.775 1502.250    10    d

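The vector-indexing method is probably the least readable, but I included it because I guessed it would be the most efficient, and it is, by roughly a factor of 5 in mean time over the next-fastest method. Unsurprisingly, the rest seem to go from simple to complicated. This comparison is not entirely fair, because the outputs are of different classes. If we coerce everything to factor, the ifelse_direct method slows down, but the direct indexing method is still the fastest, now by roughly a factor of 7.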

print(microbenchmark::microbenchmark(
    cut = cut(x, breaks = c(-Inf, 0, Inf), labels = c("-", "+")),
    factor = factor(sign(x), levels = c(-1, 1), labels = c("-", "+")),
    ifelse_direct = factor(ifelse(x < 0, -1, 1), levels = c(-1, 1), labels = c("-", "+")),
    ifelse_sign = factor(ifelse(sign(x) == -1, "-", "+"), levels = c("-", "+")),
    vector_index = factor(c("+", "-"), levels = c("-", "+"))[(x < 0) + 1L],
    double_sub = factor(sub("1", "+", sub("-1", "-", sign(x))), levels = c("-", "+")),
    times = 10
), order = "mean")
# Unit: microseconds
#           expr      min       lq      mean    median       uq      max neval   cld
#   vector_index   22.968   24.742   29.5399   26.5030   33.719   41.736    10 a    
#    ifelse_sign  205.342  206.831  214.7748  211.4585  217.641  237.253    10  b   
#            cut  203.333  228.458  242.2857  234.2420  255.290  324.423    10  b   
#         factor  516.720  519.264  539.4255  524.8190  541.624  609.298    10   c  
#  ifelse_direct  568.426  570.917  575.7954  573.8430  577.363  599.899    10    d 
#     double_sub 1316.820 1320.598 1333.2738 1326.0780 1343.518 1363.342    10     e
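
If the indexing approach ends up in real code, wrapping it in a small function may help with the readability concern. The following is only a sketch: `sign_label` and its `zero` argument are hypothetical names of my own, not part of base R, and giving zero its own label is an assumption about what you want.

```r
# Hypothetical helper: label each element of x by its sign without calling sign().
# (x > 0) - (x < 0) evaluates to -1, 0 or 1; adding 2L turns that into an index 1, 2 or 3.
sign_label <- function(x, zero = "0") {
  labels <- c("-", zero, "+")
  labels[(x > 0) - (x < 0) + 2L]
}

print(sign_label(c(-1.5, 0, 2)))              # "-" "0" "+"
print(sign_label(c(-1.5, 0, 2), zero = "+"))  # "-" "+" "+"
print(sign_label(c(NA, -3)))                  # NA  "-"
```

Missing values stay missing because an NA index returns NA, and the `zero` argument makes the choice for 0 explicit instead of leaving it to a side effect of the comparison.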