Question

We're doing some performance optimization in our project, and with the profiler I came upon the following method:

private int CalculateAdcValues(byte lowIndex)
{
    byte middleIndex = (byte)(lowIndex + 1);
    byte highIndex = (byte)(lowIndex + 2);

    // samples is a byte[]
    return (int)((int)(samples[highIndex] << 24) 
        + (int)(samples[middleIndex] << 16) + (int)(samples[lowIndex] << 8));
}

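This method is already pretty fast at ~1µs per execution, but it is called ~100,000 times per second, so it takes ~10% of the CPU.

Does anyone have an idea how to further improve this method?

EDIT:

Current solution: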

fixed (byte* p = samples)
{
    for (; loopIndex < 61; loopIndex += 3)
    {
        adcValues[k++] = *((int*)(p + loopIndex)) << 8;
    }
}

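This takes <40% of the time it did before (the "whole method" took ~35µs per call before and takes ~13µs now). The for loop now actually takes more time than the calculation...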
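One way to check that the pointer loop produces the same values as the original shift-based method is to compare both on a test buffer. The following is only a sketch under assumptions, not the project's code: it uses BitConverter.ToInt32, which reads in the machine's native byte order just like the pointer cast, so it compiles without unsafe. The names AdcLoopCheck, Reference, FromInt32Read and Agree are made up for illustration.

```csharp
using System;

static class AdcLoopCheck
{
    // Shift-based reference, equivalent to the method in the question.
    public static int Reference(byte[] samples, int lowIndex)
    {
        return (samples[lowIndex + 2] << 24)
             | (samples[lowIndex + 1] << 16)
             | (samples[lowIndex] << 8);
    }

    // Safe stand-in for *((int*)(p + lowIndex)) << 8.
    // It reads 4 bytes, so lowIndex + 3 must be a valid index.
    public static int FromInt32Read(byte[] samples, int lowIndex)
    {
        return BitConverter.ToInt32(samples, lowIndex) << 8;
    }

    // Returns true when both approaches agree for every 3-byte sample
    // that still has one readable byte after it.
    public static bool Agree(byte[] samples)
    {
        if (!BitConverter.IsLittleEndian) return false; // the trick assumes little-endian
        for (int i = 0; i + 3 < samples.Length; i += 3)
        {
            if (Reference(samples, i) != FromInt32Read(samples, i)) return false;
        }
        return true;
    }
}
```

Note that the 4-byte read touches one byte beyond each 3-byte sample. If loopIndex reaches 60, the read touches index 63, so samples has to hold at least 64 bytes for the pointer version to stay inside the array.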

Answer 1

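I strongly suspect that after casting to byte, your indexes are being converted back to int anyway for use in the array indexing operation. That will be cheap, but may not be entirely free. So get rid of the casts, unless you were using the conversion to byte to effectively keep the index within the range 0..255. At that point you can get rid of the separate local variables, too.

Additionally, your casts to int are no-ops, as the shift operations are only defined on int and higher types.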

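Finally, using | may be faster than +:

private int CalculateAdcValues(byte lowIndex)
{
    return (samples[lowIndex + 2] << 24) |
           (samples[lowIndex + 1] << 16) |
           (samples[lowIndex] << 8);
}

(Why is there nothing in the bottom 8 bits? Is that deliberate? Note that the result will end up being negative if samples[lowIndex + 2] has its top bit set - is that okay?)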
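To make the sign remark concrete, here is a small sketch. It assumes nothing about the real data; AdcSignDemo, Combine and ToSigned24 are hypothetical names. Combine needs no casts because byte operands of << are promoted to int. ToSigned24 is only meaningful if the three bytes are a 24-bit two's-complement sample, which the question does not state.

```csharp
using System;

static class AdcSignDemo
{
    // Same expression as above, taking the three bytes directly.
    // The shifts already produce int values, so no casts are required.
    public static int Combine(byte low, byte middle, byte high)
    {
        return (high << 24) | (middle << 16) | (low << 8);
    }

    // Hypothetical helper: an arithmetic right shift by 8 keeps the sign,
    // turning the left-justified value into a signed 24-bit number.
    public static int ToSigned24(int shifted)
    {
        return shifted >> 8;
    }
}
```

With high = 0x80 and the other bytes zero, Combine returns int.MinValue, and ToSigned24 maps that to -8388608. With high below 0x80 the result stays non-negative.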

Answer 2

Seeing that you have a friendly endianness (little-endian), go unsafe:

unsafe int CalculateAdcValuesFast1(int lowIndex)
{
  fixed (byte* p = &samples[lowIndex])
  {
    return *(int*)p << 8;
  }
}

About 30% faster on x86. Not as much of a gain as I was hoping for. About 40% on x64.

As suggested by @CodeInChaos:
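The pointer read only gives the right answer because the machine is little-endian. If that ever matters, a possible alternative (an assumption on my part: it needs System.Buffers.Binary, available in .NET Core 2.1 and later or through the System.Memory package) is BinaryPrimitives.ReadInt32LittleEndian, which states the byte order explicitly and is bounds-checked. AdcPortable is a made-up name, and I have not measured how it compares with the unsafe version.

```csharp
using System;
using System.Buffers.Binary;

static class AdcPortable
{
    // Reads 4 bytes starting at lowIndex as a little-endian int and
    // shifts the top byte out, like *(int*)p << 8 on a little-endian CPU.
    // Throws if fewer than 4 bytes are available from lowIndex.
    public static int Calculate(byte[] samples, int lowIndex)
    {
        return BinaryPrimitives.ReadInt32LittleEndian(samples.AsSpan(lowIndex, 4)) << 8;
    }
}
```

As with the pointer version, the array must contain one extra byte after the last 3-byte sample.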

  var bounds = samples.Length - 3;
  fixed (byte* p = samples)
  {
    for (int i = 0; i < 1000000000; i++)
    {
      var r = CalculateAdcValuesFast2(p, i % bounds); // about 2x faster
      // or inlined (about 3x faster):
      // var r = *((int*)(p + i % bounds)) << 8;
      // do something
    }
  }


unsafe int CalculateAdcValuesFast2(byte* p1, int p2)
{
  return *((int*)(p1 + p2)) << 8;
}

Answer 3

This may be a little faster. I've removed the casts to int.

        var middleIndex = (byte)(lowIndex + 1);
        var highIndex = (byte)(lowIndex + 2);

        return (this.samples[highIndex] << 24) + (this.samples[middleIndex] << 16) + (this.samples[lowIndex] << 8);

Is it possible to speed up byte parsing?
