Question

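Typically, a good way to draw a rectangle in a bitmap is to loop over the two bounding dimensions and set the individual pixels. For example, in pseudocode: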

drawRect(array, color, x0, x1, y0, y1):
    for x from x0 until x1:
        for y from y0 until y1:
            array[x, y] = color

What is the equivalent for Haskell's REPA?

Answer 1

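Of the normal REPA mechanisms for making new arrays, making a new delayed array is fastest when the array is forced once to foreign memory. The actual performance of using REPA will depend on what you do with the array.

Let's define a type for computations that depend only on the position in the array and the current value at that position.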

{-# LANGUAGE ScopedTypeVariables #-}

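-- Prelude also exports a traverse in base >= 4.8, which clashes with REPA's.
import Prelude hiding (traverse)
import Data.Array.Repa hiding ((++))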
import Data.Array.Repa.Repr.ForeignPtr

import Data.Word
import Control.Monad
import Data.Time.Clock
import System.Mem

type Ghost sh a b = sh -> a -> b

We can define a fill for any shape.

fill :: Shape sh => sh -> sh -> a -> Ghost sh a a
fill from to color = go
    where
        {-# INLINE go #-}
        go sh a =
            if inShapeRange from to sh
            then color
            else a
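
To make the behaviour of fill concrete without needing REPA installed, here is a minimal base-only sketch of the same idea on a list-of-rows image. The names Ix2, Ghost2, inRange2, fill2 and applyGhost are hypothetical and not part of REPA. The sketch assumes a half-open range (lower corner included, upper corner excluded), which is how I understand inShapeRange to behave; check the REPA documentation if the exact bounds matter.

```haskell
-- Base-only model of the position-dependent fill; no REPA required.
-- Ix2, Ghost2, inRange2, fill2 and applyGhost are hypothetical names.

type Ix2 = (Int, Int)            -- (row, column)
type Ghost2 a b = Ix2 -> a -> b  -- same idea as Ghost, fixed to 2D

-- Assumed half-open range: lower corner included, upper corner excluded.
inRange2 :: Ix2 -> Ix2 -> Ix2 -> Bool
inRange2 (r0, c0) (r1, c1) (r, c) =
    r >= r0 && r < r1 && c >= c0 && c < c1

-- Replace the value inside the box, keep the old value elsewhere.
fill2 :: Ix2 -> Ix2 -> a -> Ghost2 a a
fill2 from to color ix old
    | inRange2 from to ix = color
    | otherwise           = old

-- Apply a Ghost2 to every cell of a list-of-rows image.
applyGhost :: Ghost2 a b -> [[a]] -> [[b]]
applyGhost g rows =
    [ [ g (r, c) v | (c, v) <- zip [0 ..] row ]
    | (r, row) <- zip [0 ..] rows ]
```

In GHCi, mapM_ putStrLn (applyGhost (fill2 (1,1) (3,3) 'x') (replicate 4 "....")) draws a 2x2 block of x in the middle of a 4x4 grid of dots.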

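We'll use three different ways to define a new array: a delayed array, a structured traversal, and an unstructured traversal.

The simplest delayed array is built with fromFunction.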

ghostD :: (Shape sh, Source r a) => Ghost sh a b -> Array r sh a -> Array D sh b
ghostD g a = fromFunction (extent a) go
    where
        {-# INLINE go #-}
        go sh = g sh (a ! sh)

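Structured traversals can take advantage of knowing the structure of the underlying array representation. Unfortunately, the only way to get position information in a structured traversal is to use szipWith to zip with an array that already contains the position information.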

ghostS :: (Shape sh, Structured r1 a b, Source r1 a) => Ghost sh a b -> Array r1 sh a -> Array (TR r1) sh b
ghostS g a = szipWith ($) ghost a
    where
        ghost = fromFunction (extent a) g

An unstructured traversal is very similar to the delayed array built by fromFunction; it also returns an Array D.

ghostT :: (Shape sh, Source r a) => Ghost sh a b -> Array r sh a -> Array D sh b
ghostT g a = traverse a id go
    where
        {-# INLINE go #-}
        go lookup sh = g sh (lookup sh)

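With some very naive benchmarking, we can run these and see how fast they are. We perform a garbage collection before measuring the time to try to get reliable timing results. We will have two benchmarks. For each mechanism, we run a single step, writing the result to memory, 10 times. Then we compose 101 of the same step and write the result to memory once.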

bench :: Int -> String -> IO a -> IO ()
bench n name action = do
    performGC
    start <- getCurrentTime
    replicateM_ n action
    performGC
    end <- getCurrentTime
    putStrLn $ name ++ " " ++ (show (diffUTCTime end start / fromIntegral n))

iterN :: Int -> (a -> a) -> (a -> a)
iterN 0 _ = id
iterN n f = f . iterN (n-1) f

main = do
    (img :: Array F DIM2 Word32) <- computeP (fromFunction (Z :. 1024 :. 1024 ) (const minBound))
    let (Z :. x :. y ) = extent img
        drawBox = fill (Z :. 20 :. 20 ) (Z :. x - 20 :. y - 20 ) maxBound

    bench 10 "Delayed      10x1" ((computeP $ ghostD drawBox img) :: IO (Array F DIM2 Word32))
    bench 10 "Unstructured 10x1" ((computeP $ ghostT drawBox img) :: IO (Array F DIM2 Word32))
    bench 10 "Structured   10x1" ((computeP $ ghostS drawBox img) :: IO (Array F DIM2 Word32))

    bench 1 "Delayed      1x101" ((computeP $ (iterN 100 (ghostD drawBox)) . ghostD drawBox $ img) :: IO (Array F DIM2 Word32))
    bench 1 "Unstructured 1x101" ((computeP $ (iterN 100 (ghostT drawBox)) . ghostT drawBox $ img) :: IO (Array F DIM2 Word32))
    bench 1 "Structured   1x101" ((computeP $ (iterN 100 (ghostS drawBox)) . ghostS drawBox $ img) :: IO (Array F DIM2 Word32))

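The resulting times are averaged over the number of times the array was forced by writing it to foreign memory. These results are typical of multiple runs on my machine.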

Delayed      10x1 0.0234s
Unstructured 10x1 0.02652s
Structured   10x1 0.02652s
Delayed      1x101 0.078s
Unstructured 1x101 0.0936s
Structured   1x101 0.2652s

The results don't seem to depend on the order in which the benchmarks are run.

Structured   10x1 0.03276s
Unstructured 10x1 0.02652s
Delayed      10x1 0.01716s
Structured   1x101 0.2184s
Unstructured 1x101 0.1092s
Delayed      1x101 0.0624s

These results suggest that you can perform a few whole-array computations and still have the running time dominated by memory access.
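
One way to picture why 101 composed delayed steps cost only a few times as much as one step: a delayed array behaves like an extent plus an indexing function, so each step only wraps that function, and elements are produced once, when the array is forced. The following is a hedged base-only model of that idea, not REPA's implementation; Delayed, fromList, step and force are hypothetical names, and REPA's real fusion also relies on GHC inlining.

```haskell
-- Hypothetical base-only model of a delayed 1D array: an extent plus an
-- indexing function. This is not REPA's implementation.
data Delayed a = Delayed Int (Int -> a)

-- Wrap a list as a delayed array (list indexing is slow; fine for a sketch).
fromList :: [a] -> Delayed a
fromList xs = Delayed (length xs) (xs !!)

-- One step: build a new indexing function; no element is computed yet.
step :: (Int -> a -> b) -> Delayed a -> Delayed b
step g (Delayed n ix) = Delayed n (\i -> g i (ix i))

-- Force the array: the only place where elements are actually produced.
force :: Delayed a -> [a]
force (Delayed n ix) = map ix [0 .. n - 1]

-- Same helper as in the benchmark code.
iterN :: Int -> (a -> a) -> (a -> a)
iterN 0 _ = id
iterN n f = f . iterN (n - 1) f
```

For example, force (iterN 101 (step box) (fromList [0, 0, 0, 0])) with a box function that overwrites indices 1 and 2 builds 101 nested closures but writes each of the four output elements only once.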

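Libraries that render scenes by drawing them usually have a very different structure from REPA, which is mainly intended for data-processing tasks that process all of the data in parallel. Drawing and rendering libraries typically use a graph or tree of scene elements, called a scene graph, that lets them quickly cull elements that won't be drawn in the image or in a portion of the image. If you can quickly cull everything that doesn't affect a particular pixel, you don't need to mutate the result to get good performance.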
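
As a rough illustration of the culling idea, here is a minimal sketch of a scene graph with bounding boxes. It is not taken from any particular rendering library; Box, Scene, overlaps and visible are hypothetical names, and the boxes are assumed to be half-open and axis-aligned.

```haskell
-- Hypothetical minimal scene graph with bounding-box culling.
-- Boxes are half-open: (x0, y0) included, (x1, y1) excluded.
data Box = Box Int Int Int Int deriving (Eq, Show)

data Scene c
    = Leaf Box c             -- one filled rectangle with its color
    | Group Box [Scene c]    -- bounding box enclosing all children

-- Two half-open boxes overlap when they overlap on both axes.
overlaps :: Box -> Box -> Bool
overlaps (Box ax0 ay0 ax1 ay1) (Box bx0 by0 bx1 by1) =
    ax0 < bx1 && bx0 < ax1 && ay0 < by1 && by0 < ay1

-- Collect the rectangles that can touch the region, skipping whole
-- subtrees whose bounding box misses it.
visible :: Box -> Scene c -> [(Box, c)]
visible region (Leaf b c)
    | overlaps region b = [(b, c)]
    | otherwise         = []
visible region (Group b kids)
    | overlaps region b = concatMap (visible region) kids
    | otherwise         = []
```

A renderer built this way would only evaluate the rectangles returned by visible for the tile or pixel being drawn, instead of applying every rectangle to every pixel.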

What is the fastest way to draw a rectangle in a REPA array?
