Question

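This program creates a very large set in order to find hash function collisions. Is there a way to reduce the time spent in GC? +RTS -s reports that more than 40% of the time is spent in GC.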

Example usage:

./program 0 1000000 +RTS -s
./program 145168473 10200000 +RTS -s

Is there a better algorithm or data structure I could use?

{-# LANGUAGE OverloadedStrings #-}

import System.Environment
import Control.Monad
import Crypto.Hash.SHA256

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char
import Data.Int
import Data.Bits
import Data.Binary
import Data.Set as Set
import Data.List
import Numeric

str2int :: (Integral a) => B.ByteString -> a
str2int bs = B.foldl (\a w -> (a * 256)+(fromIntegral $ ord w)) 0 bs

t50 :: Int64 -> Int64
t50 i = let h = hash $ B.concat $ BL.toChunks $ encode i
        in
          (str2int $ B.drop 25 h) .&. 0x3ffffffffffff

sha256 :: Int64 -> B.ByteString
sha256 i = hash $ B.concat $ BL.toChunks $ encode i

-- firstCollision :: Ord b => (a -> b) -> [a] -> Maybe a
firstCollision f xs = go f Set.empty xs
  where
    -- go :: Ord b => (a -> b) -> Set b -> [a] -> Maybe a
    go _ _ []     = Nothing
    go f s (x:xs) = let y = f x
                    in
                      if y `Set.member` s
                        then Just x
                        else go f (Set.insert y s) xs

showHex2 i
  | i < 16    = "0" ++ (showHex i "")
  | otherwise = showHex i ""

prettyPrint :: B.ByteString -> String
prettyPrint = concat . (Data.List.map showHex2) . (Data.List.map ord) . B.unpack


showhash inp =
  let  h = sha256 inp
       x = B.concat $ BL.toChunks $ encode inp
   in do putStrLn $ "  - input: " ++ (prettyPrint x) ++ " -- " ++ (show inp)
         putStrLn $ "  -  hash: " ++ (prettyPrint h)

main = do
         args <- getArgs
         let a = (read $ args !! 0) :: Int64
             b = (read $ args !! 1) :: Int64
             c = firstCollision t [a..(a+b)]
             t = t50
         case c of
           Nothing -> putStrLn "No collision found"
           Just x  -> do let h = t x
                         putStrLn $ "Found collision at " ++ (show x)
                         showhash x
                         let first = find (\x -> (t x) == h) [a..(a+b)]
                          in case first of
                               Nothing -> putStrLn "oops -- failed to find hash"
                               Just x0 -> do putStrLn $ "first instance at " ++ (show x0)
                                             showhash x0

Answer 1

As you noticed, the GC statistics report low productivity:

  44,184,375,988 bytes allocated in the heap
   1,244,120,552 bytes copied during GC
      39,315,612 bytes maximum residency (42 sample(s))
         545,688 bytes maximum slop
             109 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     81400 colls,     0 par    2.47s    2.40s     0.0000s    0.0003s
  Gen  1        42 colls,     0 par    1.06s    1.08s     0.0258s    0.1203s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    4.58s  (  4.63s elapsed)
  GC      time    3.53s  (  3.48s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    8.11s  (  8.11s elapsed)

  %GC     time      43.5%  (42.9% elapsed)

  Alloc rate    9,651,194,755 bytes per MUT second

  Productivity  56.5% of total user, 56.4% of total elapsed

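The obvious first step is to increase the GC's default allocation areas to try to eliminate the need for resizing. One trick, for example, is to increase the -A area (you can use tools like GC tune to find the right setting for your program):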

  $ ./A ... +RTS -s -A200M

  Total   time    7.89s  (  7.87s elapsed)

  %GC     time      26.1%  (26.5% elapsed)

  Alloc rate    7,581,233,460 bytes per MUT second

  Productivity  73.9% of total user, 74.1% of total elapsed

So we shaved off about a quarter of a second, and productivity rose to roughly 74%. Now we should look at the heap profile:

[Figure: heap profile]

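It shows linear growth of the set and its Int values. That is what your algorithm specifies, so I can't see much you can do as long as you retain all the results.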
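While experimenting with different -A values, it can be handy to read the GC share from inside the program instead of parsing the -s output. The following is a minimal sketch, assuming base 4.10 or later (GHC 8.2+), where GHC.Stats exposes getRTSStats; gcFraction is a hypothetical helper name, not part of the original program. The counters are only collected when the program is started with +RTS -T (or another stats flag), and they reflect the state as of the most recent GC.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Fraction of total CPU time spent in GC so far (hypothetical helper).
-- Returns Nothing when RTS statistics are not enabled (e.g. no +RTS -T)
-- or when no CPU time has been recorded yet.
gcFraction :: IO (Maybe Double)
gcFraction = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      s <- getRTSStats
      let gc    = fromIntegral (gc_cpu_ns s) :: Double  -- GC CPU time, ns
          total = fromIntegral (cpu_ns s)    :: Double  -- total CPU time, ns
      pure (if total > 0 then Just (gc / total) else Nothing)
```

Calling gcFraction at the end of main and printing the result should give a number comparable to the %GC line in the -s report.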

Answer 2

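One thing you are doing is building a ByteString by using the binary package (you could use cereal if you want to avoid going to and from lazy chunks). If you dig into the internals of the Builder monad it uses, you can see that it has a default initial size of about 32k. For your purposes, given that you only need 8 bytes, this may be putting more pressure on the garbage collector than necessary.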

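Since you are really only using binary to encode an Int64, you could do it yourself with something like the following:

-- Assumes: import qualified Data.ByteString as B (Word8-based, not Char8)
encodeInt64 :: Int64 -> B.ByteString
encodeInt64 x =
  let go :: Int -> Maybe (Word8, Int)
      go i
        | i < 0     = Nothing
        | otherwise =
            let w :: Word8
                w = fromIntegral (x `shiftR` i)
            in Just (w, i-8)
  in fst $ B.unfoldrN 8 go 56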

I suspect you could do even better, for example by poking the bytes directly into a buffer.
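For illustration, here is one way the "poke directly into a buffer" idea could look. This is only a sketch: encodeInt64Poke is a hypothetical name, and it relies on unsafeCreate from Data.ByteString.Internal, which allocates a strict ByteString of a fixed size and lets you fill it through a pointer. It writes the 8 bytes in big-endian order; I have not benchmarked it against the unfoldrN version.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Internal (unsafeCreate)
import Data.Bits (shiftR)
import Data.Int (Int64)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- | Encode an Int64 as 8 big-endian bytes by writing straight into a
-- freshly allocated 8-byte buffer (hypothetical helper).
encodeInt64Poke :: Int64 -> BS.ByteString
encodeInt64Poke x = unsafeCreate 8 $ \p ->
  mapM_
    (\k -> pokeByteOff p k (fromIntegral (x `shiftR` (56 - 8 * k)) :: Word8))
    [0 .. 7]  -- offset 0 gets the most significant byte
```

The result is a strict ByteString, so it can be passed to hash without the B.concat $ BL.toChunks step.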

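That is one thing. Another issue, not GC-related, is that you are using the standard Data.Set implementation; you may find slightly better performance with Data.HashSet from unordered-containers.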
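As a rough sketch of that swap, assuming the unordered-containers and hashable packages are installed: firstCollisionHS is a hypothetical name, and the only real change from the original firstCollision is the set type and the Hashable constraint that comes with it.

```haskell
import qualified Data.HashSet as HS
import Data.Hashable (Hashable)

-- | Same search as firstCollision, but tracking seen values in a HashSet.
-- Returns the first input whose image under f was already seen.
firstCollisionHS :: (Eq b, Hashable b) => (a -> b) -> [a] -> Maybe a
firstCollisionHS f = go HS.empty
  where
    go _ [] = Nothing
    go s (x : xs)
      | y `HS.member` s = Just x
      | otherwise       = go (HS.insert y s) xs
      where
        y = f x
```

In main this would be used as firstCollisionHS t [a..(a+b)]; Int64 already has a Hashable instance, so t50 works unchanged.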

The last point, which Don mentioned, is that you can request a larger allocation area with -A200M (or thereabouts).

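With all of the above modifications (your own encoder, Data.HashSet, and -A200M), the runtime of your code went from 7.397s to 3.474s on my machine, and %GC time went from 52.9% to 21.2%.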

So you are not doing anything wrong in the Big-O sense, but there are some constant factors you can shave down a bit!

Answer 3

I'm not sure. But here is some profiler output, in case someone can build a real answer from it:

Here is the heap profile (from running with +RTS -hT):

[Figure: heap profile]

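Because nothing in firstCollision is explicitly forced, I think you are accumulating thunks around Set.insert. However, the memory allocated there is so small in absolute terms that I'm not sure it is the real culprit (see below).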
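One way to test the thunk hypothesis would be a variant that forces both the hash value and the set at every step. This is only a sketch under that assumption; firstCollisionStrict is a hypothetical name. Note that Set.member already forces the set on each iteration, so this may well make little difference in practice.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.Set as Set

-- | firstCollision with explicit strictness: the accumulator set and each
-- hash value are evaluated to weak head normal form before the next step.
firstCollisionStrict :: Ord b => (a -> b) -> [a] -> Maybe a
firstCollisionStrict f = go Set.empty
  where
    go !_ [] = Nothing
    go !s (x : xs) =
      let !y = f x  -- force the hash before it is stored in the set
       in if y `Set.member` s
            then Just x
            else go (Set.insert y s) xs
```

If the heap profile looks the same with this version, lazily built thunks are probably not the problem.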

Here is the profiler output (compiled with -prof -fprof-auto, run with +RTS -p):

COST CENTRE         MODULE  %time %alloc

firstCollision.go   Main     49.4    2.2
t50.h               Main     39.5   97.5
str2int             Main      5.4    0.0
firstCollision.go.y Main      3.4    0.0
t50                 Main      1.1    0.0

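Basically all of the memory allocation comes from h, the local equivalent of sha256 (the serialization/hashing pipeline), where there seems to be a lot of intermediate data structure building going on.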

Can anyone more experienced pinpoint the problem more precisely?

Creating a large set - need to reduce time spent in GC
