使基于Attoparsec的解析器更有效

时间:2014-12-02 10:40:07

标签: parsing haskell attoparsec

我使用Attoparsec编写了一个简单的文本STL(标准细分库)解析器。 STL包含一组方面。每个构面包含一个法线和三角形顶点。典型的STL文件可能很大(约100 Mb或更多)。

STL文件格式是

solid
 facet normal nx ny nz
   outer loop
       vertex x1 y1 z1
       vertex x2 y2 z2
       vertex x3 y3 z3 
 endfacet
endsolid

解析器代码在这里。可以在STLReader

找到完整的实施
-- | Point represents a vertex of the triangle
data Point a = Point !a !a !a deriving Show

-- | Vector for a given class
data Vector a = Vector !a !a !a deriving Show

-- | Parse coordinate triplet. 
coordinates :: (Fractional a) => Text -> (a -> a -> a -> b) -> Parser b
coordinates s f = do
  skipSpace
  string s
  !x <- coordinate
  !y <- coordinate
  !z <- coordinate
  return $! f x y z
  where
    coordinate = skipWhile isHorizontalSpace *> fmap realToFrac double
{-# INLINE coordinates #-}

type RawFacet a = (Vector a, Point a, Point a, Point a)

-- | Parse a facet. The facet comprises of a normal, and three vertices
facet  :: Fractional a => Parser (RawFacet a)
facet = (,,,) <$> beginFacet
          <* (skipSpace     *> "outer loop")
          <*> vertexPoint
          <*> vertexPoint
          <*> vertexPoint
          <*  (skipSpace <* "endloop" <* endFacet )
          <?> "facet"
  where
    beginFacet  = skipSpace <* "facet" *> coordinates "normal" Vector
    endFacet    = skipSpace <* string "endfacet"
    vertexPoint = coordinates "vertex" Point
{-# INLINE facet #-}

rawFacets :: Fractional a => Parser [RawFacet a]
rawFacets = beginSolid *> many' facet <* endSolid
      where
            solidName  = option "default" (skipWhile isHorizontalSpace *> fmap T.pack (many1 $ satisfy isAlphaNum) )
            beginSolid = skipSpace <* "solid" *> solidName <?> "start solid"
            endSolid   = skipSpace <* "endsolid" <?> "end solid"

-- | Read text STL file. STL extensions for color etc. are not supported in this version. 
readTextSTL :: Fractional a => FilePath -> IO (Either String [RawFacet a])
readTextSTL path = liftM (Al.eitherResult . Al.parse rawFacets) (TIO.readFile path)

main :: IO Int
main = do
  (path:_) <- getArgs
  putStrLn $ "Parsing STL file: " ++ path
  s <- readTextSTL path
  putStrLn "Parsing complete"
  case s of
   Left error -> putStrLn error
   Right s    -> putStrLn $ "Num facets : " ++ show (length s)
  return 0

我使用Meshlab提供的'c'解析器对此代码进行基准测试。当我用69Mb扫描测试时,meshlab在大约13秒内完成了这项工作,而用attoparsec花了22秒。因此虽然attoparsec使我能够比Parsec更快地解析(parsec大约是36秒),但我还有很长的路要走。

如何进一步改进此解析器?

0 个答案:

没有答案