Decoding text input that contains octal escape sequences

Posted: 2013-02-04 10:22:42

Tags: haskell utf-8 character-encoding

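I have input in which "Divinités" (9 characters) is represented as "Divinit\303\251s" (the actual text data is 16 characters long). How can I convert it into a correctly encoded Haskell Text (or ByteString, or String)?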
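The 16-character form is the UTF-8 encoding of "é" (U+00E9), which is the two bytes 0xC3 0xA9, written as three-digit octal escapes: 0xC3 = 195 = octal 303 and 0xA9 = 169 = octal 251. As a sketch of that relationship, assuming the `text` and `bytestring` packages that ship with GHC, the hypothetical helper `escapeOctal` below goes in the opposite direction and reproduces the escaped form from the decoded string:

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Numeric (showOct)

-- Hypothetical helper: encode a String as UTF-8 and write every non-ASCII
-- byte as a backslash followed by its octal value. Any byte >= 128 is at
-- least octal 200, so showOct always yields exactly three digits here.
escapeOctal :: String -> String
escapeOctal = concatMap escapeByte . BS.unpack . TE.encodeUtf8 . T.pack
  where
    escapeByte b
      | b < 128   = [toEnum (fromIntegral b)]
      | otherwise = '\\' : showOct b ""

main :: IO ()
main = putStrLn (escapeOctal "Divinités")  -- prints Divinit\303\251s
```

Decoding the question's input therefore means undoing both layers: first the octal escapes, then the UTF-8 encoding.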

1 Answer

Answer 0 (score: 2)

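First, convert each escape sequence in the string into a single Char whose code is the escaped byte value. Then use the utf8-string package to decode the result (a String whose Chars are UTF-8 bytes) into an actual Unicode string.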
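If a Text (or ByteString) is wanted instead of a String, the same two steps can be sketched with the `bytestring` and `text` packages in place of utf8-string: collect the unescaped bytes into a ByteString, then let `Data.Text.Encoding.decodeUtf8'` do the UTF-8 decoding. This assumes every escape is a backslash followed by exactly three octal digits and that all other characters are ASCII; `unescapeBytes` and `decodeOctalEscapes` are hypothetical names.

```haskell
import qualified Data.ByteString as BS
import Data.Char (digitToInt, isOctDigit, ord)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import Data.Word (Word8)

-- Hypothetical helper: turn "\NNN" escapes (exactly three octal digits)
-- into raw bytes. Other characters must be ASCII so they fit in one byte.
unescapeBytes :: String -> Maybe [Word8]
unescapeBytes [] = Just []
unescapeBytes ('\\' : a : b : c : rest)
  | all isOctDigit [a, b, c] =
      let n = foldl (\acc d -> acc * 8 + digitToInt d) 0 [a, b, c]
      in if n < 256 then (fromIntegral n :) <$> unescapeBytes rest else Nothing
unescapeBytes ('\\' : _) = Nothing  -- backslash not followed by three octal digits
unescapeBytes (ch : rest)
  | ord ch < 128 = (fromIntegral (ord ch) :) <$> unescapeBytes rest
  | otherwise    = Nothing

-- Pack the bytes and decode them as UTF-8; decodeUtf8' returns Left on
-- invalid UTF-8 instead of throwing, which is mapped to Nothing here.
decodeOctalEscapes :: String -> Maybe T.Text
decodeOctalEscapes s = do
  bytes <- unescapeBytes s
  either (const Nothing) Just (TE.decodeUtf8' (BS.pack bytes))

main :: IO ()
main = maybe (putStrLn "malformed input") TIO.putStrLn
             (decodeOctalEscapes "Divinit\\303\\251s")
```

Using `decodeUtf8'` rather than `decodeUtf8` keeps malformed byte sequences (for example a lone "\303") as a recoverable `Nothing` instead of a runtime exception.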

import Data.Char (isOctDigit, digitToInt)
-- decodeString comes from the utf8-string package
import Codec.Binary.UTF8.String (decodeString)

input :: String
input = "Divinit\\303\\251s"

main :: IO ()
main = maybe (putStrLn "malformed escape sequence") putStrLn $ convertString input

-- Turn "\NNN" escapes into byte-valued Chars, then decode those bytes as UTF-8.
convertString :: [Char] -> Maybe [Char]
convertString = fmap decodeString . unescape

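-- Replace each backslash plus (up to) three octal digits with the Char whose
-- code is that value; other characters are copied unchanged.
unescape :: [Char] -> Maybe [Char]
unescape [] = Just []
unescape ('\\' : rest) = do
  let (digits, rest') = splitAt 3 rest
  byte <- fmap toEnum . octalDigitsToInt $ digits
  remaining <- unescape rest'
  return $ byte : remaining
unescape (c : rest) = fmap (c :) . unescape $ rest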

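-- Interpret a string of octal digits as an Int; Nothing if it is empty
-- (e.g. a trailing backslash) or contains a non-octal character.
octalDigitsToInt :: [Char] -> Maybe Int
octalDigitsToInt [] = Nothing
octalDigitsToInt ds =
  fmap (foldl (\acc d -> acc * 8 + d) 0) (mapM octalDigitToInt ds)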

-- Value of a single octal digit character, or Nothing for anything else.
octalDigitToInt :: Char -> Maybe Int
octalDigitToInt c | isOctDigit c = Just $ digitToInt c
octalDigitToInt _ = Nothing
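The hand-written octal conversion could also be replaced by `readOct` from the standard `Numeric` module. A sketch, assuming the same escape format as above; `octalToInt` is a hypothetical drop-in for `octalDigitsToInt`. Because `readOct` stops at the first non-octal character, the result is accepted only when the whole digit string was consumed.

```haskell
import Numeric (readOct)

-- Hypothetical replacement for octalDigitsToInt: succeed only if readOct
-- parses the entire string, so "" and "38" are both rejected.
octalToInt :: String -> Maybe Int
octalToInt ds = case readOct ds of
  [(n, "")] -> Just n
  _         -> Nothing

main :: IO ()
main = print (map octalToInt ["303", "251", "38", ""])
-- prints [Just 195,Just 169,Nothing,Nothing]
```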