Haskell: Remove duplicate tuples from a list?

Time: 2015-09-21 01:20:20

Tags: list haskell tuples

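I am trying to get from the Before state to the After state shown below. Is there a convenient Haskell function that removes duplicate tuples from a list? Or is it something more involved, such as iterating over the whole list?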

Before: the list of tuples, sorted by word, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]

2 answers:

Answer 0 (score: 6)

Searching Hoogle for Eq a => [a] -> [a] returns the nub function:

  

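The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means 'essence'.)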

As the documentation notes, the more general case is nubBy.
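As a minimal sketch, here is nub applied to the list from the question, with a nubBy variant for comparison. The names before, after, and uniqueWords are chosen for this illustration only, and comparing on the word alone is a hypothetical use of nubBy that the question does not ask for.

```haskell
import Data.Function (on)
import Data.List (nub, nubBy)

-- The "Before" list from the question.
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]

-- Exact duplicates removed; first occurrences keep their positions.
after :: [(Int, String)]
after = nub before
-- [(2,"a"),(1,"a"),(1,"b"),(1,"c"),(2,"dd")]

-- Hypothetical variant: treat tuples with the same word as duplicates,
-- whatever their count is.
uniqueWords :: [(Int, String)]
uniqueWords = nubBy ((==) `on` snd) before
-- [(2,"a"),(1,"b"),(1,"c"),(2,"dd")]
```

Loading this in GHCi and evaluating after should reproduce the "After" list from the question.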

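That said, nub is an O(n^2) algorithm and may be inefficient on long lists. If the values are instances of the Ord type class, you can use Data.Set.fromList instead, like this: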

import qualified Data.Set as Set

nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList

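Note, however, that this does not preserve the order of the original list: Set.toList returns the elements in ascending order.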
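To illustrate the ordering behaviour of the Set round trip, here is a small sketch on the question's list. The name nubSorted is made up for this example so that it does not clash with nub'.

```haskell
import qualified Data.Set as Set

-- Hypothetical name for the Set round trip shown above.
nubSorted :: Ord a => [a] -> [a]
nubSorted = Set.toList . Set.fromList

-- The "Before" list from the question.
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]

-- Duplicates are gone, but the tuples come back in ascending order
-- (compared by count first, then by word), not in input order.
sortedResult :: [(Int, String)]
sortedResult = nubSorted before
-- [(1,"a"),(1,"b"),(1,"c"),(2,"a"),(2,"dd")]
```

In the input, (2,"a") came first; in the result it has moved behind every tuple whose count is 1.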

A simple Set-based solution that does preserve the order of the original list could be:

import Data.Set (Set, member, insert, empty)

nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
    where
    loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
    loop acc@(xs, obs) x
        | x `member` obs = acc
        | otherwise = (x:xs, x `insert` obs)

Answer 1 (score: 4)

If you are going to define an Ord-based version of nub, I suggest using

import Data.Set (member, insert, empty)

nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs

To see what this does, you can get rid of the foldr:

nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
  where
    nub'' [] obs = obs `seq` []
    nub'' (y : ys) obs
      | y `member` obs = nub'' ys obs
      | otherwise = obs' `seq` y : nub'' ys obs'
      where obs' = y `insert` obs

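A key point about this implementation, in contrast to behzad.nouri's, is that it produces elements lazily, as they are consumed. This is usually much better for cache utilization and garbage collection, and it uses a constant factor less memory than the reversing algorithm.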
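One way to observe the laziness is to run the function on an infinite list, as in the sketch below. It repeats the foldr-based definition under the made-up name nubLazy; firstFive and firstThree are likewise names invented for this illustration.

```haskell
import Data.Set (empty, insert, member)

-- Same definition as the foldr-based nub' above, renamed for this sketch.
nubLazy :: Ord a => [a] -> [a]
nubLazy xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs

-- Works on an infinite input: each new element is emitted as soon as
-- it is first seen, so take can stop after five of them.
firstFive :: [Integer]
firstFive = take 5 (nubLazy [1 ..])
-- [1,2,3,4,5]

-- Also fine on a repeating infinite list, as long as no more distinct
-- elements are requested than the list contains.
firstThree :: [Int]
firstThree = take 3 (nubLazy (cycle [1, 2, 3, 2, 1]))
-- [1,2,3]
```

The foldl-and-reverse version from the other answer cannot return anything for these inputs, because it has to traverse the whole list before reverse can yield the first element.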