What is the most efficient way to hardcode a map in Haskell?

Time: 2014-05-19 22:44:01

Tags: haskell

I want to hardcode a map in Haskell. I can see at least three ways to do it:

  • Using multiple equations:

    message 200 = "OK"
    message 404 = "Not found"
    ...
    
  • Using a case expression:

    message s = case s of
        200 -> "OK"
        404 -> "Not found"
    
  • Actually using a Map
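For concreteness, here is a minimal sketch of the three approaches side by side. It assumes only the two status codes shown above plus a hypothetical fallback string "Unknown", so that every version is total; the names messageEqs, messageCase, messageTable and messageMap are illustrative.

```haskell
import qualified Data.Map as Map

-- 1. Multiple equations, tried from top to bottom.
messageEqs :: Int -> String
messageEqs 200 = "OK"
messageEqs 404 = "Not found"
messageEqs _   = "Unknown" -- hypothetical fallback for other codes

-- 2. One case expression with the same alternatives.
messageCase :: Int -> String
messageCase s = case s of
  200 -> "OK"
  404 -> "Not found"
  _   -> "Unknown"

-- 3. A real Data.Map, defined as a top-level constant so it is built once.
messageTable :: Map.Map Int String
messageTable = Map.fromList [(200, "OK"), (404, "Not found")]

messageMap :: Int -> String
messageMap s = Map.findWithDefault "Unknown" s messageTable

main :: IO ()
main = mapM_ (\s -> print (messageEqs s, messageCase s, messageMap s)) [200, 404, 500]
```

All three return the same strings; they differ only in how the lookup is carried out.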

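Which approach is the most efficient? Is one solution faster than the others, and why? Are the first two solutions equivalent (does the compiler generate the same code)? Which is the recommended way (easier to read)?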

(Note that I use Int in my examples, but that is not a requirement. The keys could also be Strings, so I am interested in both cases.)
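For String keys the same approaches apply. A minimal sketch with the table turned around so that the key is a String (codeFor, codeTable, codeForMap and the "Teapot" probe are hypothetical). One caveat: a Data.Map lookup performs O(log n) key comparisons, and each String comparison can itself walk the whole common prefix of the two strings.

```haskell
import qualified Data.Map as Map

-- Pattern matching directly on String literals.
codeFor :: String -> Maybe Int
codeFor "OK"        = Just 200
codeFor "Not found" = Just 404
codeFor _           = Nothing

-- The same table as a Map; String keys are ordered lexicographically by Ord.
codeTable :: Map.Map String Int
codeTable = Map.fromList [("OK", 200), ("Not found", 404)]

codeForMap :: String -> Maybe Int
codeForMap k = Map.lookup k codeTable

main :: IO ()
main = mapM_ (\k -> print (codeFor k, codeForMap k)) ["OK", "Not found", "Teapot"]
```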

3 answers:

Answer 0 (score: 6)

Pattern matching on an Int happens in O(log n) time, just like a Map lookup.

Consider the following code, compiled to x86 assembly with ghc -S:
module F (
    f
) where

f :: Int -> String
f 0 = "Zero"
f 1 = "One"
f 2 = "Two"
f 3 = "Three"
f 4 = "Four"
f 5 = "Five"
f 6 = "Six"
f 7 = "Seven"
f _ = "Undefined"

The compiled assembly is:

.text
    .align 4,0x90
    .long   _F_f_srt-(_sl8_info)+0
    .long   0
    .long   65568
_sl8_info:
.Lcma:
    movl 3(%esi),%eax
    cmpl $4,%eax
    jl .Lcmq
    cmpl $6,%eax
    jl .Lcmi
    cmpl $7,%eax
    jl .Lcme
    cmpl $7,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_cm7_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmc:
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clB_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcme:
    cmpl $6,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_cm3_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmg:
    cmpl $4,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clV_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmi:
    cmpl $5,%eax
    jl .Lcmg
    cmpl $5,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clZ_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmk:
    cmpl $2,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clN_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmm:
    testl %eax,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clF_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmo:
    cmpl $1,%eax
    jl .Lcmm
    cmpl $1,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clJ_str,0(%ebp)
    jmp _stg_ap_n_fast
.Lcmq:
    cmpl $2,%eax
    jl .Lcmo
    cmpl $3,%eax
    jl .Lcmk
    cmpl $3,%eax
    jne .Lcmc
    movl $_ghczmprim_GHCziCString_unpackCStringzh_closure,%esi
    movl $_clR_str,0(%ebp)
    jmp _stg_ap_n_fast
.text
    .align 4,0x90
    .long   _F_f_srt-(_F_f_info)+0
    .long   65541
    .long   0
    .long   65551
.globl _F_f_info
_F_f_info:
.Lcmu:
    movl 0(%ebp),%esi
    movl $_sl8_info,0(%ebp)
    testl $3,%esi
    jne .Lcmx
    jmp *(%esi)
.Lcmx:
    jmp _sl8_info

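This is a binary search on the integer argument. .Lcma branches on < 4, then < 6, then < 7. The first comparison (< 4) goes to .Lcmq, which branches on < 2, then < 3. The first of those goes to .Lcmo, which branches on < 1.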
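To make the shape of that search easier to read, here is the same decision tree transcribed by hand into Haskell. fTree is a hypothetical name and only a reading aid, not what GHC emits; each n < k test halves the remaining range, and each leaf does a single equality check, as the cmpl/jne pairs in the listing do.

```haskell
-- The source function from above.
f :: Int -> String
f 0 = "Zero"
f 1 = "One"
f 2 = "Two"
f 3 = "Three"
f 4 = "Four"
f 5 = "Five"
f 6 = "Six"
f 7 = "Seven"
f _ = "Undefined"

-- Hand transcription of the comparisons in the assembly: at most three
-- `<` tests for eight keys, then one `==` test that either returns the
-- string or falls back to "Undefined" (the .Lcmc block).
fTree :: Int -> String
fTree n
  | n < 4 =
      if n < 2
        then if n < 1 then leaf 0 "Zero" else leaf 1 "One"
        else if n < 3 then leaf 2 "Two" else leaf 3 "Three"
  | n < 6 = if n < 5 then leaf 4 "Four" else leaf 5 "Five"
  | n < 7 = leaf 6 "Six"
  | otherwise = leaf 7 "Seven"
  where
    leaf k s = if n == k then s else "Undefined"

main :: IO ()
main = print (all (\n -> f n == fTree n) [-2 .. 10])
```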

With ghc -O2 -S we get the following, which shows the same pattern:

.text
    .align 4,0x90
    .long   _F_zdwf_srt-(_F_zdwf_info)+0
    .long   65540
    .long   0
    .long   33488911
.globl _F_zdwf_info
_F_zdwf_info:
.LcqO:
    movl 0(%ebp),%eax
    cmpl $4,%eax
    jl .Lcr6
    cmpl $6,%eax
    jl .LcqY
    cmpl $7,%eax
    jl .LcqU
    cmpl $7,%eax
    jne .LcqS
    movl $_F_f1_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.LcqS:
    movl $_F_f9_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.LcqU:
    cmpl $6,%eax
    jne .LcqS
    movl $_F_f2_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.LcqW:
    cmpl $4,%eax
    jne .LcqS
    movl $_F_f4_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.LcqY:
    cmpl $5,%eax
    jl .LcqW
    cmpl $5,%eax
    jne .LcqS
    movl $_F_f3_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.Lcr0:
    cmpl $2,%eax
    jne .LcqS
    movl $_F_f6_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.Lcr2:
    testl %eax,%eax
    jne .LcqS
    movl $_F_f8_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.Lcr4:
    cmpl $1,%eax
    jl .Lcr2
    cmpl $1,%eax
    jne .LcqS
    movl $_F_f7_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.Lcr6:
    cmpl $2,%eax
    jl .Lcr4
    cmpl $3,%eax
    jl .Lcr0
    cmpl $3,%eax
    jne .LcqS
    movl $_F_f5_closure,%esi
    addl $4,%ebp
    andl $-4,%esi
    jmp *(%esi)
.section .data
    .align 4
.align 1
_F_f_srt:
    .long   _F_zdwf_closure
.data
    .align 4
.align 1
.globl _F_f_closure
_F_f_closure:
    .long   _F_f_info
    .long   0
.text
    .align 4,0x90
    .long   _F_f_srt-(_srh_info)+0
    .long   0
    .long   65568
_srh_info:
.Lcrv:
    movl 3(%esi),%eax
    movl %eax,0(%ebp)
    jmp _F_zdwf_info
.text
    .align 4,0x90
    .long   _F_f_srt-(_F_f_info)+0
    .long   65541
    .long   0
    .long   65551
.globl _F_f_info
_F_f_info:
.Lcrz:
    movl 0(%ebp),%esi
    movl $_srh_info,0(%ebp)
    testl $3,%esi
    jne _srh_info
    jmp *(%esi)
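The symbol names explain the extra structure here. GHC z-encodes $ as zd, so _F_zdwf_info is the worker $wf, which reads the raw integer straight off the stack, while _F_f_info and its continuation _srh_info form the wrapper: as far as one can tell from the listing, _srh_info pulls the payload out of the evaluated I# box (movl 3(%esi),%eax) and jumps to the worker. The string results appear to have been floated out into top-level closures (_F_f1_closure through _F_f9_closure). GHC does this worker/wrapper split by itself; the hand-written sketch below only illustrates what $wf corresponds to, and fWorker and fWrapped are hypothetical names.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#)

-- Plays the role of the worker $wf: it matches on an unboxed Int#,
-- so no heap object has to be inspected to read the argument.
fWorker :: Int# -> String
fWorker 0# = "Zero"
fWorker 1# = "One"
fWorker 2# = "Two"
fWorker 3# = "Three"
fWorker 4# = "Four"
fWorker 5# = "Five"
fWorker 6# = "Six"
fWorker 7# = "Seven"
fWorker _  = "Undefined"

-- Plays the role of the wrapper f: unbox the Int once, then delegate.
fWrapped :: Int -> String
fWrapped (I# n) = fWorker n

main :: IO ()
main = print (map fWrapped [0, 3, 7, 9])
```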

If we change the original code to

f :: Int -> String
f 1 = "Zero"
f 2 = "One"
f 3 = "Two"
f 4 = "Three"
f 5 = "Four"
f 6 = "Five"
f 7 = "Six"
f 8 = "Seven"
f _ = "Undefined"

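the branches are on < 5, < 7, < 8, and the < 5 side then branches on < 3, < 4, and so on, so GHC is probably sorting the alternatives first. We can test this by scrambling the numbers and even adding gaps between them: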

f :: Int -> String
f 20 = "Zero"
f 80 = "One"
f 70 = "Two"
f 30 = "Three"
f 40 = "Four"
f 50 = "Five"
f 10 = "Six"
f 60 = "Seven"
f _ = "Undefined"

Sure enough, the branches are still on < 50, < 70, < 80, with the < 50 side branching on < 30, < 40, and so on.
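A toy model that reproduces the pivot order reported above: sort the keys, compare against the middle key, and recurse on each half. This is an assumption about the strategy inferred from the observed output, not GHC's actual implementation; Tree, build, rightSpine, leftBranch and member are hypothetical names.

```haskell
import Data.List (sort)

-- Toy model of a comparison tree over sorted keys (not GHC's own code).
data Tree = Leaf Int | Node Int Tree Tree
  deriving Show

-- Split the sorted keys at the middle; the pivot is the first key of the
-- upper half, so the test at each node is `n < pivot`.
build :: [Int] -> Tree
build []  = error "build: no keys"
build [k] = Leaf k
build ks  = case splitAt (length ks `div` 2) ks of
  (lo, hi@(pivot : _)) -> Node pivot (build lo) (build hi)
  (_, [])              -> error "build: unreachable for two or more keys"

-- Pivots tested along the `>=` side, starting at the root.
rightSpine :: Tree -> [Int]
rightSpine (Node p _ hi) = p : rightSpine hi
rightSpine (Leaf _)      = []

-- Pivots tested after the root's `<` branch is taken.
leftBranch :: Tree -> [Int]
leftBranch (Node _ lo _) = rightSpine lo
leftBranch (Leaf _)      = []

-- Binary-search membership: one `<` test per level, one `==` at the leaf.
member :: Int -> Tree -> Bool
member n (Leaf k)       = n == k
member n (Node p lo hi) = if n < p then member n lo else member n hi

main :: IO ()
main = do
  let t = build (sort [20, 80, 70, 30, 40, 50, 10, 60])
  print (rightSpine t)              -- [50,70,80]
  print (leftBranch t)              -- [30,40]
  print (map (`member` t) [30, 35]) -- [True,False]
```

Applied to [0 .. 7] and [1 .. 8], the same model gives the < 4, < 6, < 7 and < 5, < 7, < 8 chains seen in the earlier listings.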

Answer 1 (score: 3)

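Apparently, function pattern matching happens in O(1) (constant time), whereas a Map lookup (meaning Data.Map, of course) is guaranteed to be O(log n).

Given that assumption, I would go with pattern matching.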

Answer 2 (score: 1)

case ... of and multiple equations are exactly the same: they compile to the same Core. For most cases, you should do this:

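import qualified Data.Map as Map

message :: Int -> Maybe String
message =
    let
        -- Bound outside the lambda, so the Map is shared by every call
        theMap = Map.fromList [ (200, "OK"), (404, "Not found") {- , ... -} ]
    in
        \x -> Map.lookup x theMap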

This builds the map only once. If you don't like the Maybe String return type, you can apply fromMaybe to the result.
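A minimal sketch of that last suggestion, reusing the answer's message restricted to the two codes from the question, with a hypothetical default string; Map.findWithDefault would be an equivalent alternative.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

message :: Int -> Maybe String
message =
    let
        theMap = Map.fromList [ (200, "OK"), (404, "Not found") ]
    in
        \x -> Map.lookup x theMap

-- Collapse the Maybe with a default (hypothetical text) using fromMaybe.
messageText :: Int -> String
messageText = fromMaybe "Unknown status" . message

main :: IO ()
main = mapM_ (putStrLn . messageText) [200, 404, 500]
```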

For a small number of cases (especially integer keys), a case expression may be faster, if the compiler can turn it into a jump table.
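The shape that stands to benefit most is a case over dense, consecutive integer keys. A minimal sketch (digitName is a hypothetical example); whether a given GHC version and backend actually emits a jump table for it, rather than the binary search shown in the first answer, is not guaranteed and can be checked by compiling with ghc -O2 -S and reading the output.

```haskell
-- Ten consecutive Int keys: a dense switch, the shape a code generator
-- can lower to an indexed jump table instead of a comparison tree.
digitName :: Int -> String
digitName n = case n of
  0 -> "zero"
  1 -> "one"
  2 -> "two"
  3 -> "three"
  4 -> "four"
  5 -> "five"
  6 -> "six"
  7 -> "seven"
  8 -> "eight"
  9 -> "nine"
  _ -> "not a digit"

main :: IO ()
main = print (map digitName [0, 7, 10])
```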

In an ideal world, GHC would pick the right version automatically.