Question

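I wrote an assembly program that finds the max value in an array, but now I would like it to find the second-largest number in the array. How would I modify my program to do this?

This is the program I wrote, and it does work. The program prints all the values in the array and then finds the max value of the array. Now I want it to find the second-largest value.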

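I edited my program to keep track of the second max, and the result I receive for the second max is 3. I expected the program to output 5. I am not sure why I am getting the wrong output. Here are my edits to the program.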

 %include "io.mac"
.STACK 100H 

.DATA
   Numbers DW 3,4,5,2,6,0
   msg0  db "Printing values in array",0
   msg1  db "Max",0
   msg2  db "Min",0

   .CODE
        .STARTUP
    PutStr msg0
    mov dx, Numbers 
    mov si, dx ;point to array
    printValues:
    cmp word [si], 0
    je procedure
    nwln
    PutInt [si]
    add si, 2
    jmp printValues

    procedure:
    push Numbers ;push Number to stack to pass parameter by stack
    call maxMeth
    nwln
    PutStr msg1
    nwln

    PutInt ax
    nwln




    complete:
.EXIT





maxMeth:
    enter 0,0 ;save old bp and set bp to sp
    mov si, [bp+4] ;point to array 
    mov ax, [si]   ; ax holds max
    add si,2 ; Increment si to next number

;Now entering loop
max:   
    cmp word [si],0   ; checks to see if the number is 0 and if it is, then we are done.
    je finish
    cmp ax, [si]        ; ax holds the max . So if ax is greater than si, then dont assign si to ax. 
    jge increment
    jmp newMax
newMax: 
    mov ax, [si] ; Otherwise we have a new a max

increment:   
    add si, 2   ;now increment si to check the next value and jump back to the main loop.
    jmp max

finish: ;clean up.
    leave ;pop bp
    ret   ;return

Answer 1

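I ended up writing code for this myself, because it made me curious how efficiently I could implement the loop. If you want to solve your homework yourself, don't look at my code, just the bullet points in English. Use a debugger to single-step through your code.

Here's how I would change your code:

NASM style: use local labels (like .noswap:) within functions. Indent operands to a consistent column so the code doesn't look ragged. Comment your functions with their inputs, return value, and calling convention (what registers they clobber).
Optimize away the jmp next_instruction before newMax:, because it's just an expensive no-op jump to where execution would fall through anyway.
Unless you're optimizing for a real 8086, don't use enter; it's slow.
Load each element being checked into a register, instead of using the same memory operand multiple times. (x86-16 has 6 integer registers other than BP/SP; use them.)
Put the loop-exit conditional branch at the bottom. (Jump there for the loop entry point if needed.)
Keep a max and a 2nd-max in two registers, like you're keeping max in AX.
When you find an element greater than 2nd-max, keep the highest 2 of the 3 numbers you have, i.e. maintain a 2-element queue / sorted list in 2 registers.

Untested: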

; word max2Meth(word *array);
; Input: implicit-length array (terminated by a 0 element),
;        pointed to by pointer passed on the stack.  (DS segment)
; returns in ax
; clobbers: cx, dx
global max2Meth
max2Meth:
    push  bp
    mov   bp, sp     ; make a stack frame.  (ENTER is slow, don't use it)
    push  si         ; SI is call-preserved in many calling conventions.  Omit this if you want to just clobber it.

    mov   si, [bp+4] ; pointer to array 

    mov   ax, [si]   ; assume that the list is non-empty
    mov   dx, ax     ; start with max = max2 instead of putting a conditional xchg outside the loop
                     ; caveat: if the first element is the largest, ax ends up equal to max, not the true 2nd-max

    jmp   .loop_entry   ; enter at the bottom, at the conditional branch
;;; ax: 2nd max
;;; dx: max

.maxloop:              ; do {
    cmp   cx, ax       ; check against 2nd-max, because the common case is less than both.
    jg    .updateMaxes ; optimize for the common case: fall through on not-found

.loop_entry:
    add   si, 2
    mov   cx, [si]     ;   c = *p++;
    test  cx, cx
    jnz   .maxloop     ; } while (c != 0);

.finish:
    ; 2nd-max is already in AX, just clean up and return

    pop   si
    leave              ; pop bp would be faster because SP is already pointing to the right place
    ret

; This block is part of the function, even though it's placed after the ret
.updateMaxes:
    mov  ax, cx           ; Bump the old 2nd-max unconditionally
    cmp  ax, dx
    jle  .loop_entry      ; if 2nd_max <= max, return to the loop
    xchg ax, dx           ; otherwise swap
    jmp  .loop_entry

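Putting the block for a rare case out of line is nice, because it means the common case can just fall through with no taken branches. Putting if/else conditions inline often requires a jmp somewhere just to avoid running the else part after the if. But .updateMaxes: ends up being rather elegant, IMO: it has to jump back into the loop anyway, and we can do that either before or after swapping.

16-bit xchg is 3 uops (as expensive as 3 mov instructions), but to make the new-max case cheaper (and do only mov ax, dx / mov dx, cx) we'd have to make the new-2nd-max case slower. Assuming it's more likely to update only the 2nd max than to update both, I think just mov and cmp/jcc is a win there. You could make that part branchless with cmov (on a 686 CPU), which might be good. Making the whole thing branchless would give you a fairly long dependency chain, and wouldn't be worth it unless the array elements were getting larger on average, so that you always had frequent max updates (but without a pattern, so you'd get branch misses).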
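As a rough illustration of the cmov idea, here is an untested sketch of the whole function with the update step made branchless. The name max2_cmov is hypothetical, and seeding both registers with 0x8000 (the smallest signed word) rather than with the first element is an assumption of this sketch, chosen so that the first element needs no special handling. With fewer than two elements it returns that seed. It needs a P6-class (686) or later CPU.

```nasm
bits 16

; word max2_cmov(word *array);    hypothetical name, not part of the code above
; Input: 0-terminated word array, pointer passed on the stack (DS segment)
; Returns: 2nd-max in ax (0x8000 if the list has fewer than 2 elements)
; Clobbers: cx, dx
max2_cmov:
    push  bp
    mov   bp, sp
    push  si

    mov   si, [bp+4]     ; si = pointer to array
    mov   ax, 0x8000     ; ax = 2nd-max, seeded with the smallest signed word (assumption)
    mov   dx, ax         ; dx = max, same seed

.maxloop:                ; do {
    mov   cx, [si]       ;   c = *p++;
    add   si, 2
    test  cx, cx
    jz    .finish        ;   stop at the 0 terminator
    cmp   cx, ax
    jle   .maxloop       ;   common case: not above 2nd-max, nothing to update

    mov   ax, cx         ;   c provisionally becomes the 2nd-max
    cmp   cx, dx
    cmovg ax, dx         ;   if c > max: the old max drops to 2nd-max
    cmovg dx, cx         ;               and c becomes the new max
    jmp   .maxloop       ; } while (1);

.finish:
    pop   si
    pop   bp
    ret
```

Both cmovg instructions use the flags from the same cmp, since neither mov nor cmov modifies flags. The branch that decides whether any update is needed is kept, on the assumption that it is rarely taken.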

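On Intel Haswell/Skylake, the inner loop is only 4 fused-domain uops (both compare/branch pairs can macro-fuse). Over long runs with no updates, it should run at 1 iteration per clock.

If you're optimizing for code size over speed (e.g. for a real 8086), use ax as the temporary and lodsw instead of mov ax, [si] and add si, 2. (Keep your max in a different register.)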
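A size-oriented variant along those lines might look like the following untested sketch. The name max2_lodsw and the register assignment (cx = 2nd-max, dx = max, ax = scratch) are assumptions, as is the 0x8000 seed, which is also what the function returns when the list has fewer than two elements.

```nasm
bits 16

; word max2_lodsw(word *array);   hypothetical name, not part of the code above
; Input: 0-terminated word array, pointer passed on the stack (DS segment)
; Returns: 2nd-max in ax (0x8000 if the list has fewer than 2 elements)
; Clobbers: cx, dx.  Leaves the direction flag cleared.
max2_lodsw:
    push  bp
    mov   bp, sp
    push  si

    mov   si, [bp+4]     ; si = pointer to array
    mov   cx, 0x8000     ; cx = 2nd-max, seeded with the smallest signed word (assumption)
    mov   dx, cx         ; dx = max, same seed
    cld                  ; make lodsw step forward through the array

.maxloop:                ; do {
    lodsw                ;   ax = *p++;  load and increment in a one-byte instruction
    test  ax, ax
    jz    .finish        ;   stop at the 0 terminator
    cmp   ax, cx
    jle   .maxloop       ;   not above 2nd-max: nothing to update
    mov   cx, ax         ;   the new element replaces the old 2nd-max
    cmp   cx, dx
    jle   .maxloop       ;   still <= max: done
    xchg  cx, dx         ;   otherwise it is the new max, and the old max becomes 2nd-max
    jmp   .maxloop       ; } while (1);

.finish:
    mov   ax, cx         ; return the 2nd-max in ax
    pop   si
    pop   bp
    ret
```

Because lodsw always loads into ax, the running 2nd-max has to live elsewhere and be copied into ax once at the end.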

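With an implicit-length list, you can't just use scasw with the 2nd-max in ax, because you need to check for 0 as well as > 2nd-max :/

As a further optimization, if you're using a small memory model (SS = DS), you could use bp instead of si, because you don't need to access the stack after loading the pointer. You could then just pop bp instead of leave.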

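Before I thought of simply entering the loop with ax = dx = first element, I was going to use this code before the loop:

    mov   ax, [si]     ; assume that the list is non-empty
    mov   dx, [si+2]   ; but check if it's only 1 element long, like maxMeth does
    test  dx, dx
    jz    .finish
    add   si, 2        ; first 2 elements are loaded; si points at the 2nd because the loop adds 2 before each load
    cmp   ax, dx       ; sort them so ax <= dx
    jng   .noswap
    xchg  ax, dx
.noswap:

Another way to structure the inner loop would be like this:

.maxloop:              ; do {
    add   si, 2
    mov   cx, [si]     ;   c = *p++;
    test  cx, cx
    jz    .finish      ; jz instead of jnz: exit the loop on the sentinel value
    cmp   cx, ax       ; check against 2nd-max, because the common case is less than both.
    jng   .maxloop

;; .updateMaxes:  ;; conditionally fall through into this part
    mov   ax, cx       ; Bump the old 2nd-max unconditionally
    cmp   ax, dx
    jle   .maxloop     ; if 2nd_max <= max, return to the loop
    xchg  ax, dx       ; otherwise swap
    jmp   .maxloop

.finish:

This is probably better, because we can just fall into the top of the loop to start. We get out of the loop with a jz that skips over the conditional-update stuff, so we don't have any branches that exist only to skip over code that's "in the way". (i.e. we've succeeded at laying out our code blocks efficiently.)

The only downside for some CPUs is that the test/jz and cmp/jg are back-to-back. Some CPUs do better when conditional branches are separated by more than a couple of instructions. (e.g. unless you get lucky with how these hit the decoders on Sandybridge, one of the two branches won't macro-fuse. But they would with the first loop.)

Reminder: Stack Overflow user contributions are licensed under cc by-sa 3.0 with attribution required, so if you copy-paste my whole code, make sure you include a link to this answer in a comment.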

Assembly program that finds second largest value in array
