Question

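From http://www.geeksforgeeks.org/merge-sort-for-linked-list/

The slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and makes others completely impossible.

However, I really don't understand why quicksort would perform worse than merge sort when sorting a linked list.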

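Quick sort:

Choosing a pivot requires random access, and needs to traverse the linked list (O(n) per recursion).

Partitioning can be done with a left-to-right sweep (no random access required).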

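Merge sort:

Splitting in the middle requires random access, and needs to traverse the linked list (using the fast/slow pointer mechanism) (O(n) per recursion).

Merging can be done with a left-to-right sweep (no random access required).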
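The fast/slow pointer split mentioned above can be sketched as follows. This is only an illustration: the `Node` type and the name `split_middle` are assumptions and do not come from the linked page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, used for illustration only. */
typedef struct Node {
    struct Node *next;
    int data;
} Node;

/* Split a list of at least two nodes into two halves.
   The first half keeps `head`; the head of the second half is returned.
   One full traversal is needed, hence O(n) per recursion level. */
Node *split_middle(Node *head)
{
    assert(head != NULL && head->next != NULL);
    Node *slow = head;
    Node *fast = head->next;
    /* fast advances two nodes per step, slow advances one */
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node *second = slow->next;
    slow->next = NULL;          /* cut the list after the middle node */
    return second;
}
```

For an odd number of nodes the first half ends up one node longer than the second.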

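As far as I can see, both quicksort and merge sort need random access in each recursion, so I don't see why quicksort would perform worse than merge sort because of the linked list's lack of random access.

Am I missing something here?

EDIT: I am looking at the partition function where the pivot is the last element and we scan sequentially from the left. If the partition works differently (i.e. the pivot is in the middle and you maintain two pointers, one at each end), it would still work fine if the linked list is doubly linked...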
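For reference, the kind of partition described in the edit (last element as pivot, sequential scan from the left) might look like the sketch below. It assumes that swapping node values is acceptable instead of relinking nodes; the `Node` type and the names `swap_data` and `partition_last` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, used for illustration only. */
typedef struct Node {
    struct Node *next;
    int data;
} Node;

static void swap_data(Node *a, Node *b)
{
    int t = a->data;
    a->data = b->data;
    b->data = t;
}

/* Partition the nodes from `first` to `last` (inclusive) around last->data.
   Values smaller than the pivot end up before the returned node,
   which holds the pivot value after the call. */
Node *partition_last(Node *first, Node *last)
{
    assert(first != NULL && last != NULL);
    int pivot = last->data;
    Node *store = first;        /* next slot for a value < pivot */
    for (Node *cur = first; cur != last; cur = cur->next) {
        if (cur->data < pivot) {
            swap_data(store, cur);
            store = store->next;
        }
    }
    swap_data(store, last);     /* put the pivot value in its final slot */
    return store;
}
```

Only `next` links are followed, so a singly linked list is enough for this variant.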

Answer 1

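You can split the list around a pivot element in linear time using constant extra memory (even though it is pretty painful to implement for singly linked lists), so quicksort would have the same time complexity as merge sort on average (the good thing about merge sort is that it is O(N log N) in the worst case). So they can be the same in terms of asymptotic behavior.

It can be hard to tell which one is faster (because the real run time is a property of an implementation, not of the algorithm itself).

However, partitioning with a random pivot is rather messy for a singly linked list (it is possible, but the method I can think of has a larger constant than just getting two halves for merge sort). Using the first or the last element as the pivot has an obvious problem: it runs in O(N^2) for sorted (or nearly sorted) lists. Taking this into account, I would say that merge sort is the more reasonable choice in most cases.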
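One possible way to pick a random pivot on a singly linked list, not necessarily the method the answer has in mind, is a single extra traversal with size-one reservoir sampling. The sketch below assumes a hypothetical `Node` type and function name, and it illustrates the extra pass that a random pivot costs compared with taking the first node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type, used for illustration only. */
typedef struct Node {
    struct Node *next;
    int data;
} Node;

/* Pick a roughly uniformly random node in one traversal.
   rand() % seen has a slight modulo bias, which is acceptable for a sketch. */
Node *random_pivot(Node *head)
{
    assert(head != NULL);
    Node *chosen = head;
    size_t seen = 1;
    for (Node *cur = head->next; cur != NULL; cur = cur->next) {
        seen++;
        /* replace the current choice with probability 1/seen */
        if ((size_t)rand() % seen == 0)
            chosen = cur;
    }
    return chosen;
}
```

The chosen node's value could then be swapped with the first node's value so that a first-node partition can be reused unchanged.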

Answer 2

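As already pointed out, if singly linked lists are used, merge sort and quicksort have the same average running time: O(n log n).

I am not 100% sure which partition algorithm you have in mind, but the one sweeping algorithm I can come up with removes the current element from the list if it is larger than the pivot element and inserts it at the end of the list. Making this change takes at least 3 operations:

the link of the parent element must be changed
the link of the last element must be changed
the record of which element is last must be updated

However, this has to be done in only about 50% of the cases, so on average there are 1.5 changes per element during the partition function.

On the other hand, consider the merge function. In about 50% of the cases, two consecutive elements in the merged list come from the same original list, so there is nothing to do because those elements are already linked. In the other cases, we have to change one link, to the head of the other list. On average that is 0.5 changes per element for the merge function.

Clearly, one has to know the exact cost of each operation to know the final result, so this is only a hand-waving explanation.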
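The sweeping partition described above can be sketched as follows, with the three link operations marked in comments. The choice of the first node as pivot, the `Node` type, and the name `partition_to_tail` are assumptions made for illustration; the answer does not specify them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, used for illustration only. */
typedef struct Node {
    struct Node *next;
    int data;
} Node;

/* Uses the first node as pivot (an assumption) and moves every node whose
   value is greater than the pivot to the end of the list.
   Returns the number of nodes that were moved. */
size_t partition_to_tail(Node *head)
{
    assert(head != NULL);
    int pivot = head->data;
    Node *tail = head;
    size_t n = 0;                       /* number of nodes after the pivot */
    while (tail->next != NULL) {
        tail = tail->next;
        n++;
    }
    size_t moved = 0;
    Node *prev = head;
    for (size_t i = 0; i < n; i++) {    /* visit each original node once */
        Node *cur = prev->next;
        if (cur->data > pivot && cur != tail) {
            prev->next = cur->next;     /* 1: change the parent's link */
            tail->next = cur;           /* 2: change the last element's link */
            cur->next = NULL;
            tail = cur;                 /* 3: update who is the last element */
            moved++;
        } else {
            prev = cur;                 /* node stays where it is */
        }
    }
    return moved;
}
```

A node that is already last is left in place, so the count of moved nodes can be slightly lower than the count of nodes greater than the pivot.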

Answer 3

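For linked lists, there is an iterative bottom-up version of merge sort that does not scan lists to split them, which avoids the issue of slow random-access performance. The bottom-up merge sort for linked lists uses a small array (25 to 32 entries) of pointers to nodes. Time complexity is O(n log(n)), and space complexity is O(1) (the array of 25 to 32 pointers to nodes).

At that web page

http://www.geeksforgeeks.org/merge-sort-for-linked-list

I posted some comments, including a link to a working example of bottom-up merge sort for linked lists, but never received a response from that group. Link to the working example used for that web site:

http://code.geeksforgeeks.org/Mcr1Bf

For quicksort without random access, the first node can be used as the pivot. Three lists are created: one for nodes < pivot, one for nodes == pivot, and one for nodes > pivot. Recursion is used on the two lists of nodes != pivot. This has a worst-case time complexity of O(n^2) and a worst-case stack space complexity of O(n). The stack space complexity can be reduced to O(log(n)) by recursing only on the shorter list of nodes != pivot, then looping back to sort the longer list, using the first node of the longer list as the new pivot. Keeping track of the last node in a list (for example, by using a tail pointer to a circular list) allows quick concatenation with the other two lists. Worst-case time complexity remains O(n^2).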
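A minimal sketch of the three-list scheme described above is given here. It is not the answer's own code: it recurses on both sub-lists, so it keeps the O(n) worst-case stack depth and does not include the shorter-list optimization. The `Node` type and the names `quicksort3` and `quicksort_list` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, used for illustration only. */
typedef struct Node {
    struct Node *next;
    int data;
} Node;

/* Sort a list; returns the new head and stores the last node in *tail_out. */
static Node *quicksort3(Node *head, Node **tail_out)
{
    if (head == NULL || head->next == NULL) {
        *tail_out = head;
        return head;
    }
    int pivot = head->data;             /* first node is the pivot */
    Node *lt = NULL, *eq = NULL, *gt = NULL;
    Node **plt = &lt, **peq = &eq, **pgt = &gt;
    Node *eq_tail = NULL;
    /* distribute the nodes over three lists, keeping their relative order */
    for (Node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->data < pivot) {
            *plt = cur; plt = &cur->next;
        } else if (cur->data > pivot) {
            *pgt = cur; pgt = &cur->next;
        } else {
            *peq = cur; peq = &cur->next; eq_tail = cur;
        }
    }
    *plt = NULL;
    *peq = NULL;
    *pgt = NULL;
    assert(eq_tail != NULL);            /* the pivot itself is in eq */

    Node *lt_tail, *gt_tail;
    lt = quicksort3(lt, &lt_tail);
    gt = quicksort3(gt, &gt_tail);

    /* concatenate lt + eq + gt using the tracked tail nodes */
    eq_tail->next = gt;
    *tail_out = (gt != NULL) ? gt_tail : eq_tail;
    if (lt == NULL)
        return eq;
    lt_tail->next = eq;
    return lt;
}

Node *quicksort_list(Node *head)
{
    Node *tail;
    return quicksort3(head, &tail);
}
```

Returning the tail from each call plays the role of the tail pointer mentioned above, so concatenation takes constant time.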

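It should be pointed out that, if you have the space, it is usually much faster to move the linked list to an array (or vector), sort the array, and create a new sorted list from the sorted array.

Example C code: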

#include <stdio.h>
#include <stdlib.h>

typedef struct NODE_{
struct NODE_ * next;
int data;
}NODE;

/* merge two already sorted lists                    */
/* compare uses pSrc2 < pSrc1 to follow the STL rule */
/*   of only using < and not <=                      */
NODE * MergeLists(NODE *pSrc1, NODE *pSrc2)
{
NODE *pDst = NULL;          /* destination head ptr */
NODE **ppDst = &pDst;       /* ptr to head or prev->next */
    if(pSrc1 == NULL)
        return pSrc2;
    if(pSrc2 == NULL)
        return pSrc1;
    while(1){
        if(pSrc2->data < pSrc1->data){  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            if(pSrc2 == NULL){
                *ppDst = pSrc1;
                break;
            }
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            if(pSrc1 == NULL){
                *ppDst = pSrc2;
                break;
            }
        }
    }
    return pDst;
}

/* sort a list using array of pointers to list       */
/* aList[i] == NULL or ptr to list with 2^i nodes    */

#define NUMLISTS 32             /* number of lists */
NODE * SortList(NODE *pList)
{
NODE * aList[NUMLISTS];         /* array of lists */
NODE * pNode;
NODE * pNext;
int i;
    if(pList == NULL)           /* check for empty list */
        return NULL;
    for(i = 0; i < NUMLISTS; i++)   /* init array */
        aList[i] = NULL;
    pNode = pList;              /* merge nodes into array */
    while(pNode != NULL){
        pNext = pNode->next;
        pNode->next = NULL;
        for(i = 0; (i < NUMLISTS) && (aList[i] != NULL); i++){
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if(i == NUMLISTS)   /* don't go beyond end of array */
            i--;
        aList[i] = pNode;
        pNode = pNext;
    }
    pNode = NULL;           /* merge array into one list */
    for(i = 0; i < NUMLISTS; i++)
        pNode = MergeLists(aList[i], pNode);
    return pNode;
}

/* allocate memory for a list */
/* create list of nodes with pseudo-random data */
NODE * CreateList(int count)
{
NODE *pList;
NODE *pNode;
int i;
int r;
    /* allocate nodes */
    pList = (NODE *)malloc(count * sizeof(NODE));
    if(pList == NULL)
        return NULL;
    pNode = pList;                  /* init nodes */
    for(i = 0; i < count; i++){
        r  = (((int)((rand()>>4) & 0xff))<< 0);
        r += (((int)((rand()>>4) & 0xff))<< 8);
        r += (((int)((rand()>>4) & 0xff))<<16);
        r += (((int)((rand()>>4) & 0x7f))<<24);
        pNode->data = r;
        pNode->next = pNode+1;
        pNode++;
    }
    (--pNode)->next = NULL;
    return pList;
}

#define NUMNODES (1024)         /* number of nodes */
int main(void)
{
void *pMem;                     /* ptr to allocated memory */
NODE *pList;                    /* ptr to list */
NODE *pNode;
int data;

    /* allocate memory and create list */
    if(NULL == (pList = CreateList(NUMNODES)))
        return(0);
    pMem = pList;               /* save ptr to mem */
    pList = SortList(pList);    /* sort the list */
    data = pList->data;         /* check the sort */
    while((pList = pList->next) != NULL){
        if(data > pList->data){
            printf("failed\n");
            break;
        }
        data = pList->data;     /* remember the previous value */
    }
    if(pList == NULL)
        printf("passed\n");
    free(pMem);                 /* free memory */
    return(0);
}

Answer 4

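Expanding on rcgldr's answer, I wrote a simplistic¹ implementation of quicksort on linked lists using the first element as the pivot (which behaves pathologically badly on sorted lists) and ran a benchmark on lists with pseudo-random data.

I implemented quicksort using recursion, but took care to avoid a stack overflow on pathological cases by recursing only on the smaller half.

I also implemented the proposed alternative with an auxiliary array of pointers to the nodes.

Here is the code: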

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct NODE {
    struct NODE *next;
    int data;
} NODE;

/* merge two already sorted lists                    */
/* compare uses pSrc2 < pSrc1 to follow the STL rule */
/*   of only using < and not <=                      */
NODE *MergeLists(NODE *pSrc1, NODE *pSrc2) {
    NODE *pDst = NULL;          /* destination head ptr */
    NODE **ppDst = &pDst;       /* ptr to head or prev->next */
    for (;;) {
        if (pSrc2->data < pSrc1->data) {  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            if (pSrc2 == NULL) {
                *ppDst = pSrc1;
                break;
            }
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            if (pSrc1 == NULL) {
                *ppDst = pSrc2;
                break;
            }
        }
    }
    return pDst;
}

/* sort a list using array of pointers to list       */
NODE *MergeSort(NODE *pNode) {
#define NUMLISTS 32             /* number of lists */
    NODE *aList[NUMLISTS];      /* array of lists */
    /* aList[i] == NULL or ptr to list with 2^i nodes    */
    int i, n = 0;

    while (pNode != NULL) {
        NODE *pNext = pNode->next;
        pNode->next = NULL;
        for (i = 0; i < n && aList[i] != NULL; i++) {
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if (i == NUMLISTS)   /* don't go beyond end of array */
            i--;
        else
        if (i == n) /* extend array */
            n++;
        aList[i] = pNode;
        pNode = pNext;
    }
    for (i = 0; i < n; i++) {
        if (!pNode)
            pNode = aList[i];
        else if (aList[i])
            pNode = MergeLists(aList[i], pNode);
    }
    return pNode;
}

void QuickSortRec(NODE **pStart, NODE *pList, NODE *stop) {
    NODE *pivot, *left, *right;
    NODE **ppivot, **pleft, **pright;
    int data, nleft, nright;

    while (pList != stop && pList->next != stop) {
        data = pList->data;     // use the first node as pivot
        pivot = pList;
        ppivot = &pList->next;
        pleft = &left;
        pright = &right;
        nleft = nright = 0;

        while ((pList = pList->next) != stop) {
            if (data == pList->data) {
                *ppivot = pList;
                ppivot = &pList->next;
            } else
            if (data > pList->data) {
                nleft++;
                *pleft = pList;
                pleft = &pList->next;
            } else {
                nright++;
                *pright = pList;
                pright = &pList->next;
            }
        }
        *pleft = pivot;
        *pright = stop;
        *ppivot = right;
        if (nleft >= nright) {       // recurse on the smaller part
            if (nright > 1)
                QuickSortRec(ppivot, right, stop);
            pList = left;
            stop = pivot;
        } else {
            if (nleft > 1)
                QuickSortRec(pStart, left, pivot);
            else
                *pStart = left;      // 0 or 1 node before pivot: already sorted
            pStart = ppivot;
            pList = right;
        }
    }
    *pStart = pList;
}

NODE *QuickSort(NODE *pList) {
    QuickSortRec(&pList, pList, NULL);
    return pList;
}

int NodeCmp(const void *a, const void *b) {
    NODE *aa = *(NODE * const *)a;
    NODE *bb = *(NODE * const *)b;
    return (aa->data > bb->data) - (aa->data < bb->data);
}

NODE *QuickSortA(NODE *pList) {
    NODE *pNode;
    NODE **pArray;
    int i, len;

    /* compute the length of the list */
    for (pNode = pList, len = 0; pNode; pNode = pNode->next)
        len++;
    if (len > 1) {
        /* allocate an array of NODE pointers */
        if ((pArray = malloc(len * sizeof(NODE *))) == NULL) {
            QuickSortRec(&pList, pList, NULL);
            return pList;
        }
        /* initialize the array from the list */
        for (pNode = pList, i = 0; pNode; pNode = pNode->next)
            pArray[i++] = pNode;
        qsort(pArray, len, sizeof(*pArray), NodeCmp);
        for (i = 0; i < len - 1; i++)
            pArray[i]->next = pArray[i + 1];
        pArray[i]->next = NULL;
        pList = pArray[0];
        free(pArray);
    }
    return pList;
}

int isSorted(NODE *pList) {
    if (pList) {
        int data = pList->data;
        while ((pList = pList->next) != NULL) {
            if (data > pList->data)
                return 0;
            data = pList->data;
        }
    }
    return 1;
}

void test(int count) {
    NODE *pMem1, *pMem2, *pMem3;
    NODE *pList1, *pList2, *pList3;
    int i;
    clock_t t1, t2, t3;         /* clock() returns clock_t, not time_t */

    /* create linear lists of nodes with pseudo-random data */
    srand(clock());

    if (count == 0
    ||  (pMem1 = malloc(count * sizeof(NODE))) == NULL
    ||  (pMem2 = malloc(count * sizeof(NODE))) == NULL
    ||  (pMem3 = malloc(count * sizeof(NODE))) == NULL)
        return;

    for (i = 0; i < count; i++) {
        int data = rand();
        pMem1[i].data = data;
        pMem1[i].next = &pMem1[i + 1];
        pMem2[i].data = data;
        pMem2[i].next = &pMem2[i + 1];
        pMem3[i].data = data;
        pMem3[i].next = &pMem3[i + 1];
    }
    pMem1[count - 1].next = NULL;
    pMem2[count - 1].next = NULL;
    pMem3[count - 1].next = NULL;

    t1 = clock();
    pList1 = MergeSort(pMem1);
    t1 = clock() - t1;

    t2 = clock();
    pList2 = QuickSort(pMem2);
    t2 = clock() - t2;

    t3 = clock();
    pList3 = QuickSortA(pMem3);
    t3 = clock() - t3;

    printf("%10d", count);
    if (isSorted(pList1))
        printf(" %10.3fms", t1 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    if (isSorted(pList2))
        printf(" %10.3fms", t2 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    if (isSorted(pList3))
        printf(" %10.3fms", t3 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    printf("\n");

    free(pMem1);
    free(pMem2);
    free(pMem3);
}

int main(int argc, char **argv) {
    int i;

    printf("        N      MergeSort    QuickSort   QuickSortA\n");
    if (argc > 1) {
        for (i = 1; i < argc; i++)
            test(strtol(argv[i], NULL, 0));
    } else {
        for (i = 10; i < 23; i++)
            test(1 << i);
    }
    return 0;
}

Here is the benchmark on lists with geometrically increasing lengths, showing N log(N) times:

        N      MergeSort    QuickSort   QuickSortA
      1024      0.052ms      0.057ms      0.105ms
      2048      0.110ms      0.114ms      0.190ms
      4096      0.283ms      0.313ms      0.468ms
      8192      0.639ms      0.834ms      1.022ms
     16384      1.233ms      1.491ms      1.930ms
     32768      2.702ms      3.786ms      4.392ms
     65536      8.267ms     10.442ms     13.993ms
    131072     23.461ms     34.229ms     27.278ms
    262144     51.593ms     71.619ms     51.663ms
    524288    114.656ms    240.946ms    120.556ms
   1048576    284.717ms    535.906ms    279.828ms
   2097152    707.635ms   1465.617ms    636.149ms
   4194304   1778.418ms   3508.703ms   1424.820ms

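On these datasets, QuickSort() is close to MergeSort() for small lists and about half as fast for the largest ones, but it would behave much worse on partially ordered sets and other pathological cases, whereas MergeSort() has a regular time complexity that does not depend on the dataset and performs a stable sort. QuickSortA() performs marginally better than MergeSort() for large datasets on my system, but its performance depends on the actual implementation of qsort, which does not necessarily use a quicksort algorithm.

MergeSort() does not allocate any extra memory and performs a stable sort, so it appears to be the clear winner for sorting lists.

¹ Well, not so simple after all, but the choice of pivot is too simplistic.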

Is sorting a linked list using Quicksort slower than Mergesort because there is no random access in a linked list?