Why is Collections.sort() much slower than Arrays.sort()?

Time: 2018-10-09 05:49:15

Tags: java arrays list sorting timing

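I tried to benchmark Collections.sort() against Arrays.sort(). In the test, I create an int array of length 1e5 100 times, filled with random numbers in the range 0 to 1e5 - 1. I also create a list of type Integer that holds the same values at the same positions as the array. I then sort the array with Arrays.sort() and the list with Collections.sort().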


Update: As @Holger pointed out, my code had a bug. The corrected code is:

import java.util.* ;


class TestClass {
    public static void main(String args[] ) throws Exception {
        double ratSum = 0 ;
        for(int j=0;j<100;j++)
        {
        int[] A = new int[(int)1e5] ;
        List<Integer> L = new ArrayList<Integer>() ;
        for(int i=0;i<A.length;i++)
        {
            int no = (int)(Math.random()*(int)1e5) ;
            A[i] = no ;
            L.add(A[i]) ;
        }

        long startTime = System.nanoTime() ;
        Arrays.sort(A) ;
        long endTime = System.nanoTime() ;
        Collections.sort(L) ;
        long endTime2 = System.nanoTime() ;
        long t1 = (endTime-startTime), t2 = (endTime2-endTime) ;
        ratSum+=(double)t2/t1 ;
        System.out.println("Arrays.sort took :"+t1+" Collections.sort took :"+t2+" ratio :"+((double)t2/t1)) ;
    }
    System.out.println("Average ratio :"+(ratSum/100)) ;
    }
}

The output is:

Arrays.sort took :24106021 Collections.sort took :92353602 ratio :3.8311425182944956
Arrays.sort took :8672831 Collections.sort took :50936497 ratio :5.873110752417521
Arrays.sort took :8561227 Collections.sort took :25611480 ratio :2.991566512603859
Arrays.sort took :7123928 Collections.sort took :17368785 ratio :2.4380910362934607
Arrays.sort took :6280488 Collections.sort took :16929218 ratio :2.6955258890710403
Arrays.sort took :6248227 Collections.sort took :16844915 ratio :2.695951187432851
Arrays.sort took :6220942 Collections.sort took :16979669 ratio :2.7294369566538315
Arrays.sort took :6213841 Collections.sort took :17439817 ratio :2.8066081832476883
Arrays.sort took :6286385 Collections.sort took :19963612 ratio :3.175690321225951
Arrays.sort took :6209668 Collections.sort took :17008307 ratio :2.7390042430609816
Arrays.sort took :6286623 Collections.sort took :17007163 ratio :2.705293923303497
Arrays.sort took :6256505 Collections.sort took :16911950 ratio :2.703098614961548
Arrays.sort took :6225031 Collections.sort took :16914494 ratio :2.7171742598550916
Arrays.sort took :6233918 Collections.sort took :17005995 ratio :2.72797861633727
Arrays.sort took :6210554 Collections.sort took :17606028 ratio :2.834856278522013
Arrays.sort took :6239384 Collections.sort took :20342378 ratio :3.260318326296314
Arrays.sort took :6207695 Collections.sort took :16519089 ratio :2.6610664666997974
Arrays.sort took :6227147 Collections.sort took :16605884 ratio :2.666692146499834
Arrays.sort took :6225187 Collections.sort took :16687597 ratio :2.680657946500242
Arrays.sort took :6152338 Collections.sort took :16475373 ratio :2.6779043999208105
Arrays.sort took :6184746 Collections.sort took :16511024 ratio :2.6696365541931715
Arrays.sort took :6130221 Collections.sort took :16578032 ratio :2.7043122915144493
Arrays.sort took :6271927 Collections.sort took :16507152 ratio :2.631910734930429
Arrays.sort took :6232482 Collections.sort took :16562166 ratio :2.657394919070765
Arrays.sort took :6218992 Collections.sort took :16552468 ratio :2.661599821964717
Arrays.sort took :6230427 Collections.sort took :21954967 ratio :3.52383022865046
Arrays.sort took :8204666 Collections.sort took :16607560 ratio :2.024160398485447
Arrays.sort took :6272619 Collections.sort took :22061291 ratio :3.5170781136236715
Arrays.sort took :8618253 Collections.sort took :19979549 ratio :2.3182829513127543
Arrays.sort took :6198538 Collections.sort took :17002645 ratio :2.743008915973412
Arrays.sort took :6265018 Collections.sort took :17079646 ratio :2.7261926462142645
Arrays.sort took :6302335 Collections.sort took :17040082 ratio :2.7037728080148073
Arrays.sort took :6293948 Collections.sort took :17133482 ratio :2.722215372608735
Arrays.sort took :6272364 Collections.sort took :17099717 ratio :2.7261997231028046
Arrays.sort took :6219540 Collections.sort took :17026849 ratio :2.737637992520347
Arrays.sort took :6231000 Collections.sort took :17149439 ratio :2.7522771625742255
Arrays.sort took :6309215 Collections.sort took :17118779 ratio :2.713297771592821
Arrays.sort took :6200511 Collections.sort took :17123517 ratio :2.7616299688848227
Arrays.sort took :6263169 Collections.sort took :16995685 ratio :2.7135919532109063
Arrays.sort took :6212243 Collections.sort took :17101848 ratio :2.7529264389689843
Arrays.sort took :6247580 Collections.sort took :17089850 ratio :2.735435160494143
Arrays.sort took :6283626 Collections.sort took :17088109 ratio :2.7194662763188004
Arrays.sort took :6312678 Collections.sort took :17055856 ratio :2.7018415955954036
Arrays.sort took :6222695 Collections.sort took :17071263 ratio :2.7433873908330715
Arrays.sort took :6300990 Collections.sort took :17016171 ratio :2.7005551508572463
Arrays.sort took :6262923 Collections.sort took :17084477 ratio :2.727875945465081
Arrays.sort took :6256482 Collections.sort took :17062232 ratio :2.7271287602202006
Arrays.sort took :6259643 Collections.sort took :17036036 ratio :2.721566709155778
Arrays.sort took :6248649 Collections.sort took :16944960 ratio :2.711779778316881
Arrays.sort took :6264515 Collections.sort took :16986876 ratio :2.7116027338109974
Arrays.sort took :6241864 Collections.sort took :17367903 ratio :2.782486609769133
Arrays.sort took :6297429 Collections.sort took :17080086 ratio :2.7122316107097038
Arrays.sort took :6184084 Collections.sort took :17584862 ratio :2.843567778186713
Arrays.sort took :6315776 Collections.sort took :22279278 ratio :3.5275598754610678
Arrays.sort took :6253047 Collections.sort took :17091694 ratio :2.7333384828228544
Arrays.sort took :6291188 Collections.sort took :17147694 ratio :2.725668665441249
Arrays.sort took :6327348 Collections.sort took :17034007 ratio :2.6921242517402235
Arrays.sort took :6284904 Collections.sort took :17049315 ratio :2.712740719667317
Arrays.sort took :6190436 Collections.sort took :17143853 ratio :2.7694096183209065
Arrays.sort took :6301712 Collections.sort took :17070237 ratio :2.7088253160411013
Arrays.sort took :6208193 Collections.sort took :17060372 ratio :2.74804149935416
Arrays.sort took :6247700 Collections.sort took :16961962 ratio :2.7149130079869392
Arrays.sort took :6344996 Collections.sort took :17084627 ratio :2.6926143058246215
Arrays.sort took :6214232 Collections.sort took :17150324 ratio :2.759846108095095
Arrays.sort took :6224359 Collections.sort took :17081254 ratio :2.744259127727048
Arrays.sort took :6256722 Collections.sort took :17005451 ratio :2.7179489515436357
Arrays.sort took :6286439 Collections.sort took :17061112 ratio :2.713954911516679
Arrays.sort took :6250634 Collections.sort took :17091313 ratio :2.7343327092899696
Arrays.sort took :6252900 Collections.sort took :17041659 ratio :2.7254008540037424
Arrays.sort took :6222192 Collections.sort took :17125062 ratio :2.75225547524088
Arrays.sort took :6227037 Collections.sort took :17013314 ratio :2.7321684454420296
Arrays.sort took :6223609 Collections.sort took :17086112 ratio :2.745370411283871
Arrays.sort took :6280777 Collections.sort took :17091821 ratio :2.7212908530266238
Arrays.sort took :6254551 Collections.sort took :17148242 ratio :2.741722307484582
Arrays.sort took :6250927 Collections.sort took :17053331 ratio :2.7281283240069834
Arrays.sort took :6270616 Collections.sort took :17067948 ratio :2.721893351466586
Arrays.sort took :6223093 Collections.sort took :17034584 ratio :2.737317922132933
Arrays.sort took :6286002 Collections.sort took :17128280 ratio :2.7248289135129133
Arrays.sort took :6239485 Collections.sort took :17032062 ratio :2.7297224049741287
Arrays.sort took :6191290 Collections.sort took :17017219 ratio :2.748574045150526
Arrays.sort took :6134110 Collections.sort took :17069485 ratio :2.782715830006309
Arrays.sort took :6207363 Collections.sort took :17052862 ratio :2.747199092432648
Arrays.sort took :6238702 Collections.sort took :17056945 ratio :2.734053493819708
Arrays.sort took :6185356 Collections.sort took :17006088 ratio :2.749411351585907
Arrays.sort took :6309226 Collections.sort took :17056503 ratio :2.703422416632405
Arrays.sort took :6256706 Collections.sort took :17082903 ratio :2.7303349398229675
Arrays.sort took :6194988 Collections.sort took :17069426 ratio :2.7553606237816766
Arrays.sort took :6184266 Collections.sort took :17054641 ratio :2.757746998592881
Arrays.sort took :6271022 Collections.sort took :17086036 ratio :2.724601508334686
Arrays.sort took :6246482 Collections.sort took :17077804 ratio :2.733987546910405
Arrays.sort took :6194985 Collections.sort took :17119911 ratio :2.763511291794895
Arrays.sort took :6319199 Collections.sort took :17444587 ratio :2.760569337980969
Arrays.sort took :6262827 Collections.sort took :17065589 ratio :2.7249018693954024
Arrays.sort took :6301245 Collections.sort took :17195611 ratio :2.728922776371971
Arrays.sort took :6214333 Collections.sort took :17024645 ratio :2.739577199998777
Arrays.sort took :6213116 Collections.sort took :17382033 ratio :2.7976353572024086
Arrays.sort took :6286394 Collections.sort took :17124874 ratio :2.7241171965995132
Arrays.sort took :6166308 Collections.sort took :16998293 ratio :2.756640278104824
Arrays.sort took :6247395 Collections.sort took :16957056 ratio :2.7142602636779007
Arrays.sort took :6245054 Collections.sort took :16994147 ratio :2.72121698227109
Average ratio :2.792654880602193

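In addition, I ran the code locally 1000 times (instead of 100), and the average ratio was 3.0616. So the ratio remains large enough to be worth discussing.

Question: Why does Collections.sort() take roughly three times as long as Arrays.sort() to sort the same values? Is it because we are no longer comparing primitives? Why does that take more time?

5 Answers:

Answer 0 (score: 75)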

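So, there are two different methods here, with completely different algorithms:

Arrays.sort(int[]) uses a dual-pivot quicksort algorithm.

Collections.sort(List<T>) calls list.sort(null), which in turn calls Arrays.sort(T[]). This uses the Timsort algorithm.

So, let's compare Arrays.sort(int[]) with Arrays.sort(T[]):

  1. T[] is an array of boxed objects, so there is an extra level of indirection: for every access, you have to unbox an Integer. This certainly adds overhead. An int[], on the other hand, is a primitive array, so all elements are available "right away".
  2. TimSort is a variant of the classic mergesort algorithm. It is faster than mergesort but still slower than quicksort, because
    • quicksort moves less data on random input
    • quicksort needs O(log(n)) extra space, while TimSort needs O(n) extra space to provide stability, which also adds overhead.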
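
To make the stability point concrete, here is a minimal sketch of what the object sort guarantees. The Item class and the sortedByKey helper are made up for illustration; the only JDK behavior relied on is that Arrays.sort(T[], Comparator) is documented as stable, so elements with equal keys keep their original relative order.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StabilityDemo {
    // Hypothetical element type: a sort key plus a tag that reveals the original order.
    static final class Item {
        final int key;
        final String tag;

        Item(int key, String tag) {
            this.key = key;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return key + tag;
        }
    }

    // Sorts a copy by key only; ties must stay in their original order.
    static String sortedByKey(Item[] items) {
        Item[] copy = items.clone();
        Arrays.sort(copy, Comparator.comparingInt((Item it) -> it.key));
        return Arrays.toString(copy);
    }

    public static void main(String[] args) {
        Item[] items = {
            new Item(2, "a"), new Item(1, "a"), new Item(2, "b"), new Item(1, "b")
        };
        // Equal keys keep their input order: prints [1a, 1b, 2a, 2b]
        System.out.println(sortedByKey(items));
    }
}
```

For an int[] this guarantee is meaningless, because two equal ints are indistinguishable, which is why the primitive overloads are free to use an unstable quicksort variant.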

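Answer 1 (score: 12)

There are two issues here:

Issue 1:

Under the hood, Collections.sort works by copying the collection to an array, sorting the array, and then copying the array back to the collection.

Arrays.sort just sorts the array in place.

Now, for a sufficiently large array or collection, the cost of sorting (O(N log N)) dominates the cost of copying (O(N)). For small arrays or collections, the copying becomes significant.

(This behavior may depend on the collection type. For an ArrayList, the Collections.sort implementation may be able to sort the backing array without copying the data. I would need to check the source code. UPDATE: in Java 8 and later, an ArrayList is sorted in place.)

Issue 2:

You are comparing sorting an int[] with sorting a List<Integer>.

This is apples and oranges, because:

  1. Comparing two int values with a relational operator is faster than comparing two Integer values with compareTo(Integer).
  2. Arrays.sort(int[]) uses a different (faster) algorithm than Arrays.sort(Object[]).

If you want a fairer comparison, compare Collections.sort on an ArrayList<Integer> with Arrays.sort(Object[]) on an Integer[].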
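
A rough sketch of that fairer comparison could look like the following. The class name, the timeOnce helper, and the seed and size parameters are my own choices for illustration. This is a naive nanoTime measurement, not a proper benchmark harness, so treat the numbers it prints as indicative only (the first few rounds include JIT warm-up).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FairComparison {
    // Returns {nanos for Arrays.sort(Integer[]), nanos for Collections.sort(ArrayList<Integer>)}.
    static long[] timeOnce(int n, long seed) {
        Random rnd = new Random(seed);
        Integer[] boxedArray = new Integer[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Integer value = rnd.nextInt(n); // box once, share the object between both containers
            boxedArray[i] = value;
            list.add(value);
        }

        long t0 = System.nanoTime();
        Arrays.sort(boxedArray);   // object sort on Integer[]
        long t1 = System.nanoTime();
        Collections.sort(list);    // object sort on ArrayList<Integer>
        long t2 = System.nanoTime();

        // Sanity check: both sorts must agree on the result.
        if (!Arrays.asList(boxedArray).equals(list)) {
            throw new IllegalStateException("sorted results differ");
        }
        return new long[] {t1 - t0, t2 - t1};
    }

    public static void main(String[] args) {
        for (int round = 0; round < 20; round++) {
            long[] t = timeOnce(100_000, round);
            System.out.println("Arrays.sort(Integer[]): " + t[0]
                    + " ns, Collections.sort(List<Integer>): " + t[1] + " ns");
        }
    }
}
```

With both sides sorting boxed values through the object sort, any remaining gap can no longer be attributed to primitives versus objects.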

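Answer 2 (score: 1)

Collections.sort() uses a merge sort algorithm, whereas Arrays.sort() on a primitive array uses quicksort. Quicksort has one major drawback compared with merge sort: it is not stable, which matters when sorting non-primitives. So, depending on whether we are sorting objects or primitives, we use Collections.sort() or Arrays.sort().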

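Answer 3 (score: 1)

If you look at the Oracle documentation for Collections.sort(), it says:

This implementation dumps the specified list into an array, sorts the array, and iterates over the list, resetting each element from the corresponding position in the array.

This means it performs an array sort plus an extra iteration, which makes Collections.sort() slower than Arrays.sort():

  1. Dump the specified list into an array
  2. Sort the array (roughly Arrays.sort)
  3. Iterate over the list, resetting each element from the corresponding position in the array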
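
The three steps above can be sketched as follows. sortViaArray is a hypothetical helper written to mirror the quoted documentation; it is not the JDK source, and it only handles natural ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class DumpSortReset {
    // Hypothetical helper that follows the three documented steps.
    static <T extends Comparable<? super T>> void sortViaArray(List<T> list) {
        Object[] a = list.toArray();              // 1. dump the list into an array
        Arrays.sort(a);                           // 2. sort the array (natural ordering)
        ListIterator<T> it = list.listIterator();
        for (Object e : a) {                      // 3. write each element back in order
            it.next();
            @SuppressWarnings("unchecked")
            T element = (T) e;
            it.set(element);
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        sortViaArray(list);
        System.out.println(list); // prints [1, 2, 3]
    }
}
```

Steps 1 and 3 are each a single O(N) pass, so for a list of 1e5 elements they are small next to the O(N log N) sort in step 2.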

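Answer 4 (score: 1)

One thing not mentioned so far in this thread is "pointer chasing", which is related to the "unboxing" part. For arrays this small, whether you use timsort or quicksort will not make a significant difference (for primitive arrays at current CPU speeds, it most likely will not kill your performance).

In your example, boxing does not happen outside of initialization; the biggest difference arises where the data is read.

Because int is a primitive, an int[] is simply a contiguous block of memory containing the data itself. An Integer[] is a contiguous block of memory containing references (i.e., pointers) to individual Integer objects, and the Integer objects themselves may be scattered all over memory.

So, for a sort operation on an int[], the CPU fetches a block of memory and can operate on it directly. For an Integer[], however, the CPU has to chase the pointer to each object and fetch that object from memory before it can compare it, and then operate on and rearrange the block of memory that is the array. This is called "pointer chasing".

An Integer[] needs more operations per piece of data, such as reading the reference, adding the header length to the base address, and fetching the value from there (the CPU pipelines these instructions well, which masks most of their impact); it is the memory latency that kills you. Fetching each individual Integer object from a random memory location accounts for almost all of the difference.

Usually this is no big deal, because you typically initialize a small Integer[] in a tight loop without much else going on in the background, so the Integer objects are likely to sit close together in memory, where they can be pulled into the cache and accessed quickly from there. For huge arrays and lists that are created and modified in a busy application, however, this can make a big difference and may cause unexpected latency spikes. If you need reliable low latency, you will want to avoid this. For a huge number of applications, though, nobody will notice if a sort takes a few milliseconds longer.

[EDIT]

As you requested in the comments, the code below shows that this has nothing to do with timsort vs. quicksort: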
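
If the list really has to hold Integer objects but the sort itself is latency-sensitive, one possible way to sidestep the pointer chasing is to copy the values into an int[] first. This is only a sketch under assumptions: sortViaIntArray is a made-up helper, it assumes a random-access list such as ArrayList with no null elements, it only supports natural ordering, and the elements written back are re-boxed rather than the original objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrimitiveRoundTrip {
    // Hypothetical helper: each Integer is dereferenced once, then the sort runs on contiguous ints.
    static void sortViaIntArray(List<Integer> list) {
        int[] values = new int[list.size()];
        int i = 0;
        for (Integer v : list) {
            values[i++] = v;           // unbox once per element
        }
        Arrays.sort(values);           // primitive sort on contiguous memory
        for (i = 0; i < values.length; i++) {
            list.set(i, values[i]);    // box again while writing back
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(42, -7, 100000, 3));
        sortViaIntArray(list);
        System.out.println(list); // prints [-7, 3, 42, 100000]
    }
}
```

Whether this pays off depends on the list size and on how scattered the objects are, so it is worth measuring before adopting it.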

import java.util.Arrays;
import java.util.Random;

public class Pointerchasing1 {

    public static void main(String[] args) {

        //use the exact same algorithm implementation (insertionSort), to show that slowness is not caused by timsort vs quicksort.
        //expect that the object-version is slower.

        final int[] direct = new int[1024]; 
        final Integer[] refs = new Integer[direct.length];

        final Random rnd = new Random(0);
        for (int t = 0; t < 1000; ++t) {
            Arrays.setAll(direct, index -> rnd.nextInt());
            Arrays.setAll(refs, index -> direct[index]); // boxing happens here

            //measure direct:
            long t1 = System.nanoTime();
            insertionSortPrimitive(direct);
            long e1 = System.nanoTime()-t1;
            //measure refs:         
            long t2 = System.nanoTime();
            insertionSortObjects(refs);
            long e2 = System.nanoTime()-t2;

            // use results, so compiler can't eliminate the loops
            System.out.println(Arrays.toString(direct));
            System.out.println(Arrays.toString(refs));
            System.out.println("-");            
            System.out.println(e1);
            System.out.println(e2);
            System.out.println("--");           
        }
    }

    private static void insertionSortPrimitive(final int[] arr) {
        int i, key, j;
        for (i = 1; i < arr.length; i++) {
            key = arr[i];
            j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j = j - 1;
            }
            arr[j + 1] = key;
        }
    }

    private static void insertionSortObjects(final Integer[] arr) {
        int i, j;
        Integer key; // keep the boxed reference so storing it back does not box a new Integer
        for (i = 1; i < arr.length; i++) {
            key = arr[i];
            j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j = j - 1;
            }
            arr[j + 1] = key;
        }
    }

}

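This "test" still leaves unboxing as a possible culprit.

[EDIT2]

Now, this test aims to show that "unboxing" is not the problem. Unboxing merely adds the few bytes of the object header to the address (out-of-order execution and pipelining make that cost all but disappear) and fetches the value from that location. In this test I use two primitive arrays, one for references and one for values, so every access is indirect. This is very similar to unboxing, just without adding the extra few bytes for the object header. The main difference is that the "indirect" version does not have to chase a pointer into the heap for every value: it can load both arrays and use the index from the refs array to read from the values array.

If pointer chasing, rather than unboxing, plays the major role, then the indirect version should be faster than the object version, which does unbox.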

import java.util.Arrays;
import java.util.Random;

public class Pointerchasing2 {

    public static void main(String[] args) {

        // use indirect access (like unboxing, but just chasing a single array pointer) vs. Integer objects (chasing every object's pointer).
        // expect that the object-version is still slower.

        final int[] values = new int[1024];
        final int[] refs = new int[1024];
        final Integer[] objects = new Integer[values.length];

        final Random rnd = new Random(0);
        for (int t = 0; t < 1000; ++t) {
            Arrays.setAll(values, index -> rnd.nextInt());
            Arrays.setAll(refs, index -> index);
            Arrays.setAll(objects, index -> values[index]); // boxing happens here

            // measure indirect:
            long t1 = System.nanoTime();
            insertionSortPrimitiveIndirect(refs, values);
            long e1 = System.nanoTime() - t1;
            // measure objects:
            long t2 = System.nanoTime();
            insertionSortObjects(objects);
            long e2 = System.nanoTime() - t2;

            // use results, so compiler can't eliminate the loops
            System.out.println(Arrays.toString(indirectResult(refs, values)));
            System.out.println(Arrays.toString(objects));
            System.out.println("-");
            System.out.println(e1);
            System.out.println(e2);
            System.out.println("--");
        }
    }

    private static void insertionSortPrimitiveIndirect(final int[] refs, int[] values) {
        int i, keyIndex, j;
        for (i = 1; i < refs.length; i++) {
            keyIndex = refs[i];
            j = i - 1;
            while (j >= 0 && values[refs[j]] > values[keyIndex]) {
                refs[j + 1] = refs[j];
                j = j - 1;
            }
            refs[j + 1] = keyIndex;
        }
    }

    private static void insertionSortObjects(final Integer[] arr) {
        int i, j;
        Integer key; // keep the boxed reference so storing it back does not box a new Integer
        for (i = 1; i < arr.length; i++) {
            key = arr[i];
            j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j = j - 1;
            }
            arr[j + 1] = key;
        }
    }

    private static int[] indirectResult(final int[] refs, int[] values) {
        final int[] result = new int[1024];
        Arrays.setAll(result, index -> values[refs[index]]);
        return result;
    }

}

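Result: In both tests, the "primitive" and "indirect" versions are faster than accessing objects on the heap. As expected, it is not unboxing that causes the slowdown but the memory latency caused by pointer chasing.

See also this video on Project Valhalla ("Value types and generic specialization in the JVM promise to give us better JIT code, data locality, and remove the tyranny of pointer chasing."): https://vimeo.com/289667280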