Using StorageLevel to improve performance

Time: 2018-02-01 15:47:15

Tags: apache-spark caching

I am trying to improve performance by calling the persist method on my DataFrame.

However, I am completely lost among the many possible combinations. I ran this:

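from pyspark import StorageLevel

# StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication)
a = [True, False]
for i in a:              # useDisk
    for j in a:          # useMemory
        for k in a:      # useOffHeap
            for l in a:  # deserialized
                for m in [1, 2]:  # replication
                    print(StorageLevel(i, j, k, l, m))

Output: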
Disk Memory OffHeap Deserialized 1x Replicated
Disk Memory OffHeap Deserialized 2x Replicated
Disk Memory OffHeap Serialized 1x Replicated
Disk Memory OffHeap Serialized 2x Replicated
Disk Memory Deserialized 1x Replicated
Disk Memory Deserialized 2x Replicated
Disk Memory Serialized 1x Replicated
Disk Memory Serialized 2x Replicated
Disk OffHeap Deserialized 1x Replicated
Disk OffHeap Deserialized 2x Replicated
Disk OffHeap Serialized 1x Replicated
Disk OffHeap Serialized 2x Replicated
Disk Deserialized 1x Replicated
Disk Deserialized 2x Replicated
Disk Serialized 1x Replicated
Disk Serialized 2x Replicated
Memory OffHeap Deserialized 1x Replicated
Memory OffHeap Deserialized 2x Replicated
Memory OffHeap Serialized 1x Replicated
Memory OffHeap Serialized 2x Replicated
Memory Deserialized 1x Replicated
Memory Deserialized 2x Replicated
Memory Serialized 1x Replicated
Memory Serialized 2x Replicated
OffHeap Deserialized 1x Replicated
OffHeap Deserialized 2x Replicated
OffHeap Serialized 1x Replicated
OffHeap Serialized 2x Replicated
Deserialized 1x Replicated
Deserialized 2x Replicated
Serialized 1x Replicated
Serialized 2x Replicated
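The 32 lines above are simply every combination of the five constructor arguments of PySpark's StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication). The sketch below does not import PySpark: it uses a hypothetical stand-in class, Level, that mirrors that argument order and builds the same labels as the output above, so the enumeration can be inspected without a Spark installation. The is_valid method is an assumption based on my understanding of Spark's Scala StorageLevel (a level must use memory or disk, and off-heap storage is rejected for deserialized data); it is not a PySpark method.

```python
from itertools import product
from typing import NamedTuple


class Level(NamedTuple):
    """Hypothetical stand-in mirroring the five fields of pyspark's StorageLevel."""
    use_disk: bool      # partitions may be written to local disk
    use_memory: bool    # partitions may be kept in executor memory
    use_off_heap: bool  # memory storage lives outside the JVM heap
    deserialized: bool  # keep JVM objects (True) or serialized bytes (False)
    replication: int    # number of executors holding a copy of each partition

    def label(self) -> str:
        # Same word order as the printed StorageLevel labels above
        parts = []
        if self.use_disk:
            parts.append("Disk")
        if self.use_memory:
            parts.append("Memory")
        if self.use_off_heap:
            parts.append("OffHeap")
        parts.append("Deserialized" if self.deserialized else "Serialized")
        parts.append(f"{self.replication}x Replicated")
        return " ".join(parts)

    def is_valid(self) -> bool:
        # Assumption: mirrors Spark's Scala StorageLevel, which rejects
        # off-heap + deserialized and treats levels that use neither
        # memory nor disk as invalid (they would store nothing).
        if self.use_off_heap and self.deserialized:
            return False
        return (self.use_memory or self.use_disk) and self.replication > 0


# Same nesting order as the original loops; product varies the last item fastest
flags = [True, False]
levels = [Level(d, m, o, s, r) for d, m, o, s, r in product(flags, flags, flags, flags, [1, 2])]

for level in levels:
    print(level.label(), "" if level.is_valid() else "(not usable)")

print(sum(level.is_valid() for level in levels), "of", len(levels), "combinations are usable")
```

Under these assumptions only 18 of the 32 printed combinations describe a level Spark would actually use; the last eight (neither Disk nor Memory) store nothing at all.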

What does each option mean, and which one should I use in this case?
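In practice you rarely build a StorageLevel by hand: PySpark ships named constants for the useful combinations. The sketch below lists them as plain flag tuples, as I recall them from pyspark/storagelevel.py in PySpark 2.x (PySpark always pickles Python data, so deserialized is False; later releases add a few more constants not listed here). named_level and suggest_level are hypothetical helpers, not Spark APIs; suggest_level loosely encodes the Spark programming guide's "Which Storage Level to Choose?" advice: stay in memory if the data fits, spill to disk only when recomputing a partition is expensive, and use a _2 level only if fast recovery from a lost executor matters.

```python
from typing import Dict, Optional, Tuple

# (use_disk, use_memory, use_off_heap, deserialized, replication)
Flags = Tuple[bool, bool, bool, bool, int]

# Predefined levels as (to my knowledge) defined in PySpark 2.x
PYSPARK_LEVELS: Dict[str, Flags] = {
    "DISK_ONLY": (True, False, False, False, 1),
    "DISK_ONLY_2": (True, False, False, False, 2),
    "MEMORY_ONLY": (False, True, False, False, 1),
    "MEMORY_ONLY_2": (False, True, False, False, 2),
    "MEMORY_AND_DISK": (True, True, False, False, 1),
    "MEMORY_AND_DISK_2": (True, True, False, False, 2),
    "OFF_HEAP": (True, True, True, False, 1),
}


def named_level(flags: Flags) -> Optional[str]:
    """Return the predefined name matching a flag tuple, or None for ad-hoc combinations."""
    for name, value in PYSPARK_LEVELS.items():
        if value == flags:
            return name
    return None


def suggest_level(fits_in_memory: bool, expensive_to_recompute: bool, fast_recovery: bool) -> str:
    """Hypothetical helper loosely following the Spark guide's storage-level advice."""
    if fits_in_memory or not expensive_to_recompute:
        # Memory only: partitions that do not fit are recomputed when needed
        name = "MEMORY_ONLY"
    else:
        # Spill to disk when recomputing would cost more than reading from disk
        name = "MEMORY_AND_DISK"
    # Replicated variants keep a second copy on another executor
    return name + "_2" if fast_recovery else name


print(named_level((True, True, False, False, 1)))   # MEMORY_AND_DISK
print(named_level((False, False, False, True, 2)))  # None
print(suggest_level(fits_in_memory=False, expensive_to_recompute=True, fast_recovery=False))  # MEMORY_AND_DISK
```

With real PySpark the chosen name maps to an attribute such as StorageLevel.MEMORY_AND_DISK, which is what you would pass to persist.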
