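I am trying to run the following code, with different values of n, on a Xeon Phi KNC (61 cores, 4 threads per core) and on a Xeon host (two Xeon E5-2660 v2 sockets).
The timings I get are listed below. I am trying to understand why the MIC performs worse than the Xeon processors. Am I doing something wrong here, and how can I fix it (if possible)?
Thanks!
Code: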
    program prog
        integer, allocatable :: arr1(:), arr2(:)
        integer :: i, n, time_start, time_end
        n=481
        do while (n .le. 481000000)
            allocate(arr1(n),arr2(n))
            call system_clock(time_start)
            !dir$ offload begin target(mic)
            !$omp SIMD
            do i=1,n
                arr1(i) = arr1(i) + arr2(i)
            end do
            !dir$ end offload
            call system_clock(time_end)
            write (*,*) "n=",n," time=",time_end-time_start
            deallocate(arr1,arr2)
            n = n*10
        end do
    end program
Xeon Phi results:
n= 481 time= 8881
n= 4810 time= 75
n= 48100 time= 53
n= 481000 time= 261
n= 4810000 time= 1991
n= 48100000 time= 18912
n= 481000000 time= 188203
Setup:
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -o out_122
    #SBATCH --exclusive
    export MIC_KMP_AFFINITY=verbose,granularity=fine,scatter
    export MIC_OMP_NUM_THREADS=122
    ./prog.exe

    sbatch -p xphi -N 1 --exclusive run_par.sh
All of these settings are in run_par.sh, and xphi is the name of the device.
It is also worth mentioning that a native run (with !dir$ offload begin target(mic) added before !$omp SIMD) yields better results:
n= 481 time= 0
n= 4810 time= 0
n= 48100 time= 6
n= 481000 time= 55
n= 4810000 time= 455
n= 48100000 time= 4342
n= 481000000 time= 43322
For the native run, the setup is:
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -o out_244_native
    #SBATCH --exclusive
    export SINK_LD_LIBRARY_PATH=...intel/compilers_and_libraries/linux/lib/mic:$SINK_LD_LIBRARY_PATH
    micnativeloadex ./prog.exe.MIC -e "KMP_AFFINITY=verbose,granularity=fine,scatter"
Xeon results:
n= 481 time= 0
n= 4810 time= 0
n= 48100 time= 2
n= 481000 time= 19
n= 4810000 time= 93
n= 48100000 time= 706
n= 481000000 time= 7006
Below is the output of the lscpu command on my Xeon machine:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
Stepping: 4
CPU MHz: 1203.382
BogoMIPS: 4405.99
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
My MIC specs (tail of /proc/cpuinfo):
processor : 239
vendor_id : GenuineIntel
cpu family : 11
model : 1
model name : 0b/01
stepping : 3
cpu MHz : 1052.630
cache size : 512 KB
physical id : 0
siblings : 240
core id : 59
cpu cores : 60
apicid : 239
initial apicid : 239
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall nx lm nopl lahf_lm
bogomips : 2112.44
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: