Question

I need to compute a dot product in Fortran. I can use the Fortran intrinsic function dot_product or ddot from OpenBLAS. The problem is that ddot is slower. This is my code:

With BLAS (ddot):

program VectorBLAS
! time VectorBlas.e = 0.30s
implicit none
double precision, dimension(3)   :: b
double precision                 :: result
double precision, external       :: ddot
integer, parameter              :: LargeInt_K = selected_int_kind (18)
integer (kind=LargeInt_K)        :: I

DO I = 1, 10000000
  b(:) = 3
  result = ddot(3, b, 1, b, 1)
END DO
end program VectorBLAS

With the intrinsic dot_product:

program VectorModule
! time VectorModule.e = 0.19s
implicit none
double precision, dimension (3)  :: b
double precision                 :: result
integer, parameter              :: LargeInt_K = selected_int_kind (18)
integer (kind=LargeInt_K)        :: I

DO I = 1, 10000000
  b(:) = 3
  result = dot_product(b, b)
END DO
end program VectorModule

Both programs are compiled with the same command, linked against OpenBLAS.

What am I doing wrong? Isn't BLAS supposed to be faster?

Answer 1

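Although BLAS, especially an optimized implementation, is usually faster for large arrays, the intrinsic functions are faster for small sizes.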

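You can see this in the linked source code of ddot, which spends extra work on other features (e.g., handling different increments). For short arrays, this overhead outweighs the performance gained from the optimized code.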
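To make that overhead concrete, here is a minimal sketch modeled on the structure of the reference (Netlib) BLAS ddot, not on OpenBLAS's optimized kernels; the name sketch_ddot and the test vectors are illustrative assumptions. Before any multiplication, the routine checks n and branches on the increments; for unit strides it runs a clean-up loop over mod(n, 5) elements and then a loop unrolled by 5. For n = 3 this bookkeeping is a large fraction of the total work, whereas the compiler is free to inline dot_product(b, b) as a plain three-term loop.

```fortran
! Sketch of the bookkeeping in a ddot-style routine, modeled on the
! reference BLAS ddot (NOT OpenBLAS's optimized kernel).
! sketch_ddot is a hypothetical name used only for illustration.
module ddot_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  pure function sketch_ddot(n, dx, incx, dy, incy) result(dtemp)
    integer, intent(in)      :: n, incx, incy
    real(real64), intent(in) :: dx(*), dy(*)
    real(real64)             :: dtemp
    integer :: i, ix, iy, m

    dtemp = 0.0_real64
    if (n <= 0) return                      ! argument check before any work

    if (incx == 1 .and. incy == 1) then
      ! Unit strides: clean-up loop for mod(n, 5) elements,
      ! then a main loop unrolled by 5
      m = mod(n, 5)
      do i = 1, m
        dtemp = dtemp + dx(i)*dy(i)
      end do
      do i = m + 1, n, 5
        dtemp = dtemp + dx(i)*dy(i) + dx(i+1)*dy(i+1) + dx(i+2)*dy(i+2) &
                      + dx(i+3)*dy(i+3) + dx(i+4)*dy(i+4)
      end do
    else
      ! Arbitrary strides; a negative increment walks the vector backwards
      ix = 1
      iy = 1
      if (incx < 0) ix = (-n + 1)*incx + 1
      if (incy < 0) iy = (-n + 1)*incy + 1
      do i = 1, n
        dtemp = dtemp + dx(ix)*dy(iy)
        ix = ix + incx
        iy = iy + incy
      end do
    end if
  end function sketch_ddot
end module ddot_sketch

program compare_dot
  use, intrinsic :: iso_fortran_env, only: real64
  use ddot_sketch, only: sketch_ddot
  implicit none
  real(real64) :: b(3), x(6)

  b = 3.0_real64
  x = [1.0_real64, 0.0_real64, 2.0_real64, 0.0_real64, 3.0_real64, 0.0_real64]

  ! Same length-3 vector as in the question: 3*3 + 3*3 + 3*3 = 27
  if (dot_product(b, b) /= 27.0_real64) error stop 'dot_product mismatch'
  if (sketch_ddot(3, b, 1, b, 1) /= 27.0_real64) error stop 'unit-stride mismatch'

  ! Stride 2 reads x(1), x(3), x(5) = 1, 2, 3, giving 1 + 4 + 9 = 14
  if (sketch_ddot(3, x, 2, x, 2) /= 14.0_real64) error stop 'stride-2 mismatch'
  ! Equal negative strides visit the same elements in reverse order
  if (sketch_ddot(3, x, -2, x, -2) /= 14.0_real64) error stop 'negative-stride mismatch'

  print '(a)', 'all checks passed'
end program compare_dot
```

The intrinsic only ever needs the unit-stride case for contiguous arrays, which is why it can skip most of this branching.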

If your vectors are (much) larger, the optimized version should be faster.
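One way to locate the crossover on a particular machine is to sweep the vector length while keeping the total number of multiply-adds roughly constant. This sketch assumes a BLAS library providing ddot is linked (for example with -lopenblas) and, like the timing run in this answer, is best run with OMP_NUM_THREADS=1; the sizes, the repetition budget and the small per-pass update of a(1) are arbitrary choices, not taken from the answer. Results depend on the compiler, optimization flags and BLAS build, so none are shown.

```fortran
! Sketch: time dot_product against BLAS ddot for several vector lengths.
! Assumes a BLAS library is linked, e.g.  gfortran -O2 sweep.f90 -lopenblas
program dot_sweep
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  real(real64), external    :: ddot
  real(real64), allocatable :: a(:), b(:)
  integer, parameter :: sizes(5) = [3, 30, 300, 3000, 30000]
  integer, parameter :: total = 30000000   ! reps*n is kept roughly constant
  integer(int64)     :: t1, t2, rate, t_intr, t_blas
  integer            :: k, n, r, reps
  real(real64)       :: s_intr, s_blas

  call system_clock(count_rate=rate)
  do k = 1, size(sizes)
    n = sizes(k)
    reps = total / n                       ! n = 3 gives 10000000, as in the question
    allocate(a(n), b(n))
    call random_number(a)
    call random_number(b)
    s_intr = 0.0_real64
    s_blas = 0.0_real64

    ! Intrinsic; a(1) changes every pass so the result is not loop-invariant
    call system_clock(t1)
    do r = 1, reps
      a(1) = a(1) + 1.0e-9_real64
      s_intr = s_intr + dot_product(a, b)
    end do
    call system_clock(t2)
    t_intr = t2 - t1

    ! BLAS, with the same per-pass update
    call system_clock(t1)
    do r = 1, reps
      a(1) = a(1) + 1.0e-9_real64
      s_blas = s_blas + ddot(n, a, 1, b, 1)
    end do
    call system_clock(t2)
    t_blas = t2 - t1

    ! Printing the sums keeps the compiler from discarding the loops
    print '(a,i6,2(a,f9.4),2(a,es12.4))', 'n =', n, &
          '  dot_product [s]:', real(t_intr, real64)/real(rate, real64), &
          '  ddot [s]:', real(t_blas, real64)/real(rate, real64), &
          '  sums:', s_intr, ' ', s_blas
    deallocate(a, b)
  end do
end program dot_sweep
```

Keeping reps*n constant means each row does about the same arithmetic, so differences between rows mostly reflect per-call overhead rather than total work.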

Here is an example to illustrate this:

program test
  use, intrinsic :: ISO_Fortran_env, only: REAL64
  implicit none
  integer                   :: t1, t2, rate, ttot1, ttot2, i
  real(REAL64), allocatable :: a(:),b(:),c(:)
  real(REAL64), external    :: ddot   ! provided by the linked BLAS library

  allocate( a(100000), b(100000), c(100000) )
  call system_clock(count_rate=rate)

  ttot1 = 0 ; ttot2 = 0
  do i=1,1000
    ! Fresh random input on every pass
    call random_number(a)
    call random_number(b)

    ! Time the intrinsic dot_product
    call system_clock(t1)
    c = dot_product(a,b)
    call system_clock(t2)
    ttot1 = ttot1 + t2 - t1

    ! Time BLAS ddot with unit strides
    call system_clock(t1)
    c = ddot(100000,a,1,b,1)
    call system_clock(t2)
    ttot2 = ttot2 + t2 - t1
  enddo
  ! Accumulated times in seconds
  print *,'dot_product: ', real(ttot1)/real(rate)
  print *,'BLAS, ddot:  ', real(ttot2)/real(rate)
end program

Here the BLAS routine is noticeably faster:

OMP_NUM_THREADS=1 ./a.out 
 dot_product:   0.145999998    
 BLAS, ddot:    0.100000001
