Illegal memory access although there is enough memory

Date: 2018-09-03 06:09:37

Tags: cuda

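I am trying to run the code below and get an illegal memory access error. This is strange to me, because the same code runs fine on another computer with a different GPU card. I ran cuda-memcheck, and it reports the following: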

========= CUDA-MEMCHECK
CUDA error at C:/Users/s161901/Desktop/Liver_Bio_recon_study/IVFD_GPU_only/codes/kernel.cu:2010 code=4(cudaErrorLaunchFailure) "cudaMemcpy(Iout, d_Iout, sizeof(float)*voxelNumberHost, cudaMemcpyToHost
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaThreadSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuDevicePrimaryCtxGetState + 0x2d242e) [0x2e006b]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\cudart64_70.dll (cudaThreadSynchronize + 0xf5) [0x1bba5]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x130ef]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0xc17b]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x1bedc]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x1bd2e]
=========     Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x13034]
=========     Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x71551]
=========
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaMemcpy.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuDevicePrimaryCtxGetState + 0x2d242e) [0x2e006b]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\cudart64_70.dll (cudaMemcpy + 0x12f) [0x2711f]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x13113]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0xc17b]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x1bedc]
=========     Host Frame:F:\Liver_Bio_recon_study\IVFD_GPU_only\IVFD_GPU_0.exe [0x1bd2e]
=========     Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x13034]
=========     Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x71551]
=========
========= ERROR SUMMARY: 2 errors

My CUDA kernel is:

__global__ void backwardProj(float *dest, int NPROJ, float *d_prj3d, int nx, int nz, float vx, float vz, float *d_sine, float *d_cosine, float PIXSIZE_X, float PIXSIZE_Z, int NI_X, int NI_Z, float L1, float L2)
{
    const int tid = (blockIdx.y*32768 + blockIdx.x)*blockDim.x + threadIdx.x;

    if (tid >= nx*nx*nz)
        return;


    float result = 0.0F;
    for(int iproj = 0; iproj < NPROJ; iproj++)
    {

        float sinTheta = d_sine[iproj];
        float cosTheta = d_cosine[iproj];
        //      setup rotation angle

        int ix = tid % nx;
        int iy = ((tid-ix)/nx) % nx;
        int iz = tid/(nx*nx);

        float xptemp = (ix - nx/2 + 0.5F) * vx;
        float yptemp = (iy - nx/2 + 0.5F) * vx;

        float xp = xptemp * cosTheta + yptemp * sinTheta;
        float yp = -xptemp * sinTheta + yptemp * cosTheta;
        float zp = (iz - nz/2 + 0.5F) * vz;
        //      coordinate of a point in the phantom in rotated coordinate

        float xs = -L1;
        float ys = 0.0;
        float zs = 0.0;
        //      coordinate of source in rotated coordinate

        float x = -(ys + (yp - ys)*(L2 - xs)/(xp - xs))/PIXSIZE_X + NI_X/2;
        float z = (zs + (zp - zs)*(L2 - xs)/(xp - xs))/PIXSIZE_Z + NI_Z/2;
        int xi = floor(x-0.50);
        int zi = floor(z-0.50);
        //      coordinate on the imager in unit of pixsize

        float factor1 = sqrt((xp-xs)*(xp-xs)+(yp-ys)*(yp-ys)+(zp-zs)*(zp-zs))/abs(xp-xs);
        float factor2 = (L1+L2)*(L1+L2)/(xp - xs)/(xp-xs);

        float v00 = (x>=0 && x<=NI_X && z>=0 && zi<=NI_Z) * 
            d_prj3d[ind3to1(xi+(xi<0), zi+(zi<0), iproj, NI_X, NI_Z, NPROJ) ];
        float v10 = (x>=0 && x<=NI_X && z>=0 && zi<=NI_Z) * 
            d_prj3d[ (xi+1-(xi+1>=NI_X)) +  (zi+(zi<0)) * NI_X +  iproj * NI_X * NI_Z];
        float v01 = (x>=0 && x<=NI_X && z>=0 && zi<=NI_Z) * 
            d_prj3d[ xi+(xi<0) + (zi+1-(zi+1>=NI_Z)) * NI_X + iproj * NI_X * NI_Z ];
        float v11 = (x>=0 && x<=NI_X && z>=0 && zi<=NI_Z) * 
            d_prj3d[ xi+1-(xi+1>=NI_X) + (zi+1-(zi+1>=NI_Z))*NI_X +  iproj * NI_X * NI_Z ];

        //      obtain values at four nearest neighbors

        x -= xi;
        z -= zi;
        float value = v00*(1-x)*(1-z) + v10*x*(1-z) + v01*(1-x)*z + v11*x*z;
        //      bilinear interpolation

        result += factor1*factor2*value;

    }

    dest[tid] =(result*vx*vx*vz/PIXSIZE_X/PIXSIZE_Z)/vx;    

}

My host call is:

N = 256*256*217;
nblocks.x = 32768;
NTHREAD_PER_BLOCK = 512;
nblocks.y = ((1 + (N - 1)/NTHREAD_PER_BLOCK) - 1) / NBLOCKX + 1;

backwardProj<<<nblocks, NTHREAD_PER_BLOCK>>>(sk, nview, d_diff3d, nx, nz, VOXSIZE_X, VOXSIZE_Z, d_sine, d_cosin, PIXSIZE_X, PIXSIZE_Z, NI_X, NI_Z, L1, L2);
cudaThreadSynchronize();
cudaError_t error = cudaGetLastError();
if (error != cudaSuccess)
{
    fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(error));
    exit(-1);
}

1 answer:

Answer 0 (score: 0)

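An illegal memory access has nothing to do with running out of memory. It means that your code is trying to access memory it is not supposed to access, which means there is a bug in your code.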
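The numbers in the question are consistent with this. The following host-only C++ sketch (it needs no GPU) redoes the launch arithmetic for N = 256*256*217 and 512 threads per block. It assumes NBLOCKX is 32768, matching the constant hard-coded in the kernel's tid formula; the struct and the function name checkLaunch are made up for illustration.

```cpp
#include <cstdint>

// Host-side summary of the launch geometry used in the question.
struct LaunchCheck {
    std::int64_t voxels;        // N = nx * nx * nz
    std::int64_t blocksNeeded;  // ceil(N / threadsPerBlock)
    std::int64_t gridY;         // ceil(blocksNeeded / blocksX)
    std::int64_t maxTid;        // largest tid any launched thread computes
    std::int64_t bytes;         // size of one float volume with N voxels
};

// blocksX plays the role of NBLOCKX (ASSUMED to equal 32768).
LaunchCheck checkLaunch(std::int64_t nx, std::int64_t nz,
                        std::int64_t threadsPerBlock, std::int64_t blocksX) {
    LaunchCheck c{};
    c.voxels = nx * nx * nz;
    c.blocksNeeded = (c.voxels + threadsPerBlock - 1) / threadsPerBlock;
    c.gridY = (c.blocksNeeded + blocksX - 1) / blocksX;
    // Mirrors tid = (blockIdx.y*32768 + blockIdx.x)*blockDim.x + threadIdx.x
    // evaluated for the last block and the last thread in it.
    c.maxTid = ((c.gridY - 1) * blocksX + (blocksX - 1)) * threadsPerBlock
               + (threadsPerBlock - 1);
    c.bytes = c.voxels * static_cast<std::int64_t>(sizeof(float));
    return c;
}
```

Calling checkLaunch(256, 217, 512, 32768) gives 14,221,312 voxels, a 32768 x 1 grid, and a largest tid of 16,777,215, which fits comfortably in an int and is filtered by the `tid >= nx*nx*nz` guard. With 4-byte floats, the output volume is about 54 MiB. So neither the amount of memory nor the computation of tid looks like the problem under these assumptions, which points at the indices computed inside the loop.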

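Unfortunately, you have shown only part of your code rather than a self-contained example, so there is not much anyone can say about it. My best guess is that your index calculation is most likely wrong...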
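One place worth checking is the neighbor lookup. In an expression like `(x>=0 && ... ) * d_prj3d[...]`, both operands of the multiplication are evaluated, so the load happens even when the mask is 0, and the `xi+(xi<0)` style adjustments only correct an index that is off by one. The host-only C++ sketch below replays that index arithmetic so it can be tested without a GPU. The function names are hypothetical, the detector size used in the example afterwards is invented, and the v00 index is assumed to use the same layout as the other three because `ind3to1` is not shown in the question.

```cpp
#include <array>
#include <cmath>

// Flat indices of the four neighbors, following the v10/v01/v11 expressions
// in the kernel. ASSUMPTION: ind3to1 (used for v00) has the same layout.
// Note: converting a huge or non-finite float to int is undefined, which is
// one more reason to validate x and z before using them as indices.
std::array<long long, 4> neighbourIndices(float x, float z, int iproj,
                                          int NI_X, int NI_Z) {
    int xi = static_cast<int>(std::floor(x - 0.50));
    int zi = static_cast<int>(std::floor(z - 0.50));
    long long x0 = xi + (xi < 0);              // fixes only xi == -1
    long long x1 = xi + 1 - (xi + 1 >= NI_X);  // fixes only xi == NI_X - 1
    long long z0 = zi + (zi < 0);
    long long z1 = zi + 1 - (zi + 1 >= NI_Z);
    long long base = static_cast<long long>(iproj) * NI_X * NI_Z;
    std::array<long long, 4> idx = {{
        x0 + z0 * NI_X + base,   // v00
        x1 + z0 * NI_X + base,   // v10
        x0 + z1 * NI_X + base,   // v01
        x1 + z1 * NI_X + base }};// v11
    return idx;
}

// True only if every index lies inside a buffer of NI_X * NI_Z * NPROJ floats.
bool allInBounds(const std::array<long long, 4>& idx,
                 int NI_X, int NI_Z, int NPROJ) {
    const long long size = static_cast<long long>(NI_X) * NI_Z * NPROJ;
    for (long long i : idx) {
        if (i < 0 || i >= size) return false;
    }
    return true;
}
```

For a made-up 512 x 384 detector with 360 projections, a point that projects to x = 10, z = -5.3 in projection 0 yields a v00 index of -2551, well before the start of the buffer, even though the mask for that point is 0. If your real geometry produces such points, a possible fix is to test the bounds first and skip the loads entirely (or clamp xi and zi into range) instead of multiplying by a mask. Whether such a read actually faults can depend on what happens to sit next to the buffer, which may be why the same code appears to work on another GPU.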