Question

I want to implement the following equation in C:

C[l,q,m] = A[m,q,k] * B[k,l]

with summation over the repeated index k.

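I implemented this in three ways:

A naive implementation using loops
Using the BLAS routine DGEMV (matrix-vector multiplication)
Using the BLAS routine DGEMM (matrix-matrix multiplication)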

Here is a minimal non-working example:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <cblas.h>

int main(void)
{

    const size_t n = 3;
    const size_t n2 = n*n;
    const size_t n3 = n*n*n;

    /* Fill rank 3 tensor with random numbers */
    double a[n3];
    for (size_t i = 0; i < n3; i++) {
        a[i] = (double) rand() / RAND_MAX;
    }

    /* Fill matrix with random numbers */
    double b[n2];
    for (size_t i = 0; i < n2; i++) {
        b[i] = (double) rand() / RAND_MAX;
    }

    /* All loops */
    double c_exact[n3];
    memset(c_exact, 0, n3 * sizeof(double));
    for (size_t l = 0; l < n; l++) {
        for (size_t q = 0; q < n; q++) {
            for (size_t m = 0; m < n; m++) {
                for (size_t k = 0; k < n; k++) {
                    c_exact[l*n2+q*n+m] += a[m*n2+q*n+k] * b[k*n+l];
                }
            }
        }
    }

    /* Matrix-vector */
    double c_mv[n3];
    memset(c_mv, 0, n3 * sizeof(double));
    for (size_t m = 0; m < n; m++) {
        for (size_t l = 0; l < n; l++) {
            cblas_dgemv(
                    CblasRowMajor, CblasNoTrans, n, n, 1.0, &a[m*n2],
                    n, &b[l], n, 0.0, &c_mv[l*n2+m], n);
        }
    }

    /* Matrix-matrix */
    double c_mm[n3];
    memset(c_mm, 0, n3 * sizeof(double));
    for (size_t m = 0; m < n; m++) {
        cblas_dgemm(
                CblasRowMajor, CblasTrans, CblasTrans, n, n, n, 1.0, b, n,
                &a[m*n2], n, 0.0, &c_mm[m], n2);
    }

    /* Compute difference */
    double diff_mv = 0.0;
    double diff_mm = 0.0;
    for (size_t idx = 0; idx < n3; idx++) {
        diff_mv += c_mv[idx] - c_exact[idx];
        diff_mm += c_mm[idx] - c_exact[idx];
    }
    printf("Difference matrix-vector: %e\n", diff_mv);
    printf("Difference matrix-matrix: %e\n", diff_mm);
}

This is the output:

Difference matrix-vector: 0.000000e+00
Difference matrix-matrix: -1.188678e+01
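
A side note on the check itself: a sum of signed differences can cancel to zero even when individual entries differ, so a maximum absolute difference is a stricter test. A minimal sketch, where `max_abs_diff` is a hypothetical helper name that is not part of the code above:

```c
#include <stddef.h>

/* Hypothetical helper: largest absolute entrywise difference between
 * two arrays of length len. Unlike a signed sum, errors cannot cancel. */
double max_abs_diff(const double *x, const double *y, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; i++) {
        double d = x[i] - y[i];
        if (d < 0.0) {
            d = -d;             /* absolute value without needing math.h */
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

For example, for x = {1, 2, 3} and y = {2, 1, 3} the signed sum of differences is 0, while `max_abs_diff(x, y, 3)` is 1. In the program above it could be called as `max_abs_diff(c_mm, c_exact, n3)`.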

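That is, the DGEMV implementation is correct and the DGEMM one is not, and I really don't understand why. I swapped the order of the multiplication (matrix-matrix multiplication is non-commutative) and transposed both operands to get the index order C[l,q,m] instead of C[q,l,m], but I also tried it without swapping/transposing and that did not work either.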
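
One way to see why the DGEMV version can address C[l,:,m]: `cblas_dgemv` takes an increment for each vector (`incX`, `incY`), so the input column B[:,l] and the output fiber C[l,:,m] can each be walked with stride n. The sketch below imitates that call in plain C; `strided_matvec` and `contract_by_fibers` are hypothetical names written for illustration, not BLAS routines.

```c
#include <stddef.h>

/* Hypothetical stand-in for a row-major, no-transpose DGEMV with
 * alpha = 1 and beta = 0: y[i*incy] = sum_j a[i*lda + j] * x[j*incx]. */
void strided_matvec(size_t rows, size_t cols,
                    const double *a, size_t lda,
                    const double *x, size_t incx,
                    double *y, size_t incy)
{
    for (size_t i = 0; i < rows; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < cols; j++) {
            sum += a[i * lda + j] * x[j * incx];
        }
        y[i * incy] = sum;
    }
}

/* C[l,q,m] = sum_k A[m,q,k] * B[k,l], one (m, l) fiber at a time,
 * with the same base pointers and strides as the DGEMV loop above. */
void contract_by_fibers(size_t n, const double *a, const double *b, double *c)
{
    const size_t n2 = n * n;
    for (size_t m = 0; m < n; m++) {
        for (size_t l = 0; l < n; l++) {
            strided_matvec(n, n,
                           &a[m * n2], n,      /* A[m,:,:], rows q, cols k */
                           &b[l], n,           /* B[:,l], stride n         */
                           &c[l * n2 + m], n); /* C[l,:,m], stride n       */
        }
    }
}
```

Each call writes a single strided vector, so only one output stride is needed, and DGEMV provides exactly that through `incY`.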

Can anyone help? Thanks.

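Edit: I thought about it some more, and it feels like I am trying to do something DGEMM does not support. I am trying to insert a submatrix into C[:,:,m], which means that neither the leading index nor the trailing index of that slice is contiguous in memory. DGEMM lets me set the parameter LDC, which would have to be n^2 in this case, but it does not know that the second index is also non-contiguous, with stride n (and there is no parameter to tell it so?). So why does DGEMM not support a second parameter for the stride of the trailing dimension?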
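
If the diagnosis in the edit is right, one possible way around it is to slice along q instead of m: for fixed q, both A[:,q,:] and C[:,q,:] have a unit-stride last index and a row stride of n^2, which leading-dimension arguments alone can describe. A minimal sketch under that assumption follows. The plain-C routine `gemm_tt` is a hypothetical stand-in, written here for illustration, for a row-major DGEMM call with both operands transposed, alpha = 1 and beta = 0. It has not been checked here against an actual BLAS library.

```c
#include <stddef.h>

/* Hypothetical stand-in for a row-major GEMM with both operands
 * transposed: C[i*ldc + j] = sum_p A[p*lda + i] * B[j*ldb + p].
 * As in DGEMM, the column index j of C always has stride 1. */
void gemm_tt(size_t M, size_t N, size_t K,
             const double *A, size_t lda,
             const double *B, size_t ldb,
             double *C, size_t ldc)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < K; p++) {
                sum += A[p * lda + i] * B[j * ldb + p];
            }
            C[i * ldc + j] = sum;
        }
    }
}

/* C[l,q,m] = sum_k A[m,q,k] * B[k,l], one q slice at a time.
 * For fixed q: C[:,q,:] = B^T * A[:,q,:]^T, where both slices have
 * row stride n*n and a contiguous last index. */
void contract_by_q_slices(size_t n, const double *a, const double *b,
                          double *c)
{
    const size_t n2 = n * n;
    for (size_t q = 0; q < n; q++) {
        gemm_tt(n, n, n,
                b, n,            /* B stored as k x l, ld = n       */
                &a[q * n], n2,   /* A[:,q,:] stored m x k, ld = n^2 */
                &c[q * n], n2);  /* C[:,q,:] stored l x m, ld = n^2 */
    }
}
```

With a real BLAS, the inner call would presumably become `cblas_dgemm(CblasRowMajor, CblasTrans, CblasTrans, n, n, n, 1.0, b, n, &a[q*n], n2, 0.0, &c[q*n], n2)`, which stays within what LDA and LDC can express.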

DGEMM and DGEMV give different results

0 answers: