Question

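I want to perform singular value decomposition on a large (sparse) matrix. To choose the best (most accurate) library, I tried to reproduce the SVD example provided here using different Java and Python libraries. Oddly, each library gives a different result.

Here is the original example matrix and its decomposition (the U, S, and VT matrices):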

A =2.0  0.0 8.0 6.0 0.0
   1.0 6.0 0.0 1.0 7.0
   5.0 0.0 7.0 4.0 0.0
   7.0 0.0 8.0 5.0 0.0 
   0.0 10.0 0.0 0.0 7.0

U =-0.54 0.07 0.82 -0.11 0.12
   -0.10 -0.59 -0.11 -0.79 -0.06
   -0.53 0.06 -0.21 0.12 -0.81
   -0.65 0.07 -0.51 0.06 0.56
   -0.06 -0.80 0.09 0.59 0.04

VT =-0.46 0.02 -0.87 -0.00 0.17
    -0.07 -0.76 0.06 0.60 0.23
    -0.74 0.10 0.28 0.22 -0.56
    -0.48 0.03 0.40 -0.33 0.70
    -0.07 -0.64 -0.04 -0.69 -0.32

S (with the top three singular values) = 
   17.92 0 0
   0 15.17 0
   0 0 3.56

I tried the following Java and Python libraries. Java: PColt, Jama. Python: NumPy.

Here are the results from each of them:

Jama:
U = 0.5423  -0.065  -0.8216 0.1057  -0.1245 
    0.1018  0.5935  0.1126  0.7881  0.0603  
    0.525   -0.0594 0.213   -0.1157 0.8137  
    0.6449  -0.0704 0.5087  -0.0599 -0.5628 
    0.0645  0.7969  -0.09   -0.5922 -0.0441 

VT =0.4646  -0.0215 0.8685  8.0E-4  -0.1713 
    0.0701  0.76    -0.0631 -0.6013 -0.2278 
    0.7351  -0.0988 -0.284  -0.2235 0.565   
    0.4844  -0.0254 -0.3989 0.3327  -0.7035 
    0.065   0.6415  0.0443  0.6912  0.3233  

S = 17.9184 0.0 0.0 0.0 0.0 
    0.0 15.1714 0.0 0.0 0.0 
    0.0 0.0 3.564   0.0 0.0 
    0.0 0.0 0.0 1.9842  0.0 
    0.0 0.0 0.0 0.0 0.3496  

PColt:
U = -0.542255   0.0649957  0.821617  0.105747  -0.124490 
    -0.101812  -0.593461 -0.112552  0.788123   0.0602700
    -0.524953   0.0593817 -0.212969 -0.115742   0.813724 
    -0.644870   0.0704063 -0.508744 -0.0599027 -0.562829 
    -0.0644952 -0.796930  0.0900097 -0.592195  -0.0441263

VT =-0.464617   0.0215065 -0.868509   0.000799554 -0.171349
    -0.0700860 -0.759988  0.0630715 -0.601346   -0.227841
    -0.735094   0.0987971  0.284009  -0.223485    0.565040
    -0.484392   0.0254474  0.398866   0.332684   -0.703523
    -0.0649698 -0.641520 -0.0442743  0.691201    0.323284

S = 
(00)    17.91837085874625
(11)    15.17137188041607
(22)    3.5640020352605677
(33)    1.9842281528992616
(44)    0.3495556671751232


NumPy:

U = -0.54225536  0.06499573  0.82161708  0.10574661 -0.12448979
    -0.10181247 -0.59346055 -0.11255162  0.78812338  0.06026999
    -0.52495325  0.05938171 -0.21296861 -0.11574223  0.81372354
    -0.64487038  0.07040626 -0.50874368 -0.05990271 -0.56282918
    -0.06449519 -0.79692967  0.09000966 -0.59219473 -0.04412631

VT =-4.64617e-01   2.15065e-02  -8.68508e-01    7.99553e-04  -1.71349e-01
    -7.00859e-02  -7.59987e-01   6.30714e-02   -6.01345e-01  -2.27841e-01
    -7.35093e-01   9.87971e-02   2.84008e-01   -2.23484e-01   5.65040e-01
    -4.84391e-01   2.54473e-02   3.98865e-01    3.32683e-01  -7.03523e-01
    -6.49698e-02  -6.41519e-01  -4.42743e-02    6.91201e-01   3.23283e-01

S = 17.91837086  15.17137188   3.56400204   1.98422815   0.34955567

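Note that the sign of every element in the Jama factor matrices (U & VT) is the opposite of the sign in the original example. Interestingly, for PColt and NumPy, only the signs of the elements in the last two columns are flipped. Is there a specific reason behind the flipped signs? Has anyone run into similar discrepancies?

Here is the code I used. Java: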

import java.text.DecimalFormat;
import cern.colt.matrix.tdouble.DoubleMatrix2D;
import cern.colt.matrix.tdouble.algo.DenseDoubleAlgebra;
import cern.colt.matrix.tdouble.algo.decomposition.DenseDoubleSingularValueDecomposition;
import cern.colt.matrix.tdouble.impl.DenseDoubleMatrix2D;
import Jama.Matrix;
import Jama.SingularValueDecomposition;
public class SVD_Test implements java.io.Serializable{

    public static void main(String[] args)
    {   

        double[][] data2 = new double[][]
                {{ 2.0, 0.0, 8.0, 6.0, 0.0},
                { 1.0, 6.0, 0.0, 1.0, 7.0},
                { 5.0, 0.0, 7.0, 4.0, 0.0},
                { 7.0, 0.0, 8.0, 5.0, 0.0},
                { 0.0, 10.0, 0.0, 0.0, 7.0}};

        DoubleMatrix2D pColt_matrix = new DenseDoubleMatrix2D(5,5);
        pColt_matrix.assign(data2);
        Matrix j = new Matrix(data2);

        SingularValueDecomposition svd_jama = j.svd();

        DenseDoubleSingularValueDecomposition svd_pColt = new DenseDoubleSingularValueDecomposition(pColt_matrix, true, true);
        System.out.println("U:");
        System.out.println("pColt:");
        System.out.println(svd_pColt.getU());
        printJamaMatrix(svd_jama.getU());
        System.out.println("S:");
        System.out.println("pColt:");
        System.out.println(svd_pColt.getS());
        printJamaMatrix(svd_jama.getS());
        System.out.println("V:");
        System.out.println("pColt:");
        System.out.println(svd_pColt.getV());
        printJamaMatrix(svd_jama.getV());

    }

    public static void printJamaMatrix(Matrix inp){
        System.out.println("Jama: ");
        System.out.println(String.valueOf(inp.getRowDimension())+" X "+String.valueOf(inp.getColumnDimension()));
        DecimalFormat twoDForm = new DecimalFormat("#.####");
        StringBuffer sb = new StringBuffer();
        for (int r = 0; r < inp.getRowDimension(); r++) {
            for (int c = 0; c < inp.getColumnDimension(); c++)
                sb.append(Double.valueOf(twoDForm.format(inp.get(r, c)))).append("\t");
            sb.append("\n");
        }
        System.out.println(sb.toString());      
    }   
}

Python:

>>> import numpy
>>> numpy_matrix = numpy.array([[2.0, 0.0, 8.0, 6.0, 0.0],
...                             [1.0, 6.0, 0.0, 1.0, 7.0],
...                             [5.0, 0.0, 7.0, 4.0, 0.0],
...                             [7.0, 0.0, 8.0, 5.0, 0.0],
...                             [0.0, 10.0, 0.0, 0.0, 7.0]])
>>> # note: numpy.linalg.svd returns V^T (not V) as its third value, here named v
>>> u,s,v = numpy.linalg.svd(numpy_matrix, full_matrices=True)

Is there something wrong with the code?

Answer 1

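Nothing is wrong: the SVD is not unique, because the signs of the columns of U and V can change (i.e., if you flip the sign of the i-th column of U and the i-th column of V, you still have a valid SVD: A = U*S*V^T). Different implementations of the SVD will therefore give slightly different results. To check that an SVD is correct, compute norm(A - U*S*V^T) / norm(A) and verify that it is a small number.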
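The check described above can be sketched in NumPy as follows. This is a minimal illustration, not a definitive test: the helper name relative_residual, the particular sign pattern, and the tolerance used when comparing against the rounded Jama output from the question are all assumptions made for this example.

```python
import numpy as np

# The example matrix from the question.
A = np.array([[2.0, 0.0, 8.0, 6.0, 0.0],
              [1.0, 6.0, 0.0, 1.0, 7.0],
              [5.0, 0.0, 7.0, 4.0, 0.0],
              [7.0, 0.0, 8.0, 5.0, 0.0],
              [0.0, 10.0, 0.0, 0.0, 7.0]])

def relative_residual(A, U, s, Vt):
    """Return norm(A - U*S*V^T) / norm(A), using the Frobenius norm."""
    return np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)

U, s, Vt = np.linalg.svd(A)  # NumPy returns V^T (not V) as the third value

# Flip the sign of the i-th column of U together with the i-th column of V
# (the i-th row of V^T). The sign pattern here is arbitrary.
signs = np.array([1.0, -1.0, 1.0, -1.0, -1.0])
U_flipped = U * signs              # scales column i of U by signs[i]
Vt_flipped = Vt * signs[:, None]   # scales row i of V^T by signs[i]

res_original = relative_residual(A, U, s, Vt)
res_flipped = relative_residual(A, U_flipped, s, Vt_flipped)
print("residual (NumPy factors):  ", res_original)
print("residual (flipped factors):", res_flipped)

# U as printed by Jama in the question (rounded to four decimals).
U_jama = np.array([[0.5423, -0.0650, -0.8216,  0.1057, -0.1245],
                   [0.1018,  0.5935,  0.1126,  0.7881,  0.0603],
                   [0.5250, -0.0594,  0.2130, -0.1157,  0.8137],
                   [0.6449, -0.0704,  0.5087, -0.0599, -0.5628],
                   [0.0645,  0.7969, -0.0900, -0.5922, -0.0441]])

# Choose one sign per column so that NumPy's column points the same way as
# Jama's, then compare. The tolerance allows for the four-decimal rounding.
col_signs = np.sign(np.sum(U * U_jama, axis=0))
matches_jama = np.allclose(U * col_signs, U_jama, atol=1e-3)
print("NumPy U equals Jama U up to column signs:", matches_jama)
```

Both residuals should be at the level of floating-point round-off, which is the sense in which the differently signed factorizations are equally valid.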

Answer 2

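Nothing is wrong. The SVD resolves the column space and the row space of the target matrix into orthonormal bases in such a way as to align these two spaces and account for the dilations along the singular vectors. The alignment angles can be unique, a discrete set, or a continuum, as below.

For example, given two angles t and p and the target matrix (see footnote)

A = ((1, -1), (2, 2))

the general decomposition is

U = ((0, exp[i p]), (-exp[i t], 0))

S = sqrt(2) ((2, 0), (0, 1))

V* = (1 / sqrt(2)) ((-exp[-i t], -exp[-i t]), (exp[-i p], -exp[-i p]))

Recover the target matrix using A = U S V*.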

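A quick test of the quality of the answer is to verify that every column vector in U and V has unit length.

Footnote: the matrices are written in row-major format. That is, the first row vector of matrix A is (1, -1).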
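The two-parameter family can be checked numerically with a short NumPy sketch. The function name svd_family and the sample angles are assumptions made for this illustration. The entries of U, S, and V* are written out explicitly in the code, with V* carrying the conjugated phases exp[-i t] and exp[-i p] so that the product U S V* reproduces A for every choice of t and p.

```python
import numpy as np

# Target matrix in row-major form: the first row is (1, -1).
A = np.array([[1.0, -1.0],
              [2.0,  2.0]])

def svd_family(t, p):
    """Return (U, S, Vh) for the angles t and p (hypothetical helper).

    Vh plays the role of V* (the conjugate transpose of V).
    """
    U = np.array([[0.0, np.exp(1j * p)],
                  [-np.exp(1j * t), 0.0]])
    S = np.sqrt(2.0) * np.diag([2.0, 1.0])
    Vh = (1.0 / np.sqrt(2.0)) * np.array(
        [[-np.exp(-1j * t), -np.exp(-1j * t)],
         [np.exp(-1j * p), -np.exp(-1j * p)]])
    return U, S, Vh

def check(t, p):
    """Return True if the factors reproduce A and have unit-length columns."""
    U, S, Vh = svd_family(t, p)
    V = Vh.conj().T
    reconstructs = np.allclose(U @ S @ Vh, A)
    unit_columns = (np.allclose(np.linalg.norm(U, axis=0), 1.0)
                    and np.allclose(np.linalg.norm(V, axis=0), 1.0))
    return reconstructs and unit_columns

# Arbitrary sample angles: every pair should pass the same checks.
for t, p in [(0.0, 0.0), (0.3, 1.2), (np.pi / 2, -2.0)]:
    print(t, p, check(t, p))
```

The singular values on the diagonal of S do not depend on t or p. Only the phases of the singular vectors vary, which is the continuum of valid decompositions described above.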

I finally have enough reputation to post an image file.

Figure: Example showing two free parameters in an SVD

Singular Value Decomposition: different results with Jama, PColt, and NumPy
