
Question

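I am trying to understand the basics of 3D point reconstruction from 2D stereo images. What I have understood so far can be summarized as follows:

For 3D point (depth map) reconstruction, we need two images of the same object taken from two different viewpoints. Given such an image pair, we also need the camera matrices (say P1, P2).

We find corresponding points in the two images using methods such as SIFT or SURF.
After obtaining the corresponding keypoints, we estimate the essential matrix (say K) using a minimum of 8 keypoints (as in the 8-point algorithm).
Assuming we are at camera 1, we compute the parameters of camera 2; the essential matrix yields 4 possible sets of camera parameters.
Finally, we use the corresponding points and both sets of camera parameters to estimate the 3D points by triangulation.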
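The steps above can be sketched end to end on synthetic data. This is only a minimal NumPy illustration, not the code from the repo: the function names, the synthetic scene, and the ground-truth pose are all assumptions made up for the example. It assumes noise-free correspondences that are already in normalized camera coordinates, so there is no feature matching and no outlier handling.

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Estimate E from N >= 8 normalized homogeneous correspondences (3xN each)."""
    # Each correspondence gives one row of A, from the constraint x2^T E x1 = 0.
    A = np.stack([np.kron(x2[:, i], x1[:, i]) for i in range(x1.shape[1])])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce the essential-matrix structure: singular values (1, 1, 0).
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt

def candidate_poses(E):
    """Return the four [R|t] candidates for camera 2 (t has unit length)."""
    u, _, vt = np.linalg.svd(E)
    u = u if np.linalg.det(u) > 0 else -u
    vt = vt if np.linalg.det(vt) > 0 else -vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = u[:, 2:3]
    return [np.hstack([R, s * t])
            for R in (u @ W @ vt, u @ W.T @ vt) for s in (1.0, -1.0)]

def triangulate(x1, x2, P1, P2):
    """Linear (DLT) triangulation; returns 4xN homogeneous points, last row 1."""
    X = np.empty((4, x1.shape[1]))
    for i in range(x1.shape[1]):
        A = np.vstack([
            x1[0, i] * P1[2] - P1[0],
            x1[1, i] * P1[2] - P1[1],
            x2[0, i] * P2[2] - P2[0],
            x2[1, i] * P2[2] - P2[1],
        ])
        solution = np.linalg.svd(A)[2][-1]
        X[:, i] = solution / solution[3]
    return X

def pick_pose(x1, x2, candidates):
    """Keep the candidate that puts the most points in front of both cameras."""
    P1 = np.eye(3, 4)
    def points_in_front(P2):
        X = triangulate(x1, x2, P1, P2)
        return np.sum((X[2] > 0) & ((P2 @ X)[2] > 0))
    return max(candidates, key=points_in_front)

# Synthetic scene (made up for this example): 20 points in front of both cameras.
rng = np.random.default_rng(0)
X_true = np.vstack([rng.uniform(-1, 1, (2, 20)), rng.uniform(4, 6, (1, 20))])
angle = np.deg2rad(10)
R_true = np.array([[np.cos(angle), 0, np.sin(angle)],
                   [0, 1, 0],
                   [-np.sin(angle), 0, np.cos(angle)]])
t_true = np.array([[-1.0], [0.0], [0.2]])

# Normalized image coordinates (third row equal to 1) in each view.
x1 = X_true / X_true[2]
x2 = R_true @ X_true + t_true
x2 = x2 / x2[2]

E = eight_point_essential(x1, x2)
P2 = pick_pose(x1, x2, candidate_poses(E))
X_est = triangulate(x1, x2, np.eye(3, 4), P2)
```

Note that the recovered translation has unit length, so `X_est` matches the true scene only up to a global scale factor. On real images the correspondences are noisy and contain outliers, so a robust estimator would normally wrap the 8-point step.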

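After going through the theory, as my first experiment I tried running the code available here, which worked as expected. After a few modifications to the example.py code, I tried running this example on all consecutive image pairs and merging the 3D point clouds for a 3D reconstruction of the object (dino), as below: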

import glob

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino(file1, file2):
    # Load one pair of dino images from the given paths
    img1 = cv2.imread(file1)
    img2 = cv2.imread(file2)
    pts1, pts2 = features.find_correspondence_points(img1, img2)
    points1 = processor.cart2hom(pts1)
    points2 = processor.cart2hom(pts2)

    fig, ax = plt.subplots(1, 2)
    ax[0].autoscale_view('tight')
    ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
    ax[0].plot(points1[0], points1[1], 'r.')
    ax[1].autoscale_view('tight')
    ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    ax[1].plot(points2[0], points2[1], 'r.')
    fig.show()

    height, width, ch = img1.shape
    intrinsic = np.array([  # for dino
        [2360, 0, width / 2],
        [0, 2360, height / 2],
        [0, 0, 1]])

    return points1, points2, intrinsic


points3d = np.empty((0, 0))
# Sorted so that consecutive files are consecutive views
# (assumes the file names sort in capture order)
files = sorted(glob.glob("imgs/dinos/*.ppm"))
num_files = len(files)  # avoid shadowing the built-in len()

for item in range(num_files - 1):
    print(files[item], files[item + 1])
    # dino() takes 2 image paths as input and returns the keypoint matches
    # (corresponding points in the two views) along with the camera intrinsic parameters.
    points1, points2, intrinsic = dino(files[item], files[item + 1])
    # print('Length', points1.shape[1])
    # Calculate essential matrix with 2d points.
    # Result will be up to a scale
    # First, normalize points
    points1n = np.dot(np.linalg.inv(intrinsic), points1)
    points2n = np.dot(np.linalg.inv(intrinsic), points2)
    E = structure.compute_essential_normalized(points1n, points2n)
    print('Computed essential matrix:', (-E / E[0][1]))

    # Given we are at camera 1, calculate the parameters for camera 2
    # Using the essential matrix returns 4 possible camera parameters
    P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    P2s = structure.compute_P_from_essential(E)

    ind = -1
    for i, P2 in enumerate(P2s):
        # Find the correct camera parameters
        d1 = structure.reconstruct_one_point(
            points1n[:, 0], points2n[:, 0], P1, P2)

        # Convert P2 from camera view to world view
        P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
        d2 = np.dot(P2_homogenous[:3, :4], d1)

        if d1[2] > 0 and d2[2] > 0:
            ind = i

    P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
    #tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
    tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

    if not points3d.size:
        points3d = tripoints3d
    else:
        points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) is no longer supported in recent Matplotlib
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()

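But I am getting a very unexpected result. Please suggest whether the above method is correct, or how I can merge multiple 3D point clouds to construct a single 3D structure.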

Answer 1

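Another possible path to understanding would be to look at an open-source implementation of Structure from Motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python, and I think it is easy to navigate and understand. I often use it as a reference for my own work.

Just to give you a bit more introductory information (should you choose to go down this path): Structure from Motion is an algorithm that takes a collection of 2D images and creates a 3D model (point cloud) from them, and in doing so also solves for the position of each camera relative to that point cloud (i.e., all returned camera poses are in the world frame, and so is the point cloud).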

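High-level steps of OpenSfM:

Read the image EXIF data for any prior information that can be used (e.g., focal length)
Extract feature points (e.g., SIFT)
Match feature points
Turn these feature point matches into tracks (e.g., if you saw a feature point in images 1, 2, and 3, you can connect it into a single track instead of match(1,2), match(2,3), etc.)
Incremental reconstruction (note that global approaches also exist). This process uses the tracks to incrementally add images to the reconstruction, triangulate new points, and refine the poses and point positions using a process called bundle adjustment.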
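To make the "matches into tracks" step concrete, here is a small sketch that merges pairwise matches into tracks with a union-find structure. It is not OpenSfM's actual code; `build_tracks` and the `(image, feature_index)` tuples are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def build_tracks(pairwise_matches):
    """Merge pairwise matches into tracks.

    pairwise_matches: iterable of ((image_a, feature_a), (image_b, feature_b)).
    Returns a sorted list of tracks, each a sorted list of (image, feature) nodes.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union the two observations of every match.
    for a, b in pairwise_matches:
        parent[find(a)] = find(b)

    # Group all observations by the root of their set.
    tracks = defaultdict(list)
    for node in list(parent):
        tracks[find(node)].append(node)
    return sorted(sorted(track) for track in tracks.values())

# Feature 7 in img1 matches feature 3 in img2, which matches feature 9 in img3.
matches = [
    (("img1", 7), ("img2", 3)),
    (("img2", 3), ("img3", 9)),
    (("img1", 8), ("img2", 4)),
]
tracks = build_tracks(matches)
```

Here the first two matches collapse into one three-image track, while the third match stays a two-image track. A real pipeline would typically also discard tracks that contain two different features from the same image.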

Hope this helps.

Answer 2

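The general idea is as follows.

In each iteration of your code, you compute the relative pose of the right camera with respect to the left camera. You then triangulate the 2D points and concatenate the resulting 3D points into one big array. But the concatenated points are not in the same coordinate frame.

What you need to do instead is accumulate the estimated relative poses to maintain an absolute pose estimate. Then you can triangulate the 2D points as before, but before concatenating the resulting points, you need to map them into the coordinate frame of the first camera.

Here is how to do this.

First, before the loop, initialize an accumulation matrix absolute_P1: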

points3d = np.empty((0, 0))
files = sorted(glob.glob("imgs/dinos/*.ppm"))
num_files = len(files)  # avoid shadowing the built-in len()
absolute_P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

for item in range(num_files - 1):
    # ...

Then, after the feature triangulation, map the 3D points into the coordinate frame of the first camera and update the accumulated pose:

# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

if not points3d.size:
    points3d = abs_tripoints3d
else:
    points3d = np.concatenate((points3d, abs_tripoints3d), 1)

# ...
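The pose-accumulation idea can be checked in isolation on synthetic data. The sketch below is independent of the repo: the relative poses are made-up values, and it assumes each relative transform maps coordinates in camera k to coordinates in camera k+1. Whether an inverse is needed at a given step depends on that direction convention, so it is worth verifying against the actual pose matrices being chained.

```python
import numpy as np

def make_pose(angle_deg, translation):
    """4x4 rigid transform: rotation about the y axis plus a translation."""
    a = np.deg2rad(angle_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), 0, np.sin(a)],
                 [0, 1, 0],
                 [-np.sin(a), 0, np.cos(a)]]
    T[:3, 3] = translation
    return T

# Hypothetical relative poses: relative[k] maps camera-k coordinates
# to camera-(k+1) coordinates.
relative = [
    make_pose(10, [-1.0, 0.0, 0.1]),
    make_pose(15, [-0.8, 0.0, 0.2]),
    make_pose(5, [-1.2, 0.0, 0.0]),
]

# One scene point, expressed in the frame of the first camera (homogeneous).
point_cam0 = np.array([0.3, -0.2, 5.0, 1.0])

# The same point as each camera would triangulate it in its own frame.
local = [point_cam0]
for T in relative:
    local.append(T @ local[-1])

# Accumulate the camera-k -> camera-0 transform and map every local point back.
cam0_from_cam = np.eye(4)
merged = []
for k, p in enumerate(local):
    merged.append(cam0_from_cam @ p)
    if k < len(relative):
        cam0_from_cam = cam0_from_cam @ np.linalg.inv(relative[k])
merged = np.array(merged)
```

With exact poses, every row of `merged` equals `point_cam0`, which is what allows the per-pair clouds to be concatenated into one cloud.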

Answer 3

TL; DR

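Simply combining all of the two-image reconstructions may not give you the complete 3D reconstruction you want. I tried to do this in many different ways, and none of them worked. Basically, the failures all seem to boil down to noise in the two-image pose estimation algorithm, which frequently produces unreasonable results. Any attempt to track the absolute pose by simply chaining all of the two-image poses propagates that noise through the entire reconstruction.

The code in the repo the OP is using is based on the textbook Multiple View Geometry in Computer Vision. Chapter 19 cites a paper that discusses a successful 3D reconstruction of the dinosaur sequence, and their approach is more sophisticated. In addition to two-image reconstructions, they also use three-image reconstructions, and (most importantly) a fitting step at the end that helps ensure no single spurious result ruins the reconstruction.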
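A toy illustration of how errors propagate when pairwise poses are chained: assume, purely for the sake of the example, that every estimated pairwise rotation is off by a fixed 0.5 degrees. Real estimation noise is random rather than a constant bias, and none of these numbers come from the dinosaur sequence.

```python
import numpy as np

def rot_y(deg):
    """Rotation matrix about the y axis."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

def angle_between(Ra, Rb):
    """Angle in degrees of the rotation taking Ra to Rb."""
    cos = (np.trace(Ra.T @ Rb) - 1) / 2
    return np.rad2deg(np.arccos(np.clip(cos, -1.0, 1.0)))

true_step = rot_y(10.0)       # true rotation between consecutive views
estimated_step = rot_y(10.5)  # hypothetical estimate, 0.5 degrees off per pair

true_abs, est_abs = np.eye(3), np.eye(3)
drift = []
for _ in range(20):
    # Chain the pairwise rotations to get the absolute orientation.
    true_abs = true_abs @ true_step
    est_abs = est_abs @ estimated_step
    drift.append(angle_between(true_abs, est_abs))
```

The per-pair error of 0.5 degrees grows to 10 degrees after 20 chained pairs. Nothing in plain chaining pulls the estimate back toward the truth, which is why a final joint fitting step matters.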

Code

...in progress

3D point reconstruction from 2D images
