188 changes: 188 additions & 0 deletions .ipynb_checkpoints/multiview_structure_from_motion-checkpoint.ipynb
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project 4: Sequential Structure from Motion\n",
"\n",
"### Due 4/3/2019\n",
"\n",
"### Graduate Students: Our next reading is [Snavely, 2006](http://195.130.87.21:8080/dspace/bitstream/123456789/636/1/Photo%20tourism%20exploring%20photo%20collections%20in%203D.pdf). We'll have a written report on this one: these methods papers aren't as good for discussions as I'd hoped.\n",
"\n",
"## Problem Statement\n",
"\n",
"You have now developed code that takes two photographs, finds matching locations in both, determines the relative motion between the cameras that took both photos, and solves for the 3D position of those points using epipolar geometry. **The next (and final for our purposes) stage is to extend this analysis to more than two images**, such that we can build 3D models of objects on the ground with just about as much detail as we'd like.\n",
"\n",
"## Adding the third photo\n",
"How do we add these additional photos? To be perfectly honest, at this point it's mostly an exercise in match housekeeping: we've already developed most of the code that we need. First, let's consider what we've got after matching our first two images, $I_1$ and $I_2$. First, we have a set of keypoints in each image, associated with a set of matches. These matches have been quality controlled twice: first by the ratio test, then by RANSAC in conjunction with the recovery of the essential matrix. Assuming that we've used our known camera matrix to convert our pixel-wise coordinates to generalized coordinates, let's call these keypoints $\\mathbf{x}_1$ and $\\mathbf{x}_2$. In practice, we can drop all of those keypoints for which there is not an associated accepted match. Then, for each of our kept matches, we have the essential matrix $E_{12}$, from which we can extract a pair of projection matrices $\\mathbf{P}_1 = [\\mathbf{I}|\\mathbf{0}]$ and $\\mathbf{P}_2 = [\\mathbf{R}_2|\\mathbf{t}_2]$. Using these projection matrices, we generated 3D, world coordinate location of the corresponding features that showed up robustly in both images. We'll call these coordinates $\\mathbf{X}_{12}$. \n",
"\n",
"To add a third image $\\mathbf{I}_3$ to the mix, consider that the situation outlined above is sort of analogous to the information that we have when we want to do pose estimation with ground control points: we have 3D world coordinates as well as the image coordinates of a set of points (a bunch of them, usually!), and we want to recover the camera pose. The problem is that the feature generalized coordinates that we've computed are for $I_1$ and $I_2$, but not $I_3$. Is this a big problem? Of course not! We can simply find $\\mathbf{x}_3$ in $I_3$ that correspond to $\\mathbf{x}_2$, the keypoints in the second image. Then we identify these keypoints with the 3D poi nts $\\mathbf{X}_{12}$. Thus we have image coordinates of features in the third image and the corresponding world coordinates: we can now perform pose estimation, just as we did in Project 1. \n",
"\n",
"Of course there are a few minor caveats: first, we need to filter out spurious matches between $\\mathbf{x}_2$ and $\\mathbf{x}_3$. To do this, we can utilize a tool that we already have: RANSAC estimation of the essential matrix. Because $I_2$ and $I_3$ are related by epipolar geometry just as $I_1$ and $I_2$ are, we can use the same subroutine to compute the essential matrix $\\mathbf{E}_{23}$, and (critically) identify and filter outliers, i.e. we'll discard matches that don't don't correspond to the consensus view of the essential matrix. This also leads to the next caveat, namely that we need an initial guess (call it $P_3^0$) for pose estimation to converge properly. Where should this initial guess come from? The $\\mathbf{E}_{23}$ provides a rotation given as if camera 2 were canonical, i.e. $\\mathbf{P_2'}=[\\mathbf{I}|\\mathbf{0}]$, $\\mathbf{P_3}'=[\\mathbf{R}_3'|\\mathbf{t}_3']$. We'll call it $P_3'$. We need to convert this projection matrix to a coordinate system in which $I_1$ (not $I_2$) is canonical. Fortunately, this is easy:\n",
"$$\n",
"P_3 \\approx P_2 P_3'.\n",
"$$\n",
"This $P_3$, is a an excellent initial guess for pose estimation (in principle, it's rotation matrix should actually be correct). Note that the translation component is only good up to a constant: however, this isn't too problematic because its direction is close to correct, and any optimization just needs to perform what amounts to a line search (a univariate optimization problem) to find the correct scaling. \n",
"\n",
"Once we have a robust estimation of the third camera's pose, we can use it do point triangulation on the correspondences between $I_2$ and $I_3$ not associated with an already-known world coordinate point, which allows us to augment our 3D model with new points. Additionally, we can perform triangulation with *3 views*, potentially improving our accuracy. Moreover, we can apply the process above iteratively, adding more and more images to generate a highly featured 3D model from (for example) 360 degrees worth of view angles. \n",
"\n",
"## Application\n",
"**Generate code that performs the above process for a third image. Apply it to one of the 3D image datasets that we generated in class. Note that we will be collecting aerial imagery from drones as well. Apply this method to a sequence of drone imagery as well.** As a challenge, can you implement code that sequentially adds an arbitrary number of images?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
125 changes: 123 additions & 2 deletions multiview_structure_from_motion.ipynb
@@ -36,7 +36,124 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import PIL.Image\n",
"import matplotlib.image as mpimg\n",
"import scipy.optimize as so\n",
"import cv2\n",
"import piexif\n",
"import sys\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"\n",
"class Image(object):\n",
" def __init__(self, img): \n",
" self.img = plt.imread(img)\n",
" # Image Description\n",
" self.image = piexif.load(img)\n",
" self.h = plt.imread(img).shape[0]\n",
" self.w = plt.imread(img).shape[1]\n",
" self.d = plt.imread(img).shape[2]\n",
" self.f = self.image['Exif'][piexif.ExifIFD.FocalLengthIn35mmFilm]/36*self.w\n",
" self.sift = cv2.xfeatures2d.SIFT_create()\n",
" self.keypoints,self.descriptor = sift.detectAndCompute(self.img, None)\n",
" self.realWorldtoCam = []\n",
" \n",
"\n",
"class camClass(Image):\n",
" def __init__(self): \n",
" self.images = []\n",
" self.pointCloud = []\n",
" \n",
" def add_images(self,image):\n",
" #image.pose = self.pose_guess #initialize image with guess\n",
" self.images.append(image)\n",
"\n",
"\n",
" def genPointCloud(self):\n",
" def triangulate(P0,P1,x1,x2):\n",
" '''X,Y,Z,W of the key points that are found in 2 images\n",
" P0 and P1 are poses, x1 and x2 are SIFT key-points'''\n",
" A = np.array([[P0[2,0]*x1[0] - P0[0,0], P0[2,1]*x1[0] - P0[0,1], P0[2,2]*x1[0] - P0[0,2], P0[2,3]*x1[0] - P0[0,3]],\n",
" [P0[2,0]*x1[1] - P0[1,0], P0[2,1]*x1[1] - P0[1,1], P0[2,2]*x1[1] - P0[1,2], P0[2,3]*x1[1] - P0[1,3]],\n",
" [P1[2,0]*x2[0] - P1[0,0], P1[2,1]*x2[0] - P1[0,1], P1[2,2]*x2[0] - P1[0,2], P1[2,3]*x2[0] - P1[0,3]],\n",
" [P1[2,0]*x2[1] - P1[1,0], P1[2,1]*x2[1] - P1[1,1], P1[2,2]*x2[1] - P1[1,2], P1[2,3]*x2[1] - P1[1,3]]])\n",
" u,s,vt = np.linalg.svd(A)\n",
" return vt[-1]\n",
" \n",
" for i in range(len(self.images)-1):\n",
" I1 = self.images[i]\n",
" I2 = self.images[i+1]\n",
" bf = cv2.BFMatcher()\n",
" matches = bf.knnMatch(I1.descriptor,I2.descriptor,k=2)\n",
"\n",
" # Apply ratio test\n",
" good = []\n",
" for i,(m,n) in enumerate(matches):\n",
" if m.distance < 0.7*n.distance:\n",
" good.append(m)\n",
" u1 = []\n",
" u2 = []\n",
" for m in good:\n",
" u1.append(I1.keypoints[m.queryIdx].pt)\n",
" u2.append(I2.keypoints[m.trainIdx].pt)\n",
"\n",
" # General Coordinates\n",
" u1g = np.array(u1)\n",
" u2g = np.array(u2)\n",
"\n",
" # Make Homogeneous\n",
" u1h = np.c_[u1g,np.ones(u1g.shape[0])] #image coords of keypoints\n",
" u2h = np.c_[u2g,np.ones(u2g.shape[0])]\n",
"\n",
" # Image Center\n",
" cv = I1.h/2\n",
" cu = I1.w/2\n",
"\n",
" # Get Camera Coordinates\n",
" K_cam = np.array([[I1.f,0,I1.cu],[0,I1.f,I1.cv],[0,0,1]])\n",
" K_inv = np.linalg.inv(K_cam)\n",
" x1 = u1h @ K_inv.T \n",
" x2 = u2h @ K_inv.T \n",
"\n",
" # Generate Essential Matrix\n",
" E, inliers = cv2.findEssentialMat(x1[:,:2],x2[:,:2],np.eye(3),method=cv2.RANSAC,threshold=1e-3)\n",
" inliers = inliers.ravel().astype(bool) \n",
" n_in,R,t,_ = cv2.recoverPose(E,x1[inliers,:2],x2[inliers,:2])\n",
"\n",
" x1 = x1[inliers==True]\n",
" x2 = x2[inliers==True]\n",
"\n",
" # Relative pose between two camperas\n",
" if (i == 0):\n",
" self.images[i].pose = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0]]) #aliased??\n",
" self.images[i+1] = np.hstack((R,t))\n",
" \n",
" # Find X,Y,Z for all SIFT Keypoints\n",
" for j in range(len(x1)):\n",
" rwc = triangulate(I1.pose,I2.pose,x1[j],x2[j])\n",
" if i == 0:\n",
" self.pointCloud.append(rwc)\n",
" else:\n",
" if rwc in self.pointCloud: \n",
" self.pointCloud.append(rwc) #appends to list of points in xyz coordinates\n",
" I2.realWorldtoCam[j] = x2[j]\n",
" else:\n",
" I2.realWorldtoCam[j] = None\n",
"\n",
" self.pointCloud = np.array(self.pointCloud)\n",
" self.pointCloud = self.pointCloud.T / self.pointCloud[:,3]\n",
" self.pointCloud = self.pointCloud[:-1,:] \n",
" \n",
" \n",
" def plotPointCloud(self):\n",
" #%matplotlib notebook\n",
" fig = plt.figure()\n",
" ax = fig.add_subplot(111,projection='3d')\n",
" ax.plot(*self.pointCloud,'k.')\n",
" plt.show()\n"
]
}
],
"metadata": {
@@ -55,7 +172,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.7"
}
},
"nbformat": 4,