blackjack2015 · ccj5351 · Mar 17, 2024
diff --git a/imgs/irs-cam-pose-check.png b/imgs/irs-cam-pose-check.png
diff --git a/irs_dataset_pose.md b/irs_dataset_pose.md
@@ -0,0 +1,154 @@
+# How to Calculate Camera-to-world Pose in the OpenCV-Style Coordinate System for IRS Dataset
+
+## 0. Code
+
+See [prepare_irs_dataset_pose.py](./prepare_irs_dataset_pose.py) for how to generate the OpenCV-style camera-to-world poses from the `UE_Trace.txt` files.
+
+## 1. Raw Camera Pose in IRS Dataset
+
+The raw camera poses in the IRS dataset are generated in the Unreal Engine (UE) and saved in the `*/UE_Trace.txt` files.
+
+- `UE_Trace.txt` is a text file containing the translation and orientation of the camera in a fixed coordinate frame (i.e., the UE coordinate system here).
+
+- Each line in the text file contains a single pose defined in the UE coordinate system (See below).
+
+- The number of lines/poses is the same as the number of image frames in the current folder.
+
+- The first 7 numbers of each line are '**tx ty tz qx qy qz qw**', where
+
+  - **tx ty tz** give the camera-to-world translation (in centimeters) in UE coordinate system.
+  - **qx qy qz qw** give a camera-to-world orientation in the form of a unit quaternion.
+
+
+- For example, this file `*/IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt` gives
+
+```plain
+562.509460 554.905151 53.445610 0.004622 0.004660 -0.704158 0.710013 0.000000 0.000000 0.000000
+562.510925 554.748474 65.385399 0.004622 0.004660 -0.704158 0.710013 0.025151 -2.628278 199.982956
+562.512146 554.608765 76.015526 0.004622 0.004660 -0.704158 0.710013 0.021971 -2.629248 199.983032
+...
+...
+...
+```
+
+From the first line, you can read:
+
+```python
+tx, ty, tz = 562.509460, 554.905151, 53.445610
+tx /= 100.0 # centimeters to meters
+ty /= 100.0
+tz /= 100.0
+qx, qy, qz, qw = 0.004622, 0.004660, -0.704158, 0.710013
+# now you can convert a unit quaternion to a rotation matrix and so on ...
+```
+
+- All poses in a file can be loaded with NumPy as follows (see the code for more details).
+
+```python
+import numpy as np
+# Load the pose file:
+pose_src_file = 'IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt'
+pose_quats = np.loadtxt(pose_src_file, comments='#', 
+                        usecols = (0,1,2,3,4,5,6) # read first 7 elements;
+                        ).astype(np.float32)
+```
+
+## 2. UE and OpenCV-Style Coordinates
+
+### 2.1 Unreal Engine Coordinate System
+
+- The Unreal Engine (UE) system uses the Cartesian coordinates (x Forward, y Right, z Up) to represent a position relative to a local origin.
+
+- It is a left-hand coordinate system.
+
+```plain
+
+    +z (Up) | 
+            |        / +x (Forward)
+            |      / 
+            |    / 
+            |  /
+ (Origin O) |/_ _ _ _ _ _ _ _ +y (to right, East)  
+
+    UE Coordinate, Left-hand Coordinate System,
+    assuming your eye is behind the y-O-z plane and seeing +x forward.
+```
+
+### 2.2 OpenCV Coordinate System
+
+- The OpenCV coordinate system is a Cartesian system with the x-axis pointing to the right, the y-axis downward, and the z-axis forward.
+
+```plain
+                  / +z (to Forward)
+                /
+              /
+ (Origin O) /_ _ _ _ _ _ _   +x (to Right)
+            |
+            |
+            |
+            | +y (Down)
+
+    OpenCV Coordinate, Right-hand Coordinate System,
+    assuming your eye is behind the x-O-y plane and seeing +z forward. 
+```
+
+### 2.3 Why We Need OpenCV-style Camera Pose
+
+We need it because the following pipeline connects RGB pixels, camera points, and world points:
+
+RGB image $(x,y)$ with $x$ pointing to the right, $y$ down, and the image origin at the top-left corner
+---> camera intrinsic K and inverse invK ---> camera points $P^{c}$ = $(X^{c}, Y^{c},Z^{c})$
+---> camera extrinsic E and inverse invE ---> world points $P^{w}$ = $(X^{w}, Y^{w},Z^{w})$.
+
+
+### 2.4 Notation
+
+Assume we have the following coordinate systems:
+
+- `wue`: the world coordinate in UE (x Forward, y Right, z Up) format;
+- `cue`: the camera coordinate in UE (x Forward, y Right, z Up) format;
+- `w`: the world coordinate in OpenCV style (x Right, y Down, z Forward);
+- `c`: the camera coordinate in OpenCV style (x Right, y Down, z Forward);
+
+
+### 2.5 How to Get the Transformation Matrix from UE to OpenCV Style
+
+- The matrix $T^{w}_{wue}$ maps the points $P^{wue}$ to the points $P^{w}$, i.e., $P^{w}$ = $T^{w}_{wue}$ * $P^{wue}$.
+
+- The same matrix `also` serves as $T^{c}_{cue}$, which maps the points $P^{cue}$ to the points $P^{c}$, i.e., $P^{c}$ = $T^{c}_{cue}$ * $P^{cue}$.
+
+- To find $T^{w}_{wue}$, project each axis (as a unit vector) $x^{wue}$, $y^{wue}$, $z^{wue}$ onto the axes $x^w$, $y^w$, $z^w$ (i.e., compute the `dot-product` with each of them); each projected axis forms one column of the rotation part.
+- *For details, see Chapter 2.2 of John J. Craig, Introduction to Robotics: Mechanics and Control, Third Edition (2005).*
+
+- Following the coordinates drawn above, we can get this matrix as:
+
+```python
+    T = np.array([
+                  [0,1,0,0],
+                  [0,0,-1,0],
+                  [1,0,0,0],
+                  [0,0,0,1]], dtype=np.float32)
+```
+
+- And we have $T^{w}_{wue}$ = $T^{c}_{cue}$ = $T$.
+
+## 3. How to Map the Camera-to-World Pose in UE to OpenCV Style
+
+- OpenCV-style camera-to-world pose: 
+  - We want to find the cam-to-world pose $T^{w}_{c}$, which performs the mapping $P^w = T^{w}_{c} * P^{c}$.
+  - Note: `$T^{w}_{c}$` etc. are written in LaTeX style, in case they are not rendered correctly.
+
+
+- Applying the chain rule, we have:
+
+$T^{w}_{c}$ = $T^{w}_{wue}$ * $T^{wue}_{cue}$ * $T^{cue}_{c}$ = $T$ * `camera-to-world-pose-UE` * inv(T)
+
+where the `camera-to-world pose in UE` ($T^{wue}_{cue}$) is loaded from the `UE_Trace.txt` file described in Section 1.
+
+## 4. Verify the Camera Pose You Just Got
+
+- The generated camera poses can be verified by depth warping among multi-view images. See an example from `OfficeMedley3/l_1.png` and `OfficeMedley3/l_3.png`.
+
+![camera poses verified](./imgs/irs-cam-pose-check.png?raw=true "Camera pose verified by multi-view image warping")
+
+The pixel highlighted by the red circle is warped to the visually correct location in the other view, highlighted by the green circle.
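The depth-warping check described in Section 4 of `irs_dataset_pose.md` can be sketched as follows. This is a minimal illustration of the pixel -> camera -> world -> camera -> pixel pipeline from Section 2.3, not the code used to produce the figure. The function name `warp_pixel`, the intrinsic matrix `K`, and both poses are hypothetical values chosen only for the example; real intrinsics and depth maps must come from the dataset.

```python
import numpy as np

def warp_pixel(uv, depth, K, T_w_c_src, T_w_c_dst):
    """Map pixel `uv` (with known depth) from a source view into a destination view."""
    u, v = uv
    # Pixel -> source camera point: P^c = depth * invK * [u, v, 1]^T
    P_c_src = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Source camera -> world, using the camera-to-world pose (invE)
    P_w = T_w_c_src @ np.append(P_c_src, 1.0)
    # World -> destination camera, using the extrinsic E = inv(camera-to-world)
    P_c_dst = np.linalg.inv(T_w_c_dst) @ P_w
    # Destination camera point -> pixel
    uvw = K @ P_c_dst[:3]
    return uvw[:2] / uvw[2]

# Hypothetical intrinsics and poses, chosen only for illustration.
K = np.array([[480.0, 0.0, 479.5],
              [0.0, 480.0, 269.5],
              [0.0, 0.0, 1.0]])
T_w_c_src = np.eye(4)      # source camera at the world origin
T_w_c_dst = np.eye(4)
T_w_c_dst[0, 3] = 0.1      # destination camera shifted 0.1 m along +x (right)

uv_dst = warp_pixel((479.5, 269.5), depth=2.0, K=K,
                    T_w_c_src=T_w_c_src, T_w_c_dst=T_w_c_dst)
print(uv_dst)
```

In this toy setup the destination camera sits to the right of the source camera, so the warped pixel moves to the left along the same image row. With real data, a correct set of poses should place the warped pixel on the same scene point in the second image.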
diff --git a/prepare_irs_dataset_pose.py b/prepare_irs_dataset_pose.py
@@ -0,0 +1,108 @@
+import os
+import numpy as np
+import sys
+from glob import glob
+
+#import tqdm
+from os import path as osp
+from scipy.spatial.transform import Rotation
+
+scenes = [
+    'Office',
+    'Home', 
+    'Restaurant', 
+    'Store',
+    ]
+
+def pos_quat2SE_matrix(quat_data):
+    """Build a 4x4 transformation matrix from a pose vector.
+
+    quat_data: [tx ty tz qx qy qz qw], with tx, ty, tz in meters and
+    (qx qy qz qw) a unit quaternion in scalar-last order.
+    """
+    # Rotation.from_quat expects the scalar-last (x, y, z, w) order.
+    SO = Rotation.from_quat(quat_data[3:7]).as_matrix()
+    SE = np.eye(4)
+    SE[0:3,0:3] = SO
+    SE[0:3,3]   = quat_data[0:3]
+    return SE
+
+# Unreal Engine coordinates to OpenCV-style coordinates;
+def ue2cam(quat_data):
+    '''
+    # wue: world coordinate in Unreal Engine (x Forward, y Right, z Up) format;
+    # cue: camera coordinate in Unreal Engine (x Forward, y Right, z Up) format;
+    # w: world coordinate in OpenCV style (x Right, y Down, z Forward);
+    # c: camera coordinate in OpenCV style (x Right, y Down, z Forward);
+    # To find T_wue_2_w is to project each axis of x^wue, y^wue, z^wue, 
+    # into axis x^w, y^w, z^w,
+    # i.e., P^w = T_{wue}^{w} * P^{wue}
+    '''
+
+    # To find $T^{w}_{wue}$ is to project (or to calculate the `dot-product` between) 
+    # each axis (as a unit vector) of $x^{wue}$, $y^{wue}$, $z^{wue}$, 
+    # into the axis $x^w$, $y^w$, $z^w$.
+    # > see: You can check the details in Chapter 2.2 of the book John J. Craig, 
+    # Introduction to Robotics: Mechanics and Control, Third Edition (2005).
+    T = np.array([
+                  [0,1,0,0],
+                  [0,0,-1,0],
+                  [1,0,0,0],
+                  [0,0,0,1]], dtype=np.float32)
+    T_wue_2_w = T
+    # Similarly, we can find the transformation from cue to c;
+    T_cue_2_c = T
+    T_c_2_cue = np.linalg.inv(T_cue_2_c)
+    T_cue_2_wue = pos_quat2SE_matrix(quat_data)
+    # NOTE: We want the pose between c and w in OpenCV-style coordinates,
+    # i.e., the cam-to-world pose T^{w}_{c},
+    # which maps P^w = T^{w}_{c} * P^{c}.
+    # Using the chain rule:
+    # T^{w}_{c} = T^{w}_{wue} * T^{wue}_{cue} * T^{cue}_{c}
+    T_cam_2_world = np.matmul(np.matmul(T_wue_2_w, T_cue_2_wue), T_c_2_cue)
+    return T_cam_2_world
+
+if __name__ == '__main__':
+
+    data_root = "./data/IRS"
+    for seq in scenes:
+        scan_paths = sorted(
+            # one example: */IRS/Restaurant/DinerEnvironment_Dark/
+            glob(osp.join(data_root, seq, "*/"))
+            )
+        for scan in scan_paths:
+            print ("scan = ", scan)
+
+            # e.g., scan = */IRS/Restaurant/DinerEnvironment_Dark/
+            # to get "DinerEnvironment_Dark";
+            if scan.endswith("/"):
+                cur_P0X = scan[:-1].split("/")[-1] 
+            else:
+                cur_P0X = scan.split("/")[-1] 
+
+            print ("cur_folder = ", cur_P0X)
+
+            # e.g., = */IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt
+            pose_src_file = osp.join(data_root, f'Auxiliary/CameraPos/{seq}/{cur_P0X}/UE_Trace.txt')
+            if os.path.exists(pose_src_file):
+                dst_pose_dir = osp.join(data_root, seq, cur_P0X, f"pose_me_left")
+                #os.system(f"rm -rf {dst_pose_dir}")
+                os.makedirs(dst_pose_dir, exist_ok=True)
+                pose_quats = np.loadtxt(pose_src_file, comments='#', 
+                                        usecols = (0,1,2,3,4,5,6) # read first 7 elements;
+                                        ).astype(np.float32)
+                #print ("??? pose_quats ", pose_quats.shape)
+                img_paths = glob(osp.join(scan, 'l_*.png'))
+                assert len(img_paths) == pose_quats.shape[0], f"Requires #image {len(img_paths)} == #pose {pose_quats.shape[0]}"
+                print (f"read from {pose_src_file}, and save to {dst_pose_dir}")
+                for i in range(pose_quats.shape[0]):
+                    # i+1: image name starting from 1, 2, 3, ...;
+                    pose_txtfile = osp.join(dst_pose_dir, f"{i+1:06d}_left.txt")
+                    #if not os.path.exists(pose_txtfile):
+                    quat = pose_quats[i,:7] # [tx ty tz qx qy qz qw]
+                    # change tx, ty, tz from cm to meters
+                    quat[:3] = quat[:3] / 100.0 # cm to meters;
+                    T_cam2world_invE = ue2cam(quat)
+                    np.savetxt(pose_txtfile, T_cam2world_invE)
+                    #if i > 5:
+                    #  sys.exit()
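The conversion in `ue2cam` can be sanity-checked on the first row of the example trace quoted in `irs_dataset_pose.md`. The sketch below is standalone rather than an import of the script: it rebuilds `T` from the axis directions, composes the pose with the chain rule, and checks that the rotation part is a proper rotation and that the translation equals `(ty, -tz, tx)` in meters. The helper names `quat_to_matrix` and `ue_pose_to_opencv` are hypothetical, and the explicit quaternion formula stands in for `Rotation.from_quat`, assuming the scalar-last order `(qx, qy, qz, qw)`.

```python
import numpy as np

# UE axes expressed in the OpenCV frame (x Right, y Down, z Forward).
x_ue = np.array([0.0, 0.0, 1.0])    # UE forward -> OpenCV +z
y_ue = np.array([1.0, 0.0, 0.0])    # UE right   -> OpenCV +x
z_ue = np.array([0.0, -1.0, 0.0])   # UE up      -> OpenCV -y

# Each UE axis forms one column of the rotation part of T.
T = np.eye(4)
T[:3, :3] = np.column_stack([x_ue, y_ue, z_ue])

def quat_to_matrix(q):
    """Rotation matrix from a quaternion in (x, y, z, w) order (assumed convention)."""
    x, y, z, w = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def ue_pose_to_opencv(trace_row):
    """trace_row: [tx ty tz qx qy qz qw], translation in centimeters (UE frame)."""
    row = np.asarray(trace_row, dtype=np.float64)
    T_wue_cue = np.eye(4)
    T_wue_cue[:3, :3] = quat_to_matrix(row[3:7])
    T_wue_cue[:3, 3] = row[:3] / 100.0          # cm to meters
    # Chain rule: T^{w}_{c} = T^{w}_{wue} * T^{wue}_{cue} * T^{cue}_{c}
    return T @ T_wue_cue @ np.linalg.inv(T)

row = [562.509460, 554.905151, 53.445610, 0.004622, 0.004660, -0.704158, 0.710013]
T_w_c = ue_pose_to_opencv(row)
R, t = T_w_c[:3, :3], T_w_c[:3, 3]

print("orthonormal:", np.allclose(R @ R.T, np.eye(3), atol=1e-9))
print("det(R):", np.linalg.det(R))
print("translation (m):", t)
```

Although `T` itself is a reflection (it switches between a left-handed and a right-handed frame), it appears on both sides of the product, so the rotation part of the resulting pose keeps a determinant of +1.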