diff --git a/imgs/irs-cam-pose-check.png b/imgs/irs-cam-pose-check.png
new file mode 100644
index 0000000..1317137
Binary files /dev/null and b/imgs/irs-cam-pose-check.png differ
diff --git a/irs_dataset_pose.md b/irs_dataset_pose.md
new file mode 100644
index 0000000..ddb6b01
--- /dev/null
+++ b/irs_dataset_pose.md
@@ -0,0 +1,154 @@
# How to Calculate the Camera-to-World Pose in the OpenCV-Style Coordinate System for the IRS Dataset

## 0. Code

Please check the script [prepare_irs_dataset_pose.py](./prepare_irs_dataset_pose.py) to see how the OpenCV-style camera-to-world poses are generated from the `UE_Trace.txt` files.

## 1. Raw Camera Poses in the IRS Dataset

The raw camera poses in the IRS dataset are generated by the Unreal Engine (UE) and saved in the `*/UE_Trace.txt` files.

- `UE_Trace.txt` is a text file containing the translation and orientation of the camera in a fixed coordinate frame (here, the UE coordinate system).

- Each line in the text file contains a single pose defined in the UE coordinate system (see below).

- The number of lines/poses equals the number of image frames in the corresponding folder.

- The first 7 numbers of each line are '**tx ty tz qx qy qz qw**', where

  - **tx ty tz** give the camera-to-world translation (in centimeters) in the UE coordinate system;
  - **qx qy qz qw** give the camera-to-world orientation as a unit quaternion.

- For example, the file `*/IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt` starts with

```plain
562.509460 554.905151 53.445610 0.004622 0.004660 -0.704158 0.710013 0.000000 0.000000 0.000000
562.510925 554.748474 65.385399 0.004622 0.004660 -0.704158 0.710013 0.025151 -2.628278 199.982956
562.512146 554.608765 76.015526 0.004622 0.004660 -0.704158 0.710013 0.021971 -2.629248 199.983032
...
...
...
```

where, for the first line, you can read

```python
tx, ty, tz = 562.509460, 554.905151, 53.445610
tx /= 100.0  # centimeters to meters
ty /= 100.0
tz /= 100.0
qx, qy, qz, qw = 0.004622, 0.004660, -0.704158, 0.710013
# now you can convert the unit quaternion to a rotation matrix, and so on ...
```

- Please check the code for more details:

```python
import numpy as np

# Load the pose file:
pose_src_file = 'IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt'
pose_quats = np.loadtxt(pose_src_file, comments='#',
                        usecols=(0, 1, 2, 3, 4, 5, 6)  # read the first 7 elements;
                        ).astype(np.float32)
```

## 2. UE and OpenCV-Style Coordinates

### 2.1 Unreal Engine Coordinate System

- The Unreal Engine (UE) uses Cartesian coordinates (x Forward, y Right, z Up) to represent a position relative to a local origin.

- It is a left-handed coordinate system.

```plain
   +z (Up) |
           |        / +x (Forward)
           |       /
           |      /
           |     /
(Origin O) |/_ _ _ _ _ _ _ _ +y (to Right, East)

  UE Coordinate, Left-handed Coordinate System,
  assuming your eye is behind the y-O-z plane and seeing +x forward.
```

### 2.2 OpenCV Coordinate System

- The OpenCV coordinate system uses Cartesian coordinates with the x-axis pointing right, the y-axis pointing down, and the z-axis pointing forward.

- It is a right-handed coordinate system.

```plain
               / +z (Forward)
              /
             /
(Origin O)  /_ _ _ _ _ _ _ +x (to Right)
            |
            |
            |
            |
            +y (Down)

  OpenCV Coordinate, Right-handed Coordinate System,
  assuming your eye is behind the x-O-y plane and seeing +z forward.
```

### 2.3 Why We Need OpenCV-Style Camera Poses

We need them because we use the following pipeline to connect the RGB image, the camera, and the world:

RGB image $(x,y)$, with $x$ pointing right, $y$ pointing down, and the image `origin` at the `top-left corner`
---> camera intrinsics K and inverse invK ---> camera points $P^{c} = (X^{c}, Y^{c}, Z^{c})$
---> camera extrinsics E and inverse invE ---> world points $P^{w} = (X^{w}, Y^{w}, Z^{w})$.
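
The pipeline above can be sketched in a few lines of NumPy. This is only a minimal illustration, not code from this repo: the intrinsic matrix `K`, the pixel `(u, v)`, and the depth `d` are made-up example values, and the camera-to-world pose `T_w_c` is set to the identity for simplicity.

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy are made-up example values).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
invK = np.linalg.inv(K)

# A pixel (u, v) with a known depth d (in meters) along the camera z-axis.
u, v, d = 400.0, 300.0, 2.5

# invK back-projects the pixel into the camera frame: P_c = d * invK @ [u, v, 1]^T
P_c = d * (invK @ np.array([u, v, 1.0]))

# The extrinsics connect camera and world; here T_w_c is a camera-to-world
# pose (identity in this toy example); in practice it is the OpenCV-style
# pose computed in Section 3 below.
T_w_c = np.eye(4)

# Map the camera point into the world frame: P_w = T_w_c @ [P_c, 1]^T
P_w = (T_w_c @ np.append(P_c, 1.0))[:3]
```

With an identity pose, the world point equals the camera point; substituting a real pose from `UE_Trace.txt` (after the conversion in Section 3) moves the point into the scene's world frame.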

### 2.4 Notation

Assume we have the following coordinate systems:

- `wue`: the world coordinate system in the UE (x Forward, y Right, z Up) format;
- `cue`: the camera coordinate system in the UE (x Forward, y Right, z Up) format;
- `w`: the world coordinate system in the OpenCV style (x Right, y Down, z Forward);
- `c`: the camera coordinate system in the OpenCV style (x Right, y Down, z Forward).

### 2.5 How to Get the Transformation Matrix from UE to OpenCV Style

- The matrix $T^{w}_{wue}$ is defined to map the points $P^{wue}$ to the points $P^{w}$, i.e., $P^{w}$ = $T^{w}_{wue}$ * $P^{wue}$.

- The same matrix, written as $T^{c}_{cue}$, maps the points $P^{cue}$ to the points $P^{c}$, i.e., $P^{c}$ = $T^{c}_{cue}$ * $P^{cue}$.

- To find $T^{w}_{wue}$, project each axis (as a unit vector) of $x^{wue}$, $y^{wue}$, $z^{wue}$ onto the axes $x^w$, $y^w$, $z^w$ (i.e., calculate the `dot-product` between them).

- *You can check the details in Chapter 2.2 of the book: John J. Craig, Introduction to Robotics: Mechanics and Control, Third Edition (2005).*

- Following the coordinates drawn above, we get this matrix as:

```python
T = np.array([
    [0, 1, 0, 0],
    [0, 0, -1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1]], dtype=np.float32)
```

- And we have $T^{w}_{wue}$ = $T^{c}_{cue}$ = $T$.

## 3. How to Map the Camera-to-World Pose from UE to OpenCV Style

- OpenCV-style camera-to-world pose:
  - We want to find the camera-to-world pose $T^{w}_{c}$, which does the mapping $P^w = T^{w}_{c} * P^{c}$.
  - Note: `$T^{w}_{c}$` etc. are written in LaTeX style, in case they are not rendered correctly.

- Applying the chain rule, we have:

$T^{w}_{c}$ = $T^{w}_{wue}$ * $T^{wue}_{cue}$ * $T^{cue}_{c}$ = $T$ * `camera-to-world-pose-UE` * inv(T)

where the `camera-to-world pose in UE` (i.e., $T^{wue}_{cue}$) can be loaded from the `UE_Trace.txt` file mentioned above.

## 4. Verify the Camera Poses You Just Got

- The generated camera poses can be verified by depth warping among multi-view images.
See an example from `OfficeMedley3/l_1.png` and `OfficeMedley3/l_3.png`:

![camera poses verified](./imgs/irs-cam-pose-check.png?raw=true "Camera pose verified by multi-view image warping")

You can see that the pixel highlighted by the red circle is correctly warped to the position highlighted by the green circle in the other view.
diff --git a/prepare_irs_dataset_pose.py b/prepare_irs_dataset_pose.py
new file mode 100644
index 0000000..54d9592
--- /dev/null
+++ b/prepare_irs_dataset_pose.py
@@ -0,0 +1,108 @@
import os
import sys
from glob import glob
from os import path as osp

import numpy as np
from scipy.spatial.transform import Rotation

scenes = [
    'Office',
    'Home',
    'Restaurant',
    'Store',
    ]

def pos_quat2SE_matrix(quat_data):
    """Get a 4x4 SE(3) transformation matrix from
    a translation vector (tx, ty, tz), in meters, and
    a unit quaternion (qx, qy, qz, qw).

    Args:
        quat_data: [tx ty tz qx qy qz qw].
    """
    SO = Rotation.from_quat(quat_data[3:7]).as_matrix()
    SE = np.eye(4)
    SE[0:3, 0:3] = SO
    SE[0:3, 3] = quat_data[0:3]
    return SE

# Unreal Engine coordinates to OpenCV-style coordinates;
def ue2cam(quat_data):
    '''
    # wue: world coordinate in Unreal Engine (x Forward, y Right, z Up) format;
    # cue: camera coordinate in Unreal Engine (x Forward, y Right, z Up) format;
    # w: world coordinate in OpenCV style (x Right, y Down, z Forward);
    # c: camera coordinate in OpenCV style (x Right, y Down, z Forward);
    # To find T_wue_2_w is to project each axis of x^wue, y^wue, z^wue
    # onto the axes x^w, y^w, z^w,
    # i.e., P^w = T_{wue}^{w} * P^{wue}
    '''

    # To find T^{w}_{wue} is to project (i.e., to calculate the dot-product between)
    # each axis (as a unit vector) of x^{wue}, y^{wue}, z^{wue}
    # onto the axes x^w, y^w, z^w.
    # > See Chapter 2.2 of the book: John J. Craig,
    # Introduction to Robotics: Mechanics and Control, Third Edition (2005).
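
    # As a quick sanity check of the projection described above (an illustrative
    # comment added here, not part of the original script), the matrix T defined
    # below maps the UE basis vectors onto OpenCV-style axes as follows:
    #   T @ [1, 0, 0, 0]^T = [0, 0, 1, 0]^T   (UE +x, Forward -> OpenCV +z, Forward)
    #   T @ [0, 1, 0, 0]^T = [1, 0, 0, 0]^T   (UE +y, Right   -> OpenCV +x, Right)
    #   T @ [0, 0, 1, 0]^T = [0, -1, 0, 0]^T  (UE +z, Up      -> OpenCV -y, i.e., Up)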
    T = np.array([
        [0, 1, 0, 0],
        [0, 0, -1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1]], dtype=np.float32)
    T_wue_2_w = T
    # Similarly, we can find the transformation from cue to c;
    T_cue_2_c = T
    T_c_2_cue = np.linalg.inv(T_cue_2_c)
    T_cue_2_wue = pos_quat2SE_matrix(quat_data)
    # NOTE: We want to find the pose between c and w in OpenCV-style coordinates;
    # that is, to find the cam-to-world pose T^{w}_{c},
    # which maps P^w = T^{w}_{c} * P^{c}.
    # Using the chain rule:
    #   T^{w}_{c} = T^{w}_{wue} * T^{wue}_{cue} * T^{cue}_{c}
    T_cam_2_world = np.matmul(np.matmul(T_wue_2_w, T_cue_2_wue), T_c_2_cue)
    return T_cam_2_world

if __name__ == '__main__':

    data_root = "./data/IRS"
    for seq in scenes:
        scan_paths = sorted(
            # one example: */IRS/Restaurant/DinerEnvironment_Dark/l_1.png
            glob(osp.join(data_root, seq, "*/"))
        )
        for scan in scan_paths:
            print("scan = ", scan)

            # e.g., scan = */IRS/Restaurant/DinerEnvironment_Dark/,
            # from which we extract "DinerEnvironment_Dark";
            if scan.endswith("/"):
                cur_P0X = scan[:-1].split("/")[-1]
            else:
                cur_P0X = scan.split("/")[-1]

            print("cur_folder = ", cur_P0X)

            # e.g., */IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt
            pose_src_file = osp.join(data_root, f'Auxiliary/CameraPos/{seq}/{cur_P0X}/UE_Trace.txt')
            if os.path.exists(pose_src_file):
                dst_pose_dir = osp.join(data_root, seq, cur_P0X, "pose_me_left")
                os.makedirs(dst_pose_dir, exist_ok=True)
                pose_quats = np.loadtxt(pose_src_file, comments='#',
                                        usecols=(0, 1, 2, 3, 4, 5, 6)  # read the first 7 elements;
                                        ).astype(np.float32)
                img_paths = glob(osp.join(scan, 'l_*.png'))
                assert len(img_paths) == pose_quats.shape[0], \
                    f"Requires #images {len(img_paths)} == #poses {pose_quats.shape[0]}"
                print(f"read from {pose_src_file}, and save to {dst_pose_dir}")
                for i in range(pose_quats.shape[0]):
                    # i+1: image names start from 1, 2, 3, ...;
                    pose_txtfile = osp.join(dst_pose_dir, f"{i+1:06d}_left.txt")
                    quat = pose_quats[i, :7]  # [tx ty tz qx qy qz qw]
                    # change tx, ty, tz from centimeters to meters
                    quat[:3] = quat[:3] / 100.0
                    T_cam2world_invE = ue2cam(quat)
                    np.savetxt(pose_txtfile, T_cam2world_invE)