The top-left 3x3 slice contains $R_{\text{world-to-cam}}$, which is the inverse of the $R_{\text{cam-to-world}}$ matrix.
Remember that $R^T=R^{-1}$, so a valid rotation matrix is easy to invert.
The last 3x1 column is obtained as $t^*=-R_{\text{world-to-cam}}t$, where $t$ is the camera’s position in world coordinates.
The bottom 1x4 row contains the homogeneous appendix $(0., 0., 0., 1.)$.
Beware: too many times, I have plugged in the sign-swapped translation vector directly here, forgetting to premultiply by $R_{\text{world-to-cam}}$.
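Putting these pieces together, here is a minimal numpy sketch of assembling the extrinsics matrix; the rotation and position are made-up example values:

```python
import numpy as np

# A sketch of building the 4x4 extrinsics (world-to-cam) matrix from a
# camera pose; rotation and position are made-up example values.
R_cam_to_world = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])
t = np.array([1.0, 2.0, 3.0])  # camera position in world coordinates

R_world_to_cam = R_cam_to_world.T   # for a valid rotation, inverse == transpose
t_star = -R_world_to_cam @ t        # NOT just -t: premultiply first!

extrinsics = np.eye(4)
extrinsics[:3, :3] = R_world_to_cam  # top-left 3x3 slice
extrinsics[:3, 3] = t_star           # last 3x1 column
# the bottom 1x4 row of np.eye(4) is already (0., 0., 0., 1.)

# sanity check: the camera center maps to the origin of the camera frame
assert np.allclose(extrinsics @ np.array([*t, 1.0]), [0.0, 0.0, 0.0, 1.0])
```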
The intrinsics matrix contains the measured properties of the camera and can be used to project 3D points onto the image plane.
The projection matrix can additionally take care of far-near clipping and viewport clipping.
Thus, while the intrinsics matrix contains only information relating to the camera properties, the projection matrix also contains rendering settings that are chosen arbitrarily.
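As a quick illustration of the intrinsics matrix at work, here is a sketch of a pinhole projection; the focal lengths and principal point are made-up example values, not from any particular camera:

```python
import numpy as np

# Sketch of a pinhole projection with the intrinsics matrix; focal lengths
# (fx, fy) and principal point (cx, cy) are made-up example values.
K = np.array([
    [500.0,   0.0, 320.0],   # fx,  0, cx
    [  0.0, 500.0, 240.0],   #  0, fy, cy
    [  0.0,   0.0,   1.0],
])

p_cam = np.array([0.2, -0.1, 2.0])  # a 3D point in camera coordinates
uvw = K @ p_cam                     # homogeneous image coordinates
u, v = uvw[:2] / uvw[2]             # perspective divide
print(u, v)  # 370.0 215.0
```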
In this post, we take particular interest in the pose matrix and the extrinsics matrix, which can be obtained as follows:
It is always important to think about which “cam transform convention” the framework one is working with follows.
Sometimes, the camera position and rotation are indicated via a pose matrix, at other times they are represented as an extrinsics matrix.
To summarize, one good approach is to first obtain the pose matrix (cam-to-world matrix) and then the extrinsics matrix (world-to-camera matrix), as follows:
- $R$ is an orthonormal matrix, i.e.,
  - $R$ is orthogonal (i.e., the matrix inverse is equal to its transpose $R^{-1}=R^T$), and
  - each row vector of $R$ has length 1,
Definition
# properties of a rotation
assert np.isclose(np.linalg.det(R), 1)
assert np.allclose(R.T, np.linalg.inv(R))
Non-Symmetry
Rotation matrices can be, but are not necessarily, symmetric, i.e., $R = R^{T}$ does not hold in general.
Interesting implication of orthonormality
Orthonormality is equivalent to the property that the row vectors form an orthonormal basis.
This is equivalent to the property that the column vectors form an orthonormal basis.
As an interesting side effect of the orthonormality property, the rotation matrix contains redundancy: we only need two basis vectors, i.e., two rows (or two columns), to compute the third as the cross product of the two given vectors.
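This redundancy can be sketched in numpy: given two rows of a (proper, right-handed) rotation matrix, the third row is their cross product:

```python
import numpy as np

# Sketch: two rows of a proper rotation matrix suffice;
# the third row is their cross product.
r0 = np.array([0.0, -1.0, 0.0])
r1 = np.array([1.0,  0.0, 0.0])
r2 = np.cross(r0, r1)
print(r2)  # [0. 0. 1.]

R = np.vstack([r0, r1, r2])
assert np.isclose(np.linalg.det(R), 1.0)   # proper rotation
assert np.allclose(R @ R.T, np.eye(3))     # orthonormal rows
```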
Interesting implication of $det(R) = +1$
An orthonormal matrix can have determinant -1 or +1 (https://en.wikipedia.org/wiki/Orthogonal_matrix), but a rotation matrix is an orthonormal matrix with $det(R)=+1$ (https://en.wikipedia.org/wiki/Orthogonal_group, https://en.wikipedia.org/wiki/3D_rotation_group#Orthogonal_and_rotation_matrices).
An axis sign flip cannot be represented by a proper rotation matrix.
Flipping a single sign in a rotation matrix consisting of three positive or negative ones flips the determinant.
So, to convert from a right-handed coordinate system to a left-handed coordinate system, we have to first align two axes and then flip the remaining one.
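A quick numerical sketch of the determinant flip:

```python
import numpy as np

# Flipping the sign of a single axis turns det = +1 into det = -1,
# e.g., negating the z-axis of the identity rotation.
flip_z = np.diag([1.0, 1.0, -1.0])
assert np.isclose(np.linalg.det(flip_z), -1.0)

# Composing with any proper rotation keeps the determinant at -1,
# so no proper rotation can absorb the handedness flip.
R_z90 = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])
assert np.isclose(np.linalg.det(R_z90 @ flip_z), -1.0)
```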
Properties of translations
### Translation ###
t = np.array([1, 2, 3])
# this is not a way of obtaining the pose matrix
assert not np.all(R @ T == M)
$R \times T = (T^{-1} \times R^{-1})^{-1}$
Too many times, I doubted myself because I was violating the $TRS\times p$ rule: first scale, then rotate, and finally translate.
However, as described above, when computing the extrinsics matrix from the pose matrix in the alternative way, we need to first translate and only then rotate some of the matrices we are looking at.
This follows from the following rule:
Given two invertible matrices, the inverse of their product is the product of their inverses in reverse order (https://en.wikipedia.org/wiki/Invertible_matrix#Other_properties):
$
(A B)^{-1} = B^{-1} A^{-1}
$
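This rule is easy to check numerically on arbitrary invertible matrices:

```python
import numpy as np

# Check (A B)^(-1) == B^(-1) A^(-1) numerically on two arbitrary
# matrices (made diagonally dominant, hence invertible).
rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3.0 * np.eye(3)
B = rng.random((3, 3)) + 3.0 * np.eye(3)

assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
# note the reversed order; inv(A) @ inv(B) would be wrong
```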
Step 1: Visualizing world and
Are we looking at a global or local coordinate axis?
For example, the camera coordinate system’s y-axis might map to the world coordinate system’s x-axis during a rotation or convention transform, so be sure to be consistent about this.
A change in convention can be represented by a 3x3 or 4x4 convention transform.
If the handedness does not change between conventions, the convention transform is a proper rotation matrix.
If handedness changes, the convention transform’s determinant becomes negative, making it what is sometimes called an improper rotation matrix.
However, this does not change the process at all.
Instead, we simply need to ask ourselves how to define the convention transform.
Imagine you get coordinates in NED (x forward, y right, z down) and want to map them to Unity (x right, y up, z forward).
My recipe is as follows:
- Draw the camera perspectives on paper with the source coordinate convention (here: NED) on the left and the target (here: Unity) on the right.
- Iterate over the left (i.e., source) axes and ask yourself: Which target unit do I get for my source unit? That’s basically what conversion is, right? For each axis, note the answer in a column, filling up from left to right.
- Once done, you have 3 columns, making up an orthonormal matrix with determinant 1 (i.e., a rotation matrix), or an orthonormal matrix with determinant -1 (because handedness flipped).
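Applied to the NED-to-Unity example above, the recipe yields the following convention transform; the column assignments are a sketch following the axis mapping described in the text:

```python
import numpy as np

# Convention transform from NED (x forward, y right, z down) to
# Unity (x right, y up, z forward), built column by column:
# NED x (forward) -> Unity z, NED y (right) -> Unity x, NED z (down) -> Unity -y.
C_ned_to_unity = np.array([
    [0.0, 1.0,  0.0],
    [0.0, 0.0, -1.0],
    [1.0, 0.0,  0.0],
])

# NED is right-handed, Unity is left-handed: handedness flips,
# so we get an improper rotation with determinant -1.
assert np.isclose(np.linalg.det(C_ned_to_unity), -1.0)

# A point one unit forward in NED lands on Unity's z-axis.
p_ned = np.array([1.0, 0.0, 0.0])
print(C_ned_to_unity @ p_ned)  # [0. 0. 1.]
```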
Step 3: Point conversions
To convert incoming points from the source coordinate frame, pre-multiply them by the convention transform matrix.
Step 4: Rotation conversion
Converting rotations is a bit more intricate again.
The process depends on the rotation representation used by the source system:
Converting incoming rotation matrices
Imagine, we obtain tracking data from ARKit (x right, y up, z backward) and want to visualize it in a 3D rendering engine that uses an NED convention (x forward, y right, z down).
I chose this conversion example because all axes are different, making it easier to spot sign errors.
The example is visualized end-to-end in the next figure and verbalized afterward in 3 steps.
Consider the initial state where the phone is aligned with the ARKit world coordinate system.
Imagine a physically mounted plastic frustum extending forward as well as two physically mounted coordinate frames, one in the ARKit convention and one in the NED convention, all glued to the phone.
In this initial state in ARKit, the phone's rotation matrix as tracked by ARKit is equal to the identity.
Considering the ARKit coordinate frame, the tip of the z-axis lies at $(0,0,1)_\text{ARKit}$ in ARKit convention, facing backward and jabbing the user into the eye.
Considering the NED coordinate frame, the tip of the x-axis extends forward, lying at $(1,0,0)_\text{NED}$ in NED convention.
Remember, in a world-aligned initial pose, the ARKit iPhone neutrally rests in landscape mode, the screen facing the user, the selfie camera on the left side of the phone, and the USB-C/Lightning port to the right.
Rotating the phone 90 degrees counter-clockwise, so that the phone is in upside-down portrait mode afterward, is a rotation around the z-axis in the positive direction.
Purely talking ARKit, the rotation from state 0 to state 1 is described by the following rotation matrix:
$$
R_{\text{ARKit}_0\text{-to-ARKit}_1} =
\begin{pmatrix}
0 & -1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{pmatrix}
$$
Considering the NED frame, rotating the phone 90 degrees counter-clockwise is a rotation around the x-axis by $-90$ degrees or $+270$ degrees.
Purely talking NED, the rotation from state 0 to state 1 is described by the rotation matrix:
$$
R_{\text{NED}_0\text{-to-NED}_1} =
\begin{pmatrix}
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & -1 & 0
\end{pmatrix}
$$
However, this last rotation $R_{\text{NED}_0\text{-to-NED}_1}$ is exactly the one we do not have.
Instead, all we have in our system is the hard-coded convention transform and incoming ARKit rotations.
The question becomes: How do we convert the ARKit rotation matrix into a NED rotation matrix?
Sub-Step C: Computing $R_{\text{NED}_0\text{-to-NED}_1}$
We can’t directly apply the incoming rotation matrix to the NED point, and we also cannot just pre-multiply the convention transform as we can with points.
Instead, the idea is to chain rotations as follows: first convert from the target frame back to the source frame, then apply the known rotation in the source system, and finally transform back to the target system:
assert np.all(R_ned_0_to_1 == R_arkit_to_ned @ R_arkit_0_to_1 @ R_arkit_to_ned.T)
So, in summary, to convert an incoming rotation matrix, we need to pre-multiply with the convention transform, and post-multiply with the inverted convention transform.
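The whole sandwich can be sketched in numpy; the two matrices below encode the ARKit-to-NED convention transform and the tracked ARKit rotation from this example:

```python
import numpy as np

# Convention transform from ARKit (x right, y up, z backward) to
# NED (x forward, y right, z down):
# ARKit x (right) -> NED y, ARKit y (up) -> NED -z, ARKit z (backward) -> NED -x.
R_arkit_to_ned = np.array([
    [0.0,  0.0, -1.0],
    [1.0,  0.0,  0.0],
    [0.0, -1.0,  0.0],
])

# rotation tracked by ARKit: +90 degrees around the ARKit z-axis
R_arkit_0_to_1 = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])

# pre-multiply the convention transform, post-multiply its inverse
# (equal to its transpose, since it is orthonormal)
R_ned_0_to_1 = R_arkit_to_ned @ R_arkit_0_to_1 @ R_arkit_to_ned.T

# the same physical motion expressed in NED: -90 degrees around the x-axis
assert np.allclose(R_ned_0_to_1, np.array([
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
    [0.0, -1.0, 0.0],
]))
```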
Converting incoming Euler angles
Euler angles, and their special cases of Tait-Bryan angles and Davenport angles, are an annoyance in conversion for two reasons:
First, they introduce yet another convention.
In order to interpret three given Euler angles $\alpha, \beta, \gamma$ around $x, y$ and $z$, we need to know if they have been applied intrinsically or extrinsically, and in which order.
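For illustration, SciPy’s `Rotation.from_euler` distinguishes the two interpretations via letter case; the angles below are made-up example values:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# SciPy encodes the convention in letter case: uppercase axes mean
# intrinsic rotations, lowercase mean extrinsic. Same angles, same axis
# order -- but generally two different rotations.
angles = [30.0, 20.0, 10.0]  # degrees, made-up example values
extrinsic = R.from_euler('zxy', angles, degrees=True)
intrinsic = R.from_euler('ZXY', angles, degrees=True)

assert not np.allclose(extrinsic.as_matrix(), intrinsic.as_matrix())
```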
For example, Unity uses the Euler angle convention of extrinsic zxy, while DJI (and aviation quite often) uses the Euler convention of intrinsic yaw-pitch-roll, i.e., intrinsic zyx.
However, even worse than introducing the need for yet another convention, Euler angles are discontinuous, i.e., a small change such as a single degree in one axis can make all rotation angles jump abruptly.
For DJI aircraft, the motion is physically so limited that we mostly don’t notice these discontinuities, but in camera motions more generally, Euler angles can often lead to unexpected results.
Therefore, whenever receiving Euler angles, be sure to convert them to quaternions as soon as possible.
To do so, one can intuitively use the axis-angle initialization for quaternions.
To find the correct rotation, first use the corresponding hand’s grip rule for the source axes x, y, and z, and simply look up what this rotation is called in the target system.
$
\text{quat}_\text{NED} = \text{quat}(\text{yaw}, 0, 1, 0) \times \text{quat}(\text{pitch}, 0, 0, -1) \times \text{quat}(\text{roll}, -1, 0, 0).
$
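A sketch of this axis-angle composition using SciPy, with made-up example angles and the axis mapping from the formula above:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# made-up example angles in degrees; axes follow the formula above
yaw, pitch, roll = 30.0, 10.0, -5.0

def quat(angle_deg, x, y, z):
    # axis-angle initialization of a rotation (unit axis assumed)
    return R.from_rotvec(np.deg2rad(angle_deg) * np.array([x, y, z]))

quat_ned = quat(yaw, 0, 1, 0) * quat(pitch, 0, 0, -1) * quat(roll, -1, 0, 0)

# the result is a proper rotation, free of Euler discontinuities
assert np.isclose(np.linalg.det(quat_ned.as_matrix()), 1.0)
```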
Conclusion
As this post demonstrates, I have spent my fair share of time on transformations of all sorts and conventions.
I have come to the conclusion that thinking through what actually goes on, rather than randomly swapping signs and orders, has proven more sustainable to me.
This blog post and my associated coordinate frame conversion tool (https://mkari.de/coord-converter/) hopefully help with doing so.
Typeset with Markdown Math in VSCode and with KaTeX in HTML.