This is a simple implementation of perspective projection.
I'm using a left-handed coordinate system, defined by the right, up and forward axes, with the camera pointing towards the positive forward direction, as well as homogeneous coordinates for the calculations. Therefore the matrices are 4×4.
Every object has its own intrinsic coordinate system (model space). The object can be placed relative to the world origin in world space by using 3D transformations, such as translation, scaling or rotation.
Applying these transformations to the vertices (and thereby the edges and faces) in model space transforms the object into world space.
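The model-to-world step can be sketched as follows. This is only an illustration, not the code of this project: it assumes column vectors, row-major nested lists, and the axis order right = x, up = y, forward = z. The helper names (`mat_mul`, `transform`, `translation`, `scaling`) are made up for this example.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """Translation matrix (hypothetical helper for this sketch)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def scaling(sx, sy, sz):
    """Scaling matrix (hypothetical helper for this sketch)."""
    return [[sx, 0, 0, 0],
            [0, sy, 0, 0],
            [0, 0, sz, 0],
            [0, 0, 0, 1]]


# Model matrix: scale first, then translate (the rightmost factor acts first).
model = mat_mul(translation(0, 0, 5), scaling(2, 2, 2))

vertex_model = [1, 1, 1, 1]  # a vertex in model space, w = 1
vertex_world = transform(model, vertex_model)
print(vertex_world)  # [2, 2, 7, 1]
```

The order of the factors matters: with column vectors, swapping the two matrices would translate first and then scale the translated position as well.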
Next, a camera, which is also defined by its transformation matrix relative to the world space, observes the object; this is modelled by perspective projection.
However, to make things as simple as possible in the next step, everything in the world space should be transformed such that the camera is located at the origin, with its viewing direction aligned with the positive forward axis. Every object therefore needs to be transformed such that it keeps its position relative to the camera.
This is done by applying the view matrix, which is the inverse of the camera's transformation matrix, to the vertices in the world space.
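As a minimal sketch of this step: a matrix that moves the camera to the origin while keeping all relative positions is the inverse of the camera's own transformation. The example below assumes that the camera transformation consists only of a rotation and a translation (no scaling), so the inverse rotation is simply the transpose. The function name `view_matrix` and the sample camera are hypothetical.

```python
def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def view_matrix(right, up, forward, position):
    """Invert a rigid camera transform.

    The camera matrix is assumed to have the columns right, up, forward
    and position. For orthonormal axes its inverse has the axes as rows
    and the negated, rotated position in the last column.
    """
    rows = []
    for axis in (right, up, forward):
        offset = -sum(a * p for a, p in zip(axis, position))
        rows.append([axis[0], axis[1], axis[2], offset])
    rows.append([0, 0, 0, 1])
    return rows


# Hypothetical camera: unrotated, located 3 units behind the world origin.
view = view_matrix((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, -3))

vertex_world = [2, 2, 7, 1]
vertex_view = transform(view, vertex_world)
print(vertex_view)  # [2, 2, 10, 1]
```

Applying the same matrix to the camera position itself yields the origin, which is a quick sanity check for any view matrix.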
Lastly, it is key to understand that the volume containing all objects visible to the camera, given the field of view parameters, is a frustum with a square base. This shape should then be transformed into normalized device coordinates (NDC).
To do this, the projection matrix is defined as
with
This matrix essentially scales the objects according to their distance from the camera.
After this transformation, each vector should be divided by the last component of its homogeneous coordinates. The first two dimensions then denote the 2D projection, while the third dimension contains depth information, relevant for rendering order.
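The projection and the division can be sketched as follows. This is one common left-handed convention and not necessarily the matrix used in this project: it assumes a symmetric square frustum with a single field of view angle `fov`, hypothetical near and far plane distances `near` and `far`, x and y mapped to [-1, 1], and depth mapped to [0, 1].

```python
import math


def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def projection_matrix(fov, near, far):
    """Left-handed perspective matrix for a square frustum (assumed convention).

    fov is the full opening angle in radians. The last row copies the
    forward coordinate into w, so the later division scales by distance.
    """
    s = 1.0 / math.tan(fov / 2.0)
    a = far / (far - near)
    return [[s, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, a, -near * a],
            [0.0, 0.0, 1.0, 0.0]]


def perspective_divide(v):
    """Divide by the last homogeneous component and return [x, y, depth]."""
    x, y, z, w = v
    return [x / w, y / w, z / w]


proj = projection_matrix(math.radians(90), near=1.0, far=100.0)

vertex_view = [2.0, 2.0, 10.0, 1.0]  # a vertex in camera (view) space
ndc = perspective_divide(transform(proj, vertex_view))
print([round(c, 3) for c in ndc])  # [0.2, 0.2, 0.909]
```

With these assumptions, a point on the near plane gets depth 0 and a point on the far plane gets depth 1. Points with a forward coordinate of 0 or less would have to be clipped before the division, which this sketch does not do.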