Simon Fuhrmann edited this page May 29, 2013 · 14 revisions


Camera Conventions

The MVE camera conventions use common textbook notation, e.g. that of the book "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. A 3D point X in space is projected onto the image plane as follows:

x = K * (R * X + t)

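where K is the calibration matrix, R is the world-to-camera rotation matrix, and t is the camera translation vector. R and t are referred to as the extrinsic camera parameters. The calibration matrix is assembled from quantities referred to as the intrinsic camera parameters. The result x is a homogeneous vector; dividing it by its third coordinate gives the point on the image plane, as described under Intrinsic Parameters.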

Extrinsic Parameters

The extrinsic parameters transform 3D points X in world coordinates into 3D points in camera coordinates: X' = R * X + t. The same transformation can be applied in homogeneous coordinates as X' = (R|t) * X, where (R|t) is a 3x4 matrix and X is a homogeneous 4-vector. If the camera center c is known, the translation vector is computed as t = -R * c. Conversely, if the translation is known, the camera center is computed as c = -R^-1 * t. The inverse of R can be obtained by transposing R, i.e. R^-1 = R^T (only if R is a proper rotation matrix). To transform a point from camera coordinates back to world coordinates, the inverse of the world-to-camera transformation, i.e. the camera-to-world transformation, is applied: X = R^-1 * X' - R^-1 * t.
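
The relations above can be sketched in a few lines of plain Python. This is only an illustration, not MVE code: the helper names (mat_vec, transpose, translation_from_center, and so on) and the example camera are made up for this sketch, matrices are represented as lists of rows, and R is assumed to be a proper rotation matrix so that its transpose can stand in for its inverse.

```python
import math

def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    """Transpose a 3x3 matrix; equals the inverse for a proper rotation."""
    return [[M[j][i] for j in range(3)] for i in range(3)]

def translation_from_center(R, c):
    """t = -R * c"""
    return [-v for v in mat_vec(R, c)]

def center_from_translation(R, t):
    """c = -R^-1 * t, using R^-1 = R^T"""
    return [-v for v in mat_vec(transpose(R), t)]

def world_to_camera(R, t, X):
    """X' = R * X + t"""
    RX = mat_vec(R, X)
    return [RX[i] + t[i] for i in range(3)]

def camera_to_world(R, t, X_cam):
    """X = R^-1 * X' - R^-1 * t, written as R^T * (X' - t)"""
    return mat_vec(transpose(R), [X_cam[i] - t[i] for i in range(3)])

# Hypothetical example camera: rotated 90 degrees about the y-axis,
# with its center at (1, 2, 3) in world coordinates.
angle = math.pi / 2
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
c = [1.0, 2.0, 3.0]
t = translation_from_center(R, c)

X = [4.0, 5.0, 6.0]
X_cam = world_to_camera(R, t, X)        # world -> camera
X_back = camera_to_world(R, t, X_cam)   # camera -> world, recovers X
```

Two quick sanity checks follow from the formulas: the camera center itself maps to the origin of the camera coordinate system, and a round trip through both transformations returns the original point up to floating-point error.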

Intrinsic Parameters

The calibration matrix is composed of the focal length of the camera, the principal point of the image plane, and the pixel aspect ratio. The focal length is normalized as follows: suppose the longer side of the image plane has length 1 in 3D space. Then the normalized focal length is the orthogonal distance from the camera center to the image plane. For example, a 35mm lens projecting onto a sensor whose longer side measures 35mm has a normalized focal length of 1.
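
As a small sketch of this normalization, assuming an ideal pinhole camera whose physical focal length and sensor dimensions are given in the same unit, the normalized focal length is the focal length divided by the longer sensor side. The function name and the sensor sizes below are illustrative choices, not part of MVE.

```python
def normalized_focal_length(focal_mm, sensor_width_mm, sensor_height_mm):
    """Focal length relative to the longer sensor side (all in the same unit)."""
    return focal_mm / max(sensor_width_mm, sensor_height_mm)

# A 35mm lens on a (hypothetical) sensor whose longer side is 35mm.
f_unit = normalized_focal_length(35.0, 35.0, 23.0)

# A 50mm lens on a 36mm x 24mm sensor; orientation does not matter,
# because only the longer side enters the computation.
f_landscape = normalized_focal_length(50.0, 36.0, 24.0)
f_portrait = normalized_focal_length(50.0, 24.0, 36.0)
```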

TODO: Insert an illustration of how the normalized focal length is defined.

For a 3D point X' in camera coordinates, the projection onto the image plane is computed as x = p(K * X'), where p is the function that performs the central projection: it divides its argument by the third coordinate, which yields a point on the image plane at distance 1 from the camera center.
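
A minimal sketch of x = p(K * X') in plain Python is given here. The layout of K used in the example (focal length on the diagonal, principal point at the origin, square pixels) is a simplifying assumption for illustration; the function name project is likewise hypothetical.

```python
def project(K, X_cam):
    """Compute x = p(K * X'): apply K, then divide by the third coordinate."""
    h = [sum(K[i][j] * X_cam[j] for j in range(3)) for i in range(3)]
    if h[2] == 0.0:
        # The point lies in the plane through the camera center that is
        # parallel to the image plane; it has no finite projection.
        raise ValueError("cannot project a point with zero depth")
    return (h[0] / h[2], h[1] / h[2])

# Assumed calibration: normalized focal length 1, principal point (0, 0).
K = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

x = project(K, [0.5, 0.25, 2.0])
```

With this K, the point (0.5, 0.25, 2) at depth 2 projects to (0.25, 0.125), i.e. its first two coordinates divided by its depth.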

TODO: Insert an illustration of how 3D points are projected onto the image plane.

The calibration matrix K can also be defined so that the projection directly yields image coordinates. To do so, multiply the normalized focal length by the larger image dimension, i.e. max(width, height), and set the principal point to (width / 2, height / 2). This yields continuous coordinates on the image plane between (0, 0) and (width, height). The center of pixel (0, 0) lies at (0.5, 0.5), so (0.5, 0.5) must be subtracted from the continuous image coordinates to obtain pixel coordinates.
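
The construction described above might look as follows in plain Python. This sketch assumes square pixels (pixel aspect ratio 1) and a principal point exactly at the image center; the function names and the 640x480 example image are illustrative and not taken from MVE.

```python
def pixel_calibration(flen, width, height):
    """Build K for image coordinates from a normalized focal length.

    Assumes square pixels and a principal point at the image center.
    """
    f = flen * max(width, height)
    return [[f, 0.0, width / 2.0],
            [0.0, f, height / 2.0],
            [0.0, 0.0, 1.0]]

def project(K, X_cam):
    """Compute x = p(K * X'): apply K, then divide by the third coordinate."""
    h = [sum(K[i][j] * X_cam[j] for j in range(3)) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])

def to_pixel_coords(x):
    """Shift continuous image coordinates so that pixel centers are integers."""
    return (x[0] - 0.5, x[1] - 0.5)

# Hypothetical 640x480 image with normalized focal length 1.
K_img = pixel_calibration(1.0, 640, 480)

# A point on the optical axis projects to the principal point.
x_img = project(K_img, [0.0, 0.0, 1.0])
x_pix = to_pixel_coords(x_img)
```

Here the point on the optical axis lands at the continuous image coordinates (320, 240), which correspond to the pixel coordinates (319.5, 239.5).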