Implementation and Applications of a Camera Calibration algorithm

Final Report

Mark Wang
CS 223B
March 15, 2000

Abstract

For this project, we implemented a standard camera calibration algorithm [1] based on the work of Tsai, et. al. which takes a set of points in 2-D image (pixel) space and their corresponding points in 3-D world space, and outputs various extrinsic parameters of the transformation between the world frame and the camera frame, such as the rotation matrix R.

We test our implementation on both synthetic and acquired images from a variety of viewpoints. The initial results we obtained, while needing improvement in some areas such as robustness in the face of error and uncertainty, indicated the potential for the algorithm to form the nucleus for several applications, such as object location or tracking. Coupled with automatic methods for feature detection and point matching, we believe that this camera calibration will have many practical uses.

1. Introduction

At its most basic level, camera calibration deals with recovering certain extrinsic parameters of an image acquisition system, given image points and their corresponding world points in 3-D space, which are known to the user, as input.

2. The Basic Algorithm and System

The algorithm that we used for implementation is outlined in Section 6.2 of Trucco and Verri, and the reader is directed there for the basic mathematical details. Here, we focus on issues and results in our specific implementation of it.

Input/Output and Restrictions

As input, we read in a grayscale image, as well as a list of world points which are predetermined and reflect the world coordinates of the test object. As output, we produce the 3x3 rotation matrix R which, together with the translation vector T, form the transformation between camera and world coordinate systems. R is graphically visualized as a set of orthonormal coordinate axes in an isometric view.

In addition, we also determine alpha, the aspect ratio of pixels, and f, the focal length along the camera's optical axis.

Assumptions we make about the viewpoint -- there are restrictions we place on it -- in particular, that the directions of all three principal orthonormal axes in the image can be ascertained, and that they are reasonably distinct, which implies that all three sides of the cube should be visible. This is so that our experimental method for automated point-matching (detailed below) can work.

For the matching to work we need at least eight points in our matrix. Furthermore, the image points that we match to known world points must reside on at least two different planes in order to have a non-singular matrix.

3. Implementation Issues

Our system was built upon the freely-available Horatio[3] library, which implements many basic linear algebra and image processing operations that we use.

Specifically, to simplify the implementation, we extensively leveraged and utilized pre-existing source code freely available for edge detection and line detection and segmentation, found in Horatio, as well as its general matrix and vector arithmetic facilities, including SVD and least-squares.

3.1. Constructing the target

For our acquired images, a box "calibration" target was constructed with measured dimensions. On three sides corresponding to the three principal planes of the box, a calibration pattern of four 1-inch squares were printed and pasted on each side.

3.2. Image generation

Rendered images of the box, with equivalent dimensions were generated using 3D Studio MAX. We used the synthetic approach for the bulk of our work because it allowed us to control parameters exactly, and be immune to noise, lighting irregularities, radial distortion, and other parameters which could potentially affect our images.

3.3. Tweaking the parameters

The image processing functions that we used were edge detection, line detection, and junction finding. We had to tweak the parameters in Horatio's configuration file (test.cfg in this case) for each of these functions. The tradeoffs that had to be made are detailed below.

3.4. Image center determination

Because the algorithm assumes that the origin of the image frame is located at the center of the image, rather than at the corner as is the case by default in most graphical interfaces, we had to introduce specific offsets (denoted in Trucco and Verri by o_x and o_y) Initially, we computed the 2-D bounding box for identified junction points for each image, but later, made the assumption that the origin was fixed in the center of the image. We found that this did not visibly affect the results, and allowed us to deal with one less possible variable and thus, source of error when comparing results of multiple images side by side.

The pipeline

Original image

Threshold the image
Binary-thresholding the image using a cutoff of 100 seems to give better results for edge detection, and hence line generation and junction-finding.

Edge detection
We apply an implementation of Canny's algorithm on the thresholded image to give us an edge map.

Line detection
We use a provided orthogonal-regression line detector to turn our edge map into lines, with suitable cutoffs for minimum length to filter out must spurious candidates.

Junction finding
We find intersections of the lines using a provided junction finding algorithm. The main consideration in this stage was to balance the search window size used in finding a neighboring endpoint, given a line segment. We need to find junctions of disjoint line segments that should be together, such as two lines coming together to create a corner. At the same time, we do not want false positives, which would result if our window was set too high.

Image->World coordinate mapping
Currently, this is still a manual process involving clicking each junction and selecting from a list of 3-D world points that have been previously read in from worldpoints.txt. While it would have been much more desirable to have automatic matching, there are many complexities behind this, which are detailed in the future work section, and we felt that this would detract us from our core focus on the calibration algorithm itself.

(excerpt from text window)

 46: (2.812500, 0.500000, -0.500000)  
47: (2.812500, 0.500000, -1.500000)  
Enter ID of world point to map to (-1 for no mapping): 30 
**** Mapped world point id 30 (0.500000, 0.500000, 2.750000) to image point 
(-32.771388, -7.026902) 

Figure: All the points on the calibration pattern box used in matching, along with their ID numbers, used during the manual process of specifying world-to-image point mappings.

4. Results.

For the bulk of our experiments, we ran our algorithm on a sequence of synthetic images, rotating the camera along different points. As detailed below, the results seem to be promising, in that rotation of the calibration pattern generally results in a corresponding change in R, while at the same time, pointing out the need for improvement. We briefly describe these results, and enumerate some possible causes of error, along with ways to address them in this section.

We expect rotation matrices to exhibit autocorrelation -- ie, they should not change too much from the previous value. This characteristic holds for intervals of consecutive images, but not the entire sequence: the computed matrices will abruptly "flip" at certain points in both sequences. As can be seen in the visualizations of the rotation matrix, they seem to belong to one of two "groups", switching back and forth between them. Note how the ordering of the color-coded axes abruptly changes in the visualizations of R -- for instance, the green Y' axis goes from pointing upward to pointing below the XZ plane.

Our other characteristics, such as our focal length f and our translation vector T also exhibited this bimodal distribution.

After analysis of the raw data, this is most likely due to the sign-reversal step (see Eqn. 6.12 of Trucco and Verri) occuring at some points, but not others. This reverses the signs of the first two components of T and the first two rows of R. Currently, we arbitrarily pick the first point we mapped as the point to apply the test of 6.12 to.

In theory, it should not matter which point we apply (6.12) to, as the signs should stay the same, but measurement error may cause this to not be true in practice. This hypothesis is further strengthed by the fact that "incomplete" mappings (ones where not all visible points were used, due to inability to find a function) would often lead to the sign reversal. A possible solution would be to apply the inequality of (6.12) over all mapped points, and see if more points are negative or positive.

Performance along oblique orientations

As a side of the box becomes more and more oblique, the points in the image appear closer and closer together. Because of our limited spatial resolution, the uncertainty in identifying and computing the location of junction points may introduce significant error.

Matching with incomplete points

There appears to be correlation with how many points we use in the matching, and the end result, in terms of which "cluster" the computed rotation matrix seem to fall under.

Noise sensitivity

For us, noise was not a significant issue in the acquired images we tested on, nor was it for the synthetic images. Thresholding the image helped. Noise could potentially have the most adverse impact on junction detection, however.

Assumptions about Perspective

From the 1/Z term in the equations, we are in fact, assuming a proper perspective camera model. This assumption, for images where the distances between matching points are small relative to the distance from the camera, may lead to high sensitivity in the 1/Z term to small perturbations, since Z is large.

5. Future Work

Automatic point matching

Establishing correspondences between world points and image points

The main time-consuming step for us, as we ran our experiments, was matching image points found to world points -- which points belong to what? Having an automatic point matching system would be a significant reduction in the effort needed to utilize the algorithm.

We have begun looking into preliminary problems in addressing this issue for our specific calibration pattern and its geometric structure.

Essentially, we would like to first determine, based on the slope of its associated line, where a point might be matched with. We assume that, since the depth of our image is much greater than the relative distance between any two salient points in our scene, our image is close to the perfect case of isometric projection -- that is, parallel lines on a line will stay parallel under our perspective projection. In practice, this ideal case doesn't hold, but it should be enough for our cases. There are three principal directions. We can find those slopes by collecting the slopes of the received lines and doing k-means clustering on this set, with k = 3.

We know that each junction point is associated with the intersection of two lines -- we note their slopes, and see which of the three slopes the lines of the point are associated with. We can then see which of the 3 clusters the slope falls under.

One issue is that the pattern on each side of the cube is the same. One possible solution in the future is have unique patterns for each side so that we can differentiate between them.

Other possible differentiation methods for points that might be investigated in the future include:

6. Conclusion

As it stands now, the system provides a way, given a mapping of image points to world points, to track small perturbations in the camera's location or orientation. The obvious bottleneck is the tedious mapping of image points to world points by hand. While this is acceptable for camera calibration which is likely only performed once, compelling applications such as image stablization seem to require automatic feature matching if they are to be feasible to the average user.

References

Trucco, Emanuele and Verri, Alessandro. Introductory Techniques for 3-D Computer Vision
Horatio http://www.ee.surrey.ac.uk/Personal/P.McLauchlan/horatio/html/index.html
K-means algorithm: http://www.ece.neu.edu/groups/rpl/kmeans/
CLAPACK: http://www.netlib.org/clapack/index.html
Tsai, R. Y. A Versatile Camera Calibration Technique for High Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses, IEEE Journal of Robotics and Automation Vol RA-3, No 4, pp. 323-344 (1987)


Appendix 1: Computed results

Rotation of the camera about a fixed target

Numerical data

Camera roll (bank) sequence

Numerical data


Appendix 2: Source code and data files

C source code and other necessary data files
Source code in C | worldpoints.txt | test.cfg
Notes: This needs the Horatio libraries and header files to compile, but it doesn't involve any modifications to the library itself. All three files should go in the horatio/src/lib/improc directory; the ip_test.c default test program is replaced.

Mark Wang <mwang@cs.stanford.edu>
vv