As per my last post on this effort, http://ryan.fish/blog/characterizing-the-ps3-eye/, with the camera field of view parameters determined, and the lensing warp shown to be fairly low, there’s a straightforward path to taking an array of 2D points in the camera view, with known real-world coordinates, and backprojecting them to determine the camera’s position in space. Though this has probably been done a million times before by every vision system ever, it seemed like the kind of thing that should be easy! Or, as is often the case, would turn out to be interesting on its own and therefore worth the experience. 😀

Given my ultimate goal of extracting my swimming robot’s coordinates from a dual-camera setup, I need to know both cameras’ poses in the global reference frame to make a sensible coordinate extraction. So, with a static fishtank in the frame, which both contains the robot and provides perfect markers for a rectangular coordinate frame, I set about doing lots of trig. Essentially, the corners of the fishtank become a set of points in ℝ³, with known coordinates since I can measure the fishtank. It’s a perfect rectangular prism, so it makes sense to align the global coordinate frame with its axes. Given 4 points on the fishtank, I can triangulate the camera location (and likely its orientation, though I haven’t thought that through yet and don’t need it).

Basic Method:

In OpenCV, click my 4 registration points on the 2D camera feed.

Given the pixel distance -> angle conversion I talked about in the previous post on the subject, convert every combination of 2 points to an angular measure.

Since the actual 3D position of all the points is known, the distance between any set of 2 is also known.

Taking each pair and considering the “point” location of the camera, it is clear there is a triangle for every point pair with the camera location as its third point.

The known distance between the two points on the fishtank is then opposite the angle determined by the pixel distance of the point pairs.

This should sound exactly like it’s heading towards the Law of Cosines, which relates the other two sides of a triangle to one known angle and the side opposite it.

Since all of these imaginary triangles actually share sides with each other, a simple algebraic relationship exists to find all side lengths of all triangles from the known sides and the angles (these side lengths can also be interpreted as the distance from fishtank points to camera point).

To get back to what we actually want, the camera position in 3D, we can replace side lengths as values with side lengths as a function of the camera position. Instead of getting the lengths of every side, we do a little more algebra and get back from the solver the point that satisfies the distances.
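As a sketch of that algebra, with symbols introduced here only for illustration: call the camera position $C$, the fishtank points $P_i$, the known distance between two of them $d_{ij} = \lVert P_i - P_j \rVert$, and the angle recovered from their pixel separation $\theta_{ij}$. The Law of Cosines for each pair, with the side lengths written as functions of the camera position, would then read:

$$
d_{ij}^2 = r_i^2 + r_j^2 - 2\, r_i\, r_j \cos\theta_{ij}, \qquad r_i = \lVert C - P_i \rVert
$$

With 4 points there are 6 pairs, so this gives 6 equations in the 3 unknown coordinates of $C$.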

If the world were perfect, we could stop there and call it a day. Just take the angles from the camera and the known line lengths from the fishtank points, run them through a solver and boom, a point in ℝ³. If we were working with exact values, this could happen, but a lot of approximations have piled up by this point (the actual dimensions of the fishtank, the selected points from the video feed, the assumption of a perfect pixel-to-angle conversion, among others).

So instead we are left with having to find the camera point that minimizes the error between the computer world and the real world. We can start from a test point in space, check how badly it fits our equations by comparing the triangles it makes with the edge lengths and angles we know, then try new points that make this error smaller. It’s essentially a “hot or cold” search where you drag a point through space, getting constant feedback of “hotter” or “colder”.
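To make the “hot or cold” idea concrete, here is a minimal pure-Python sketch. It is not the solver used for the results in this post: the tank coordinates, the camera position, the starting guess, and the function names are all hypothetical, and the search is a plain compass search chosen only because it mirrors the description above. The error here compares predicted and measured pair angles directly, which is one of several reasonable ways to score a test point.

```python
import math
from itertools import combinations

# Hypothetical tank corners in metres (illustrative, not measurements from this post).
TANK_POINTS = [
    (0.0, 0.0, 0.0),
    (0.6, 0.0, 0.0),
    (0.0, 0.0, 0.35),
    (0.6, 0.3, 0.35),
]

def subtended_angle(cam, p, q):
    """Angle at cam between the rays to p and q, in radians."""
    u = [a - c for a, c in zip(p, cam)]
    v = [a - c for a, c in zip(q, cam)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def angle_error(cam, points, measured):
    """Sum of squared differences between predicted and measured pair angles."""
    return sum(
        (subtended_angle(cam, points[i], points[j]) - measured[(i, j)]) ** 2
        for i, j in combinations(range(len(points)), 2)
    )

def hot_or_cold(points, measured, start, step=0.2, tol=1e-8, max_evals=200_000):
    """Compass search: nudge one axis at a time, keep moves that reduce the error."""
    best = list(start)
    best_err = angle_error(best, points, measured)
    evals = 0
    while step > tol and evals < max_evals:
        improved = False
        for axis in range(3):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[axis] += sign * step
                err = angle_error(trial, points, measured)
                evals += 1
                if err < best_err:  # "hotter": accept the move
                    best, best_err, improved = trial, err, True
        if not improved:  # "colder" in every direction: take smaller steps
            step /= 2.0
    return best, best_err

# Synthetic check: fake the "measured" angles from a known (made-up) camera position.
true_cam = (0.25, -1.1, 0.45)
measured = {
    (i, j): subtended_angle(true_cam, TANK_POINTS[i], TANK_POINTS[j])
    for i, j in combinations(range(len(TANK_POINTS)), 2)
}
estimate, err = hot_or_cold(TANK_POINTS, measured, start=(0.1, -0.8, 0.1))
print(estimate, err)
```

With real clicks the measured angles would come from the pixel-to-angle conversion rather than from a known camera position, and the final error would settle at some small nonzero value instead of heading to zero.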

Fortunately, this works great!

Some notes on this:

I first did a demo in MATLAB, since the visualization tools there are a little easier to use. There I discovered a few flaws. The biggest is that only optimizing for the camera point directly is globally stable; optimizing the distances from the camera to each point has some local minima that can trap the solver.

The second is that a non-linear optimizer like `fmincon` in MATLAB is really a math package’s bread and butter, and they are not letting anyone peek under the hood. `fmincon` is *not* available to the MATLAB Coder C code generation utility. Bummer.

However, Free Software came to the rescue: Python’s SciPy package contains the desired non-linear optimization suite. `scipy.optimize.minimize` takes a scalar function of multiple variables and nudges the inputs using various methods to find a minimum of the function. It’s really beautiful that tools this good are freely available as open source.
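A call to `minimize` for this problem could look roughly like the sketch below. The tank corners, camera position, and starting guess are made-up values for a synthetic test, and the choice of the derivative-free `Nelder-Mead` method and its tolerances is an assumption for illustration, not necessarily what was used for the results described here.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import minimize

# Hypothetical tank corners in metres (illustrative, not measured values).
points = np.array([
    [0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0],
    [0.0, 0.0, 0.35],
    [0.6, 0.3, 0.35],
])
pairs = list(combinations(range(len(points)), 2))

def pair_angles(cam):
    """Angles (radians) seen from cam between every pair of tank points."""
    rays = points - np.asarray(cam, dtype=float)
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    cosines = [np.clip(rays[i] @ rays[j], -1.0, 1.0) for i, j in pairs]
    return np.arccos(cosines)

# Synthetic "measurements" generated from a known, made-up camera position.
true_cam = np.array([0.25, -1.1, 0.45])
measured = pair_angles(true_cam)

def cost(cam):
    """Scalar error: squared mismatch between predicted and measured angles."""
    return np.sum((pair_angles(cam) - measured) ** 2)

result = minimize(
    cost,
    x0=[0.1, -0.8, 0.1],  # rough starting guess in front of the tank
    method="Nelder-Mead",
    options={"xatol": 1e-8, "fatol": 1e-14, "maxiter": 5000},
)
print(result.x, result.fun)
```

The only thing `minimize` needs is the scalar `cost` function and a starting point; swapping the `method` argument is an easy way to compare how different optimizers behave on the same error surface.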
