Tuesday, April 5, 2011

From Homography to OpenGL Modelview Matrix

This is the challenge of the week-- how do I get from a 3x3 homography matrix (which relates the plane of the source image to the plane found in the scene image) to an OpenGL modelview transformation matrix so I can start, you know, augmenting reality? The tricky thing is that while the homography can project a point on the source plane onto the 2D image plane, OpenGL needs separate rotation and translation components to set the position and orientation of the camera in the scene.
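To make the mismatch concrete: for a point (x, y) on the source plane, the homography gives its image position directly, up to a scale factor s:

s * (u, v, 1)' = H * (x, y, 1)'

...whereas OpenGL wants that same relationship split into a 3x3 rotation R and a translation vector t, stacked into a 4x4 modelview matrix [R | t; 0 0 0 1].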

The easy answer seemed to be OpenCV's cv::solvePnP() (or its C equivalent, cvFindExtrinsicCameraParams2()), feeding it the four corners of the detected object as calculated from the homography. But I'm getting weird memory errors from this function for some reason:

incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug

Setting a breakpoint on malloc_error_break didn't really help, and it isn't an Objective-C object giving me trouble, so NSZombieEnabled won't be any help, etc. etc. arghhh... On the bright side, it looks like it's possible to decompose a homography matrix into rotation and translation vectors, which is all I really need (as long as I have the camera intrinsic matrix, which I found in the last post). solvePnP() would be useful if I wanted to do pose estimation from a 3D structure, but I'm sticking to planes for now as a first step. It also doesn't look like OpenCV's solvePnP() has an option to use RANSAC, which seems important when many points are likely to be outliers-- an assumption that the Ferns-based matcher relies upon.

Now to figure out the homography decomposition... There are some equations here and some code here. I wish this were built into OpenCV. I will update as I find out more.

Update 1: The code found here was helpful. I translated it to C++ and used the OpenCV matrix libraries, so it required a little more work than a copy-and-paste. The 3x3 rotation matrix it produces is made up of the three orthogonal vectors that OpenGL wants (so together they define a rotation, but they're not three Euler angles or anything), which this image shows nicely:
Breakdown of the OpenGL modelview matrix (via)
The translation vector seems to be translating correctly as I move the camera around, but I'm not sure how it's scaled. Values seem to be in the +/- 1.0 range, so maybe they are in screen widths? Certainly they aren't pixels. Maybe if I actually understood what was going on I'd know better... Well, time to set up OpenGL ES rendering and try this out.
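For reference, here's the math the translated code (after the jump) implements. For points on the object's plane, z = 0 drops the third rotation column out of the projection, so the homography factors (up to scale) as H ~ A * [r1 r2 t], with A the intrinsic matrix. Then, writing h1, h2, h3 for the columns of H:

r1 = lambda * A^-1 * h1
r2 = lambda * A^-1 * h2
r3 = r1 x r2        (a rotation matrix's columns are orthonormal, so the third is the cross product)
t  = lambda * A^-1 * h3

where lambda = 1 / ||A^-1 * h1|| normalizes r1 to unit length. That also hints at the scale question above: t should come out in whatever units the source-plane coordinates were in when the homography was computed.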

Update 2: Forgot for a minute that OpenGL's fixed pipeline requires two transformation matrices: a modelview matrix (which I figured out above, based on the camera's EXtrinsic properties) and a projection matrix (which is based on the camera's INtrinsic properties). These resources might be helpful in getting the projection matrix.
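The gist, as far as I can tell (a sketch of the standard mapping, not gospel): the intrinsics determine the frustum that glFrustumf() needs. In plain math:

tan(fovY/2) = screen_height / (2 * f_y)
aspect = (screen_width / screen_height) * (f_y / f_x)
half_height = near * tan(fovY/2)
half_width = half_height * aspect

...and then the whole frustum gets shifted by the principal point's offset from the image center (that's the offset_x/offset_y business in the code below).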

Update 3: Ok, got it figured out. It's not pretty, but it works. I think I came across the same thing as this guy. Basically I needed to switch the sign on four of the nine elements of the modelview rotation matrix and on two of the three components of the translation vector. The magnitudes were correct, but it was rotating backwards around the z-axis and translating backwards along the x- and y-axes. This was extremely frustrating. So, I hope the code after the jump helps someone else out...
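One more thought before the code: I think the sign flips have a tidy explanation. They're exactly what you get by flipping the y and z axes, which is the difference between OpenCV's camera convention (x right, y down, z forward) and OpenGL's (x right, y up, z toward the viewer). With F = diag(1, -1, -1):

R_opengl = F * R_opencv * F
t_opengl = F * t_opencv

Multiplying by F on the left flips the camera's y and z axes; multiplying on the right flips the plane's. Work out the products and you get the exact pattern of four rotation signs and two translation signs that appears below.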




// Assumes OpenGL ES 1.x and OpenCV, e.g.:
// #include <OpenGLES/ES1/gl.h>
// #include <opencv2/opencv.hpp>
// using namespace cv;

// Build an OpenGL projection matrix from the camera's intrinsic parameters
glMatrixMode(GL_PROJECTION);
glLoadIdentity();

// Camera parameters
double f_x = 786.42938232; // Focal length in x axis, in pixels
double f_y = 786.42938232; // Focal length in y axis (usually the same as f_x?)
double c_x = 217.01358032; // Principal point x
double c_y = 311.25384521; // Principal point y

double screen_width = 480; // In pixels
double screen_height = 640; // In pixels

double tanHalfFovY = screen_height/(2*f_y); // tan(fovY/2) from the pinhole model (note: f_y, not f_x)
double aspectRatio = screen_width/screen_height * f_y/f_x;
double near = 0.1;  // Near clipping distance
double far = 1000;  // Far clipping distance
double frustum_height = near * tanHalfFovY;          // Half-height of the near plane
double frustum_width = frustum_height * aspectRatio; // Half-width of the near plane

// Shift the frustum to account for the principal point sitting off the image center
double offset_x = (screen_width/2 - c_x)/screen_width * frustum_width * 2;
double offset_y = (screen_height/2 - c_y)/screen_height * frustum_height * 2;

// Build and apply the projection matrix
glFrustumf(-frustum_width - offset_x, frustum_width - offset_x, -frustum_height - offset_y, frustum_height - offset_y, near, far);


// Decompose the homography into translation and rotation vectors
// Based on: https://gist.github.com/740979/97f54a63eb5f61f8f2eb578d60eb44839556ff3f

// Closed-form inverse of the (zero-skew) intrinsic matrix
Mat inverseCameraMatrix = (Mat_<double>(3,3) << 1/cameraMatrix.at<double>(0,0) , 0 , -cameraMatrix.at<double>(0,2)/cameraMatrix.at<double>(0,0) ,
            0 , 1/cameraMatrix.at<double>(1,1) , -cameraMatrix.at<double>(1,2)/cameraMatrix.at<double>(1,1) ,
            0 , 0 , 1);
// Column vectors of the homography
Mat h1 = (Mat_<double>(3,1) << H_matrix.at<double>(0,0) , H_matrix.at<double>(1,0) , H_matrix.at<double>(2,0));
Mat h2 = (Mat_<double>(3,1) << H_matrix.at<double>(0,1) , H_matrix.at<double>(1,1) , H_matrix.at<double>(2,1));
Mat h3 = (Mat_<double>(3,1) << H_matrix.at<double>(0,2) , H_matrix.at<double>(1,2) , H_matrix.at<double>(2,2));

Mat inverseH1 = inverseCameraMatrix * h1;
// Calculate a length for normalizing, from inverseH1 rather than h1 (bug fix-- see comment #4 below)
double lambda = sqrt(inverseH1.at<double>(0,0)*inverseH1.at<double>(0,0) +
      inverseH1.at<double>(1,0)*inverseH1.at<double>(1,0) +
      inverseH1.at<double>(2,0)*inverseH1.at<double>(2,0));


Mat rotationMatrix;
Mat translationVector;

if(lambda != 0) {
 lambda = 1/lambda;
 // Scale inverseCameraMatrix by lambda so the rotation columns come out unit-length
 inverseCameraMatrix.at<double>(0,0) *= lambda;
 inverseCameraMatrix.at<double>(1,0) *= lambda;
 inverseCameraMatrix.at<double>(2,0) *= lambda;
 inverseCameraMatrix.at<double>(0,1) *= lambda;
 inverseCameraMatrix.at<double>(1,1) *= lambda;
 inverseCameraMatrix.at<double>(2,1) *= lambda;
 inverseCameraMatrix.at<double>(0,2) *= lambda;
 inverseCameraMatrix.at<double>(1,2) *= lambda;
 inverseCameraMatrix.at<double>(2,2) *= lambda;
 
 // Column vectors of the rotation matrix
 Mat r1 = inverseCameraMatrix * h1;
 Mat r2 = inverseCameraMatrix * h2;
 Mat r3 = r1.cross(r2);    // Orthogonal to r1 and r2
  
 // Put the rotation columns into the rotation matrix... with the sign changes discussed above
 rotationMatrix = (Mat_<double>(3,3) <<  r1.at<double>(0,0) , -r2.at<double>(0,0) , -r3.at<double>(0,0) ,
            -r1.at<double>(1,0) , r2.at<double>(1,0) , r3.at<double>(1,0) ,
            -r1.at<double>(2,0) , r2.at<double>(2,0) , r3.at<double>(2,0));
 
 // Translation vector T, with the matching sign flips on the y and z components
 translationVector = inverseCameraMatrix * h3;
 translationVector.at<double>(0,0) *= 1;
 translationVector.at<double>(1,0) *= -1;
 translationVector.at<double>(2,0) *= -1;
 
 SVD decomposed(rotationMatrix); // I don't really know what this does. But it works.
 // (Update, per Erik in the comments: this is SVD orthonormalization-- it replaces the
 // matrix with the nearest true rotation. See Zhang's calibration paper, appendix C.)
 rotationMatrix = decomposed.u * decomposed.vt;
 
}
else {
 printf("Lambda was 0...\n");
}

Mat modelviewMatrix = (Mat_<double>(4,4) << rotationMatrix.at<double>(0,0), rotationMatrix.at<double>(0,1), rotationMatrix.at<double>(0,2), translationVector.at<double>(0,0),
          rotationMatrix.at<double>(1,0), rotationMatrix.at<double>(1,1), rotationMatrix.at<double>(1,2), translationVector.at<double>(1,0),
          rotationMatrix.at<double>(2,0), rotationMatrix.at<double>(2,1), rotationMatrix.at<double>(2,2), translationVector.at<double>(2,0),
          0,0,0,1);
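One last gotcha when handing this to OpenGL: as a commenter points out below, you can skip the transform calls and load the matrix directly with glLoadMatrixf()-- but OpenGL expects column-major storage while the Mat above is row-major, so transpose as you copy. A minimal sketch, using the modelviewMatrix from above:

// Repack the row-major cv::Mat into the column-major float[16] that OpenGL expects
float glModelview[16];
for(int col = 0; col < 4; col++) {
    for(int row = 0; row < 4; row++) {
        glModelview[col*4 + row] = (float)modelviewMatrix.at<double>(row, col);
    }
}
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(glModelview);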

13 comments:

  1. Hi Sam,

    Serge mentioned you were working on this -- Perhaps this reference will help:

    http://cvrr.ucsd.edu/publications/2008/MurphyChutorian_Trivedi_CVGPU08.pdf


    It shows the transformation from an intrinsic matrix to the proper ModelView and Projection matrices for OpenGL. (I wrote this when I was working on an augmented reality tracker)

  2. Also, don't forget that you can compute these matrices in code, and then simply load them directly with glLoadMatrix instead of calling glFrustumf.

    http://www.opengl.org/sdk/docs/man/xhtml/glLoadMatrix.xml

  3. This comment has been removed by the author.

  4. I was just reading through your code and happened to notice a potential bug. You calculate inverseH1, but then you still use h1 to calculate lambda.

    Mat inverseH1 = inverseCameraMatrix * h1;
    // Calculate a length, for normalizing
    double lambda = sqrt(inverseH1.at<double>(0,0)*inverseH1.at<double>(0,0) +
    inverseH1.at<double>(1,0)*inverseH1.at<double>(1,0) +
    inverseH1.at<double>(2,0)*inverseH1.at<double>(2,0));

    Also, you mention that you're not really sure why you use the SVD; there is a good reference to this algorithm in Learning OpenCV on page 391.

  5. Hi kronick,

    Thanks for your posts - just wondering if you (or anyone else) have figured out the cvFindExtrinsicCameraParams2() issue or implemented an alternative method? (I've quickly reviewed your code above but am unsure if it obtains the pose - sorry, probably missing something.)

    Regarding the cvFindExtrinsicCameraParams2() method: the issue appears to be with the iPhone 4 (it works on the 3GS). If you convert the cvMat to floats (CV_32) rather than doubles (CV_64) then it appears to work, but the calculation sometimes overflows the floats, giving some strange results.

    Appreciate your/anyone's help.

  6. Hi,

    Thanks for your post.
    Concerning this:
    SVD decomposed(rotationMatrix); // I don't really know what this does. But it works.
    rotationMatrix = decomposed.u * decomposed.vt;

    See A Flexible New Technique for Camera Calibration:
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.35.6725&rep=rep1&type=pdf

    appendix C.

    kind regards,

    Erik

  7. Hi! What is meant by the cameraMatrix? Is it just the camera's INtrinsic properties (matrix A) or the projection matrix (matrix P)?

  8. @Divo: Yep, the intrinsic properties (matrix A).

  9. Thanks! In some literature the matrix A is denoted as matrix K (for example, Multiple View Geometry by Richard Hartley and Andrew Zisserman). Is there no difference?

  10. Hi there and thank you for the resources and implementation.

    Could anybody please post a homography matrix and the corresponding 4x4 matrix, so I can validate my approach. That would be very helpful.

  11. Hi all,
    Can anyone tell me how to find the homography matrix after camera calibration?

    Replies
      1. I have the XML file from the calibration, but the problem is that I can't find the camera matrix and homography matrix (the parameters) in it :(

  12. There is a solvePnPRansac function in OpenCV.
