Monday, May 2, 2011

Fusion + Interface

Sensor fusion is sort of hard to capture in images. I'll try to get some video up here some time soon. But it's working to some extent: once the camera provides a pose estimate, the device's gyros take over on frames where the camera can't detect the object. As long as the device only rotates and does not translate (or translates very little relative to the distance between it and the object it's detecting, as is the case when looking at a building a few dozen meters away), the gyros keep the image registered nicely.
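Conceptually, the fallback logic looks something like the sketch below (a simplified stand-in, not the actual implementation: it assumes the gyro's angular velocity is available each frame and integrates it with a small-angle approximation, and the composition order depends on your coordinate conventions):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>

// Last orientation recovered from a successful camera detection.
static cv::Mat lastCameraRotation = cv::Mat::eye(3, 3, CV_64F);

// Called once per video frame. If the detector found the object, take its
// rotation; otherwise integrate the gyro's angular velocity (rad/s) over
// dt seconds and compose the increment with the last known orientation.
cv::Mat fuseOrientation(bool detectionSucceeded,
                        const cv::Mat& cameraRotation, // 3x3 CV_64F, valid if detection succeeded
                        const cv::Vec3d& gyroRate,     // angular velocity in rad/s
                        double dt)
{
    if (detectionSucceeded) {
        lastCameraRotation = cameraRotation.clone();
        return lastCameraRotation;
    }

    // Small-angle integration: turn the rotation increment into a matrix
    // with Rodrigues' formula, then compose it with the previous estimate.
    // (Composition order depends on your coordinate conventions.)
    cv::Vec3d deltaAngle = gyroRate * dt;
    cv::Mat deltaR;
    cv::Rodrigues(cv::Mat(deltaAngle), deltaR);

    lastCameraRotation = deltaR * lastCameraRotation;
    return lastCameraRotation;
}
```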


The image above shows the beginning of the interface that will allow a user to take a photo of a building, select the corners of the facade to use as a marker, rectify the image and apply a mask (to remove trees, people, etc), geotag the image by placing it on the map, and finally set its elevation (not yet shown)-- all with the nice touch interface on the iPad/iPhone. After this, the rectified image and its metadata will be sent to a server, where it will be processed as the training image for the ferns classifier. I'll have to draw up a diagram of this later. In the meantime, here's a picture I drew to rough out the idea of how this would work:
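The rectify-and-mask step, at least, boils down to something like the following sketch (simplified for illustration; the corner ordering, mask format, and output size are placeholder choices, not the shipping code):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Rectify a facade: map the four user-tapped corners (ordered clockwise
// from top-left) onto an axis-aligned rectangle of the requested size,
// then zero out everything outside the user's mask.
cv::Mat rectifyFacade(const cv::Mat& photo,
                      const cv::Point2f tappedCorners[4], // facade corners in the photo
                      const cv::Mat& mask,                // 8-bit single channel, same size as photo
                      cv::Size trainingSize)              // e.g. 640x480, or a square
{
    cv::Point2f target[4] = {
        cv::Point2f(0.f, 0.f),
        cv::Point2f(trainingSize.width - 1.f, 0.f),
        cv::Point2f(trainingSize.width - 1.f, trainingSize.height - 1.f),
        cv::Point2f(0.f, trainingSize.height - 1.f)
    };

    // Homography taking the tapped quad to the rectified rectangle.
    cv::Mat H = cv::getPerspectiveTransform(tappedCorners, target);

    cv::Mat rectified, warpedMask;
    cv::warpPerspective(photo, rectified, H, trainingSize);
    cv::warpPerspective(mask, warpedMask, H, trainingSize, cv::INTER_NEAREST);

    // Apply the mask so trees, people, etc. don't end up in the training image.
    cv::Mat result = cv::Mat::zeros(trainingSize, photo.type());
    rectified.copyTo(result, warpedMask);
    return result;
}
```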

One thing this allows me to do is experiment with training images of different sizes and aspect ratios. Right now, everything gets squished into a 640x480 image (my video resolution). This means that if I select a square region for the training image and try to find it in a scene, the homography it calculates must somehow represent anisotropic scaling (because the real object is square, while the training image of it is 4:3). The homography itself comes out fine, and when I multiply the image bounds by the homography directly to find their 2D coordinates, they are drawn correctly, but when I decompose the homography matrix to get the OpenGL transform, it has an additional rotation added in. This is strange, and maybe means I'm calculating the OpenGL transformation matrix incorrectly (which might explain some weird results I was getting earlier...). Below is a picture of the issue.

Cropping a roughly square region
The white rectangle with a cross represents the homography applied to the 2D points; the RGB coordinate axes are drawn using the OpenGL transformation matrix. Note the offset in rotation. The white, homography-only rectangle looks correct...
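For reference, the white rectangle comes from pushing the training image's corners through the homography, roughly like the sketch below (simplified; not my exact drawing code):

```cpp
#include <vector>
#include <opencv2/core/core.hpp>

// Project the training image's corners into the video frame with the
// estimated homography. This is (roughly) the white rectangle in the
// screenshot; it stays correct even when H contains anisotropic scaling.
std::vector<cv::Point2f> projectBounds(const cv::Mat& H, cv::Size trainingSize)
{
    std::vector<cv::Point2f> corners;
    corners.push_back(cv::Point2f(0.f, 0.f));
    corners.push_back(cv::Point2f((float)trainingSize.width, 0.f));
    corners.push_back(cv::Point2f((float)trainingSize.width, (float)trainingSize.height));
    corners.push_back(cv::Point2f(0.f, (float)trainingSize.height));

    std::vector<cv::Point2f> projected;
    cv::perspectiveTransform(corners, projected, H); // divides each point by its w
    return projected;
}
```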
I know this has something to do with the assumption that the homography factors as H = K * [r1 r2 t] -- the camera intrinsics times the first two columns of a rotation matrix and a translation (i.e. no scaling that isn't just a result of translation along the z-axis). But beyond that... I'm not sure what to do about it just now. Maybe simply keeping all training images at the same aspect ratio, padded with black, is the way to go about this. We'll see...
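For anyone curious, here's roughly what that decomposition looks like (a sketch of the common planar-homography recipe, assuming a calibrated K and that H maps training-image points into the camera image; not necessarily line-for-line what my code does):

```cpp
#include <cmath>
#include <opencv2/core/core.hpp>

// Common recipe for recovering [R | t] from a planar homography,
// given the camera intrinsics K, under the assumption H ~ K * [r1 r2 t]
// (H maps points on the facade plane into the camera image).
void decomposeHomography(const cv::Mat& H, const cv::Mat& K,
                         cv::Mat& R, cv::Mat& t)
{
    cv::Mat A = K.inv() * H;   // ~ [s*r1  s*r2  s*t]

    cv::Mat r1 = A.col(0);
    cv::Mat r2 = A.col(1);

    // A single scale factor is assumed: the geometric mean of the two
    // column norms. If the training image was squished anisotropically,
    // the two norms differ and the leftover scaling leaks into the
    // recovered "rotation", which is then no longer orthonormal.
    double scale = std::sqrt(cv::norm(r1) * cv::norm(r2));

    r1 = r1 / scale;
    r2 = r2 / scale;
    cv::Mat r3 = r1.cross(r2);
    t = A.col(2) / scale;

    R = cv::Mat(3, 3, CV_64F);
    r1.copyTo(R.col(0));
    r2.copyTo(R.col(1));
    r3.copyTo(R.col(2));
}
```

If the two column norms differ, the matrix that comes out isn't a proper rotation anymore, which at least seems consistent with the rotation offset in the screenshot above.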

1 comment:

  1. Hi, I need help with the same type of problem. I have a 3x3 homography matrix, but I need to apply it to a 3D object in OpenGL for a perspective transform. Can you help me with this?
    thanks
