The image above shows the beginning of the interface that will let a user take a photo of a building, select the corners of the facade to use as a marker, rectify the image and apply a mask (to remove trees, people, etc.), geotag the image by placing it on the map, and finally set its elevation (not yet shown), all with the nice touch interface on the iPad/iPhone. After this, the rectified image and its metadata will be sent to a server, where the image will be processed as the training image for the ferns classifier. I'll have to draw up a diagram of this later. In the meantime, here's a picture I drew to rough out the idea of how this would work:
One thing this allows me to do is experiment with training images of different sizes and aspect ratios. Right now, everything gets squished into a 640x480 image (my video resolution). This means that if I select a square region for the training image and try to find it in a scene, the homography it calculates must somehow represent anisotropic scaling (because in reality the object to detect is square, while the training image of it is 4:3). Well, it calculates the homography just fine, and when I multiply the image bounds by the homography directly to find their 2D coordinates, it draws them correctly. But when I decompose the homography matrix to get the OpenGL transform, the result has an additional rotation added in. This is strange, and maybe means I'm calculating the OpenGL transformation matrix incorrectly (which might explain some weird results I was getting earlier...). Below is a picture of the issue.
|Cropping a roughly square region|
|White rectangle with a cross represents homography applied to 2D points. RGB coordinate system is drawn using the OpenGL transformation matrix. Note the offset in rotation. White homography looks correct...|
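To make the anisotropic scaling concrete, here is a minimal Python sketch of one thing worth checking. It is not a diagnosis of the extra rotation, and every number in it is made up: the camera intrinsics `K`, the pose, and the unit-square object are assumptions for illustration, not values from my setup. It builds a homography for a square object, squishes the object's coordinates into 640x480 training pixels, and compares the first two columns of `K^-1 * H`. A decomposition that expects those two columns to have the same length (as they do for a metric plane) will see unequal lengths until the 640x480 scale is folded back in.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_homography(h, pt):
    """Map a 2D point through h using homogeneous coordinates."""
    x, y = pt
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)

def column_norm(m, j):
    """Euclidean length of column j of a 3x3 matrix."""
    return math.sqrt(sum(m[i][j] ** 2 for i in range(3)))

# Hypothetical pinhole intrinsics (assumed, not calibrated values).
F, CX, CY = 600.0, 320.0, 240.0
K = [[F, 0.0, CX], [0.0, F, CY], [0.0, 0.0, 1.0]]
K_INV = [[1 / F, 0.0, -CX / F], [0.0, 1 / F, -CY / F], [0.0, 0.0, 1.0]]

# Hypothetical pose: 30 degree rotation about the y axis plus a translation.
angle = math.radians(30)
r1 = (math.cos(angle), 0.0, -math.sin(angle))
r2 = (0.0, 1.0, 0.0)
t = (-0.5, -0.5, 3.0)
pose = [[r1[i], r2[i], t[i]] for i in range(3)]  # columns are [r1 r2 t]

# The square object spans (0,0)-(1,1); training pixels span (0,0)-(640,480).
W, H = 640.0, 480.0
S = [[W, 0.0, 0.0], [0.0, H, 0.0], [0.0, 0.0, 1.0]]          # object -> training
S_INV = [[1 / W, 0.0, 0.0], [0.0, 1 / H, 0.0], [0.0, 0.0, 1.0]]  # training -> object

h_object = matmul(K, pose)         # object units -> scene pixels
h_train = matmul(h_object, S_INV)  # training pixels -> scene pixels

# Applying the homography directly to the image bounds is unaffected.
corner_from_train = apply_homography(h_train, (W, H))
corner_from_object = apply_homography(h_object, (1.0, 1.0))

# Before decomposition the two rotation columns have different lengths...
squished = matmul(K_INV, h_train)
ratio_squished = column_norm(squished, 0) / column_norm(squished, 1)

# ...and folding the training-image scale back in restores equal lengths.
restored = matmul(squished, S)
ratio_restored = column_norm(restored, 0) / column_norm(restored, 1)

print(corner_from_train, corner_from_object)
print(ratio_squished, ratio_restored)
```

In this sketch the two projected corners agree, which matches the white rectangle looking right. The column-length ratio comes out near 0.75 (that is, 480/640) for the squished homography and near 1 once `S` is multiplied back in. Whether this is what introduces the extra rotation depends on how the decomposition normalizes and orthogonalizes those columns, so it is only a candidate explanation.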