tag:blogger.com,1999:blog-87994091203847355132024-03-14T02:39:47.487-07:00Urban Augmented Realitykronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-8799409120384735513.post-2954438376924736712011-05-23T03:08:00.000-07:002011-05-25T14:21:03.889-07:00Website + Documentation Draft<div class="separator" style="clear: both; text-align: left;">Working on the organization and layout for the toolkit website. Gave it a name, a look, and some bold words to describe the major functions.</div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">Of course, I haven't even finished the code yet. But I think tackling the challenge of how to organize the documentation at this stage will guide me in refactoring the code and making it as clear as possible as I finish writing the first release.</div><div class="separator" style="clear: both; text-align: center;"><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-4CJAKyNGaJg/TdouHUyoobI/AAAAAAAAAZU/Xy_u8dvMHXs/s1600/website-mockups-01.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="315" src="http://3.bp.blogspot.com/-4CJAKyNGaJg/TdouHUyoobI/AAAAAAAAAZU/Xy_u8dvMHXs/s400/website-mockups-01.png" width="400" /></a></td></tr>
</tbody></table><div><div><span class="Apple-style-span">The (home - download - docs - gallery - forum) navigation items come from a survey of the homepages for some of the toolkits that inspired this, such as <a href="http://www.processing.org/">Processing</a>, <a href="http://openframeworks.cc/">OpenFrameworks</a>, and <a href="http://libcinder.org/">Cinder</a>.<br />
</span><span class="Apple-style-span">I want a demo video to go on the right side. I guess I need to make that some time, too.</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-tcWifKC8P4E/Td1yq06YtYI/AAAAAAAAAZo/FTK7tPQ-Qr8/s1600/components-04.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="133" src="http://1.bp.blogspot.com/-tcWifKC8P4E/Td1yq06YtYI/AAAAAAAAAZo/FTK7tPQ-Qr8/s400/components-04.png" width="400" /></a></div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-EgUYEjKVsqA/Tdo040SRsBI/AAAAAAAAAZk/f6-AQlzoBqk/s1600/website-mockups-02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="http://1.bp.blogspot.com/-EgUYEjKVsqA/Tdo040SRsBI/AAAAAAAAAZk/f6-AQlzoBqk/s400/website-mockups-02.png" width="363" /></a></div>The class overview page for "augment." I'd like something not as explicit as a UML diagram and not as verbose as Javadoc or doxygen documentation: present the most relevant information up front and hide the details until clicked upon. The cyan entries are the public interface-- either public methods or members with at least a getter/possibly a setter. The private stuff should be hidden by default. This could look really nice with some jQuery sliding menu magic.</div></div>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1tag:blogger.com,1999:blog-8799409120384735513.post-82694418588482007082011-05-21T22:47:00.000-07:002011-05-21T22:47:01.587-07:00Aspect Ratio of a Rectangle in PerspectiveOne of the things I left unimplemented in my image tagger input screen was recovering the aspect ratio of the selected rectangle. Before, it just squished everything into a 640x480 image. But now, thanks to <a href="http://research.microsoft.com/en-us/um/people/zhang/Papers/WhiteboardRectification.pdf">this paper</a>, I can automatically calculate the aspect ratio from a given set of four corners. The OpenCV implementation is below. Note the strange ordering of the rectangle's corners (M_i, i = 1...4, are (0, 0), (w, 0), (0, h), and (w, h)).<br />
<span class="Apple-style-span" style="font-size: x-small;"><br />
</span><br />
<pre><span class="Apple-style-span" style="font-size: x-small;">// Get aspect ratio</span></pre><pre><span class="Apple-style-span" style="font-size: x-small;">// Input corners c0,c1,c2,c3 are given as a fraction (0..1) of the original image width/height
// Using equations from: http://research.microsoft.com/en-us/um/people/zhang/Papers/WhiteboardRectification.pdf
cv::Mat A = (cv::Mat_&lt;float&gt;(3,3) &lt;&lt; 786.42938232, 0, imageSize.width/2,
                                      0, 786.42938232, imageSize.height/2,
                                      0, 0, 1);
float k2, k3;
float ratio;
cv::Mat _ratio;
cv::Mat n2, n3;
cv::Mat m1 = (cv::Mat_&lt;float&gt;(3,1) &lt;&lt; imageSize.width * (float)c0.x, imageSize.height * (float)c0.y, 1);
cv::Mat m2 = (cv::Mat_&lt;float&gt;(3,1) &lt;&lt; imageSize.width * (float)c3.x, imageSize.height * (float)c3.y, 1);
cv::Mat m3 = (cv::Mat_&lt;float&gt;(3,1) &lt;&lt; imageSize.width * (float)c1.x, imageSize.height * (float)c1.y, 1);
cv::Mat m4 = (cv::Mat_&lt;float&gt;(3,1) &lt;&lt; imageSize.width * (float)c2.x, imageSize.height * (float)c2.y, 1);
k2 = (m1.cross(m4).dot(m3)) / ((m2.cross(m4)).dot(m3));
k3 = (m1.cross(m4).dot(m2)) / ((m3.cross(m4)).dot(m2));
n2 = (k2*m2) - m1;
n3 = (k3*m3) - m1;
_ratio = (n2.t()*(A.inv().t())*(A.inv())*n2) / (n3.t()*(A.inv().t())*(A.inv())*n3);
ratio = sqrt(_ratio.at&lt;float&gt;(0,0));
</span></pre>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com0tag:blogger.com,1999:blog-8799409120384735513.post-61806574057429115522011-05-16T15:28:00.000-07:002011-05-16T15:30:12.482-07:00Offloading Processing to the Cloud!I've long realized that truly city-wide exploration with AR would require some sort of client-server infrastructure. If an app were to contain all the possible facade-markers to recognize, it would require a single monolithic download. The reality is, the most interesting augmentations are going to require a network connection anyway (because they may be user-generated, or reflect up-to-the-minute information), and downloading only the facade-markers that are nearby will limit the app's initial size. This also means an app set to work in one city can be expanded into another without needing a new program-- just new data.<br />
<br />
Once there's a remote server in the mix, I realized I could use it to offload some of the image processing so the mobile device doesn't have to work so hard. This is especially important when I'm using Fern classifiers as they require a long training step (~1 minute on the device) that just isn't realistic in terms of user experience. So I wrote some server-side scripts to accept new facade images (obtained via an interface like <a href="http://urbanar.blogspot.com/2011/05/fusion-interface.html">the one I described earlier</a>), process them, store their data in a database, and spit out stored facades that are near a user's current location. The diagram of how it all works is below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-4JMjLbV-ZVs/TdGh-94VCLI/AAAAAAAAAZQ/6Ww0a_dfWoM/s1600/ferns-server.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="272" src="http://3.bp.blogspot.com/-4JMjLbV-ZVs/TdGh-94VCLI/AAAAAAAAAZQ/6Ww0a_dfWoM/s400/ferns-server.jpg" width="400" /></a></div><br />
A few fun things I'm trying out here:<br />
<br />
First, I'm using <a href="http://aws.amazon.com/ec2/">Amazon EC2</a>, which is awesome because I get root access on a virtual server somewhere in cloud-land. It's a little strange to wrap your head around the data-persistence model (if you "terminate" an instance, everything goes bye-bye for good, but "stopping" one keeps its data around), and setup took a while (I started with a blank Ubuntu install, then had to build OpenCV and install Apache/MySQL/PHP), but now I'm happily working from the command line on a machine that exists mainly as an IP address.<br />
<br />
Second, I'm writing the high-level API stuff in PHP because it's really easy to process HTTP requests, write out JSON, and talk to the MySQL server. But the low-level image processing and Fern classifier processing has to happen in C++ (I wanted to use OpenCV's Python interface, but it doesn't cover all the latest stuff, including Fern classifiers). So I have my PHP scripts call the OpenCV C++ program using the <a href="http://php.net/manual/en/function.exec.php">exec()</a> command. Maybe this isn't an optimal arrangement, but it works just fine.<br />
<br />
Third, I wanted to do the Ferns processing asynchronously so that when a new facade image is uploaded, the user gets an immediate confirmation and can carry on their merry way without waiting for the processor to finish. This is achieved by writing a PHP script that acts as a daemon process, using a PEAR extension called <a href="http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/">System::Daemon</a>. The daemon sits in a loop, checking the database every few seconds for any facade entries flagged as unprocessed. It then sends these images down to the processor script and updates the database when they are complete.<br />
<br />
An interesting note about Amazon EC2 is that I'm using their "micro" instance, which is free for a year. As best I can tell, the amount of processing power allocated to me is equivalent to a single-core 1 GHz processor. Which is actually <i>less</i> than what I have on the iPad 2. So Ferns processing still takes a couple of minutes, but at least it's not burning the iPad's battery and blocking the user interface.<br />
<br />
Finally, you can check out <a href="https://github.com/kronick/Urban-Frame-Fusion">all the server code on GitHub</a>.kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1tag:blogger.com,1999:blog-8799409120384735513.post-42158209834341043892011-05-04T12:35:00.000-07:002011-05-04T12:35:25.824-07:00Sensor Fusion Video<div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='540' height='337' src='https://www.youtube.com/embed/hCVZ2TSFI-Y?feature=player_embedded' frameborder='0'></iframe></div><br />
<span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 18px;">Demo of sensor fusion running on an iPad 2. The front of this building has been preprocessed to serve as a visual marker. The iPad's camera detects the image to get an initial estimate of where the user is standing and how the iPad is oriented in space. After that, the camera and the gyros/accelerometer in the iPad work together to keep the overlay aligned, even when the building goes out of view or isn't detected by the vision algorithm.<br />
<br />
Right now it's not rendering anything interesting-- the red-green-blue lines represent the x-y-z axes as calculated by the camera and sensors. The background grid is drawn as a large cube surrounding the user-- you can see the corners when the camera pans up and left. The white rectangle with the X in the center only shows up when the camera detects the building facade; as you can see, it isn't detecting the facade every frame, but it doesn't have to, as the gyros provide plenty of readings to fill in between the camera estimates. As a result, the animation runs at a nice smooth 60fps.<br />
<br />
Pipeline as of now: FAST corner detector - Ferns keypoint classifier - RANSAC homography estimator - Kalman filter (with CoreMotion attitude matrix) - OpenGL Modelview matrix</span>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1tag:blogger.com,1999:blog-8799409120384735513.post-75384347213209665012011-05-02T01:46:00.000-07:002011-05-02T01:46:50.292-07:00Fusion + InterfaceSensor fusion is sort of hard to capture in images. I'll try to get some video up here some time soon. But it's working to some extent-- once the camera produces a pose estimate, the device's gyros take over on frames where the camera can't detect the object. As long as the device only rotates and does not translate (or translates very little relative to the distance between it and the object it's detecting, as is the case when looking at a building a few dozen meters away), the gyros keep the image registered nicely.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-livZFaILjDg/Tb5nKFdf5iI/AAAAAAAAAYE/VN08kQ6eR40/s1600/rectify-interface.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://3.bp.blogspot.com/-livZFaILjDg/Tb5nKFdf5iI/AAAAAAAAAYE/VN08kQ6eR40/s400/rectify-interface.png" width="400" /></a></div><br />
The image above shows the beginning of the interface that will allow a user to take a photo of a building, select the corners of the facade to use as a marker, rectify the image and apply a mask (to remove trees, people, etc), geotag the image by placing it on the map, and finally set its elevation (not yet shown)-- all with the nice touch interface on the iPad/iPhone. After this, the rectified image and its metadata will be sent to a server, where it will be processed as the training image for the ferns classifier. I'll have to draw up a diagram of this later. In the meantime, here's a picture I drew to rough out the idea of how this would work:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-IhWMPvoIotQ/Tb5ro9PwjgI/AAAAAAAAAYI/lkJjcCJdkCM/s1600/interface-sketch.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="305" src="http://2.bp.blogspot.com/-IhWMPvoIotQ/Tb5ro9PwjgI/AAAAAAAAAYI/lkJjcCJdkCM/s400/interface-sketch.jpg" width="400" /></a></div><br />
One thing this allows me to do is experiment with training images of different sizes and aspect ratios. Right now, everything gets squished into a 640x480 image (my video resolution). This means if I select a square region for the training image and try to find it in a scene, the homography it calculates must somehow represent anisotropic scaling (because in reality, the object to detect is still square, while the training image of it is 4:3). Well, it calculates the homography just fine, and when I multiply the image bounds by the homography directly to find their 2D coordinates, it draws them correctly, but when I decompose the homography matrix to get the OpenGL transform, it has an additional rotation added in. This is strange, and maybe means I'm calculating the OpenGL transformation matrix incorrectly (which might explain some weird results I was getting earlier...) Below is a picture of the issue.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-d961xoI3Cqw/Tb5ueKUEO_I/AAAAAAAAAYQ/Vl9pB6ZoidI/s1600/square-crop.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://3.bp.blogspot.com/-d961xoI3Cqw/Tb5ueKUEO_I/AAAAAAAAAYQ/Vl9pB6ZoidI/s320/square-crop.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Cropping a roughly square region</td></tr>
</tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-UrFw2Q30PEw/Tb5uc4rRu5I/AAAAAAAAAYM/RxwrAwQQsZM/s1600/rotation-offset.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://1.bp.blogspot.com/-UrFw2Q30PEw/Tb5uc4rRu5I/AAAAAAAAAYM/RxwrAwQQsZM/s320/rotation-offset.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">White rectangle with a cross represents homography applied to 2D points. RGB coordinate system is drawn using the OpenGL transformation matrix. Note the offset in rotation. White homography looks correct...</td></tr>
</tbody></table>I know this has something to do with the assumption that the homography matrix <b>H</b> = <b>K</b> * [<b>R</b> | <b>T</b>] -- meaning a combination of the camera properties, a rotation, and a translation (i.e. no scaling that isn't just a result of translation in the z-axis). But beyond that... Not sure what to do about it just now. Maybe simply keeping all training images at the same aspect ratio, padded with black, is the way to go about this. We'll see...kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1tag:blogger.com,1999:blog-8799409120384735513.post-24708648479852158232011-04-20T00:07:00.000-07:002011-04-20T00:07:41.915-07:00Pose Estimation and Sensor Fusion<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-P2CWx_cD__g/Ta6DNCETcTI/AAAAAAAAAXs/5ZE7EDJSd_4/s1600/Pose+Estimation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://3.bp.blogspot.com/-P2CWx_cD__g/Ta6DNCETcTI/AAAAAAAAAXs/5ZE7EDJSd_4/s400/Pose+Estimation.png" width="400" /></a></div>The above diagram gives a high level overview of what my approach is to estimating the user's pose within an urban environment. I have so far been focusing on the computer vision side of things, trying to get a robust pose estimate from keypoint correspondences between what's seen through the camera's lens and a rectified image that serves as a natural marker. Getting this to work well on a mobile device has been quite a project in itself, and there are still plenty of challenges to solve there. But the past couple of weeks, I have been diving into the other sensors found on an iPad, iPhone, many of the top-of-the-line Android phones and tablets, and likely most portable media devices of the future: GPS, compass, gyroscopes, and accelerometers. These sensors can be combined to give a pose estimate as well (and this is extremely easy on a platform like iOS... 
the CoreMotion framework abstracts away a lot of the details, and I believe there is some rugged sensor fusion going on at the hardware level). Most of the existing "locative" augmented reality apps out there (like Layar, Wikitude, or Yelp Monocle) only use these sensors. This is problematic mainly because GPS does not give very precise or accurate position information, especially in an urban environment. GPS drifts, can be offset several meters due to multipath effects, and generally doesn't get you "close enough" to do a true pixel-perfect visual overlay onto the real world, so most apps that use sensor-based AR simply display floating information clouds and textual annotations rather than 3D graphics. Thus, I aim to combine vision- and sensor-based pose estimates for better results.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/C7JQ7Rpwn2k?feature=player_embedded' frameborder='0'></iframe></div>This video gives a nice overview of what the different sensors do and what they're each good and bad at.kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1tag:blogger.com,1999:blog-8799409120384735513.post-28828537179577660052011-04-08T00:45:00.000-07:002011-04-08T00:45:39.604-07:00Some sort of resultsOutdoors, recognizing a building facade, on an iPad, at a reasonable framerate, just like I always wanted. Though in the frame I screencaptured, things aren't registering properly just yet, but oh well. It was light outside then, but we all know the best work gets done after sunset.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-yvT6TPylwsc/TZ684upJ1tI/AAAAAAAAAXo/68sUjxfsQxQ/s1600/outdoors.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-yvT6TPylwsc/TZ684upJ1tI/AAAAAAAAAXo/68sUjxfsQxQ/s400/outdoors.jpg" width="400" /></a></div>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com3tag:blogger.com,1999:blog-8799409120384735513.post-45310489518073089422011-04-05T16:59:00.000-07:002011-04-08T00:34:33.611-07:00From Homography to OpenGL Modelview MatrixThis is the challenge of the week-- how do I get from a 3x3 homography matrix (which relates the plane of the source image to the plane found in the scene image) to an OpenGL modelview transformation matrix so I can start, you know, augmenting reality? The tricky thing is that while I can use the homography to project a 3D point onto the 2D image plane, I need separated rotation and translation vectors to feed OpenGL so it can set the location and orientation of the camera in the scene.<br />
<br />
The easy answer seemed to be using OpenCV's <a href="http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp">cv::solvePnP()</a> (or its C equivalent, <a href="http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html#findextrinsiccameraparams2">cvFindExtrinsicCameraParams2()</a>) by inputting four corners of the detected object calculated from the homography. But I'm getting weird memory errors with this function for some reason ("<span class="Apple-style-span" style="font-family: Menlo; font-size: 11px;"><b>incorrect checksum for freed object - object was probably modified after being freed.<br />
*** set a breakpoint in malloc_error_break to debug</b></span>" but setting a breakpoint on malloc_error_break didn't really help, and it isn't an Objective-C object giving me trouble, so NSZombieEnabled won't be any help, etc etc arghhh....) AND it looks like it's possible to decompose a homography matrix into rotation and translation vectors, which is all I really need (as long as I have the camera intrinsic matrix, which I found in the <a href="http://urbanar.blogspot.com/2011/04/offline-camera-calibration-for.html">last post</a>). solvePnP looks useful if I wanted to do pose estimation from a 3D structure, but I'm sticking to planes for now as a first step. OpenCV's solvePnP() doesn't look like it has an option to use RANSAC, which seems important if many points are likely to be outliers-- an assumption that the Ferns-based matcher relies upon.<br />
<br />
Now to figure out the homography decomposition... There are <a href="http://vision.ucla.edu//MASKS/MASKS-ch5.pdf">some equations here</a> and <a href="https://gist.github.com/740979/97f54a63eb5f61f8f2eb578d60eb44839556ff3f">some code here</a>. I wish this were built into OpenCV. I will update as I find out more.<br />
<br />
<b>Update 1:</b> The code found <a href="https://gist.github.com/740979/97f54a63eb5f61f8f2eb578d60eb44839556ff3f">here</a> was helpful. I translated it to C++ and used the OpenCV matrix libraries, so it required a little more work than a copy-and-paste. The 3x3 rotation matrix it produces is made up of the three orthogonal vectors that OpenGL wants (so they imply a rotation, but they're not three Euler angles or anything) which this image shows nicely:<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://www.songho.ca/opengl/files/gl_anglestoaxes01.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://www.songho.ca/opengl/files/gl_anglestoaxes01.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Breakdown of the OpenGL modelview matrix (<a href="http://www.songho.ca/opengl/gl_transform.html">via</a>)</td></tr>
</tbody></table><div>The translation vector seems to be translating correctly as I move the camera around, but I'm not sure how it's scaled. Values seem to be in the +/- 1.0 range, so maybe they are in screen widths? Certainly they aren't pixels. Maybe if I actually understood what was going on I'd know better... Well, time to set up OpenGL ES rendering and try this out.</div><div><br />
<b>Update 2:</b> Forgot for a minute that OpenGL's fixed pipeline requires two transformation matrices: a modelview matrix (which I figure out above, based on the camera's EXtrinsic properties) and a projection matrix (which is based on the camera's INtrinsic properties). <a href="http://old.uvr.gist.ac.kr/wlee/web/techReports/ar/Camera%20Models.html">These</a> <a href="http://www.hitl.washington.edu/artoolkit/mail-archive/message-thread-00653-Re--Questions-concering-.html">resources</a> <a href="http://sightations.wordpress.com/2010/08/03/simulating-calibrated-cameras-in-opengl/">might</a> <a href="http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262160">be</a> <a href="http://www.songho.ca/opengl/gl_projectionmatrix.html">helpful</a> in getting the projection matrix.<br />
<br />
<b>Update 3:</b> Ok, got it figured out. It's not pretty, but it works. I think I came across the same thing as <a href="http://stackoverflow.com/questions/3712049/how-to-use-an-opencv-rotation-and-translation-vector-with-opengl-es-in-android">this guy</a>. Basically I needed to switch the sign on <i>four out of nine</i> elements of the modelview rotation matrix and two of the three components of the translation vector. The magnitudes were correct, but it was rotating backwards in the z-axis and translating backwards in the x- and y- axes. This was extremely frustrating. So, I hope the code after the jump helps someone else out...<br />
<br />
<a name='more'></a><br />
<span class="Apple-style-span" style="font-size: x-small;"><br />
</span><br />
<pre><span class="Apple-style-span" style="font-size: x-small;">// Transform the camera's intrinsic parameters into an OpenGL projection matrix
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// Camera parameters (from the offline calibration)
double f_x = 786.42938232; // Focal length in x axis
double f_y = 786.42938232; // Focal length in y axis (usually nearly the same)
double c_x = 217.01358032; // Camera principal point x
double c_y = 311.25384521; // Camera principal point y
double screen_width = 480; // In pixels
double screen_height = 640; // In pixels
double tanHalfFovY = screen_height/(2 * f_y); // Tangent of half the vertical field of view
double aspectRatio = screen_width/screen_height * f_y/f_x;
double near = .1; // Near clipping distance
double far = 1000; // Far clipping distance
double frustum_height = near * tanHalfFovY;
double frustum_width = frustum_height * aspectRatio;
// Shift the frustum to account for the off-center principal point
double offset_x = (screen_width/2 - c_x)/screen_width * frustum_width * 2;
double offset_y = (screen_height/2 - c_y)/screen_height * frustum_height * 2;
// Build and apply the projection matrix
glFrustumf(-frustum_width - offset_x, frustum_width - offset_x, -frustum_height - offset_y, frustum_height - offset_y, near, far);</span>
</pre><br />
<br />
<pre><span class="Apple-style-span" style="font-size: x-small;">// Decompose the Homography into translation and rotation vectors
// Based on: https://gist.github.com/740979/97f54a63eb5f61f8f2eb578d60eb44839556ff3f
Mat inverseCameraMatrix = (Mat_</span><double><span class="Apple-style-span" style="font-size: x-small;">(3,3) << 1/cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) , 0 , -cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,2)/cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) ,
0 , 1/cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,1) , -cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,2)/cameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,1) ,
0 , 0 , 1);
// Column vectors of homography
Mat h1 = (Mat_</span><double><span class="Apple-style-span" style="font-size: x-small;">(3,1) << H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0));
Mat h2 = (Mat_</span><double><span class="Apple-style-span" style="font-size: x-small;">(3,1) << H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,1) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,1) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,1));
Mat h3 = (Mat_</span><double><span class="Apple-style-span" style="font-size: x-small;">(3,1) << H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,2) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,2) , H_matrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,2));
Mat inverseH1 = inverseCameraMatrix * h1;
// Normalizing length: the norm of K^-1 * h1, i.e. the first rotation column before scaling
double lambda = sqrt(inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0)*inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) +
inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0)*inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) +
inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0)*inverseH1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0));
Mat rotationMatrix;
if(lambda != 0) {
lambda = 1/lambda;
// Normalize inverseCameraMatrix
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,1) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,1) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,1) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,2) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,2) *= lambda;
inverseCameraMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,2) *= lambda;
// Column vectors of rotation matrix
Mat r1 = inverseCameraMatrix * h1;
Mat r2 = inverseCameraMatrix * h2;
Mat r3 = r1.cross(r2); // Orthogonal to r1 and r2
// Put rotation columns into the rotation matrix; the sign flips convert between
// OpenCV's camera axes (y down, z forward) and OpenGL's (y up, z toward the viewer)
rotationMatrix = (Mat_</span><double><span class="Apple-style-span" style="font-size: x-small;">(3,3) << r1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) , -r2.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) , -r3.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) ,
-r1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) , r2.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) , r3.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) ,
-r1.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0) , r2.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0) , r3.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0));
// Translation vector T
translationVector = inverseCameraMatrix * h3;
translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0) *= 1;
translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0) *= -1;
translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0) *= -1;
SVD decomposed(rotationMatrix); // I don't really know what this does. But it works.
rotationMatrix = decomposed.u * decomposed.vt;
}
else {
printf("Lambda was 0...\n");
}
modelviewMatrix = (Mat_</span><float><span class="Apple-style-span" style="font-size: x-small;">(4,4) << rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,1), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,2), translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(0,0),
rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,1), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,2), translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(1,0),
rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,1), rotationMatrix.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,2), translationVector.at</span><double><span class="Apple-style-span" style="font-size: x-small;">(2,0),
0,0,0,1);</span>
</double></double></double></double></double></double></double></double></double></double></double></double></float></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></double></pre><br />
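One detail worth noting about the modelviewMatrix built at the end: OpenCV's Mat_ stores values row-major, while OpenGL's glLoadMatrixf expects column-major storage, so the 4x4 needs a transpose on its way to the GPU. A minimal sketch of that hand-off, using plain float arrays so it stands alone without OpenCV or OpenGL headers (the function name is mine, not part of the code above):

```cpp
// Transpose a row-major 4x4 matrix (like the one built with Mat_<float>(4,4)
// above) into the column-major layout that glLoadMatrixf expects.
void rowMajorToGL(const float m[16], float gl[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            gl[c * 4 + r] = m[r * 4 + c]; // swap row/column indexing
}
```

Passing `gl` straight to glLoadMatrixf then places augmented geometry in camera space.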
</div>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com13tag:blogger.com,1999:blog-8799409120384735513.post-869368849026969812011-04-04T15:42:00.000-07:002011-05-21T22:37:24.485-07:00Offline camera calibration for iPhone/iPad-- or any camera, reallyCreating a GUI to perform camera calibration on a mobile device like an iPhone or iPad sounded like more work than it would be worth, so I wrote a short program to do it offline. The cameras used on these devices can be assumed to be consistent within the same model, so it makes more sense for an app developer to have several precomputed calibration matrices available rather than asking the user to do this step on their own device.<br />
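Since the whole point is to ship precomputed calibration instead of calibrating on-device, the app side can reduce to a lookup table keyed on the device model. A minimal sketch of that idea; the struct, the model-name key, and the lookup are my illustration rather than part of the calibration program below, and the focal-length values are the ones this calibration produced, with the principal point pinned to the image center (see the update later in this post):

```cpp
#include <cstring>

// Hypothetical per-model intrinsics table. Keying on a model string is an
// assumption for illustration; an app would map the hardware identifier to
// one of these entries at launch.
struct Intrinsics { double fx, fy, cx, cy; };

struct CalibrationEntry { const char *model; Intrinsics K; };

static const CalibrationEntry kCalibrations[] = {
    // Values from the calibration run described in this post
    { "iPad2", { 786.42938232, 786.42938232, 320.0, 240.0 } },
};

const Intrinsics *lookupIntrinsics(const char *model) {
    for (const CalibrationEntry &e : kCalibrations)
        if (std::strcmp(e.model, model) == 0)
            return &e.K;
    return nullptr; // unknown device: fall back to a default or calibrate
}
```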
<br />
The program I wrote is adapted from a <a href="http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html">tutorial I found</a> that does camera calibration from live input. My version instead looks for a sequence of files named 00.jpg, 01.jpg, etc. and calibrates from those. So my workflow was: take several pictures of the checkerboard pattern with my iPad, upload them to my computer, edit out the rest of the clutter on my desk in Photoshop so corner detection would be more reliable, and rename the files. The output of the program is two XML files holding the camera intrinsic parameters and the distortion coefficients. The code for the program is attached after the jump.<br />
<br />
And results:<br />
<div class="separator" style="clear: both; text-align: center;">For camera Matrix<a href="http://opencv.willowgarage.com/documentation/cpp/_images/math/eb60bb737b4f92a1b93eb03558e99bf214336e09.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://opencv.willowgarage.com/documentation/cpp/_images/math/eb60bb737b4f92a1b93eb03558e99bf214336e09.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: left;"><i>f_x</i> = 786.42938232</div><div class="separator" style="clear: both; text-align: left;"><i>f_y</i> = 786.42938232</div><div class="separator" style="clear: both; text-align: left;"><i>c_x</i> = 311.25384521 // <i>See update below</i></div><div class="separator" style="clear: both; text-align: left;"><i>c_y</i> = 217.01358032 // <i>See update below</i></div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">And the distortion coefficients were: -0.10786291, 1.23078966, -4.54779295e-03, -3.28966696e-03, -5.54199600</div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">I hope I did that right. The principal point is slightly off from the ideal image center at (320, 240).</div><br />
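A quick way to sanity-check numbers like these is to push a point through the pinhole model the camera matrix encodes: u = f_x·X/Z + c_x, v = f_y·Y/Z + c_y, ignoring distortion. A small sketch using the focal length above with the principal point at the image center (the helper is mine, for illustration only):

```cpp
// Pinhole projection without distortion: map a camera-space 3D point onto
// the image plane using intrinsics like the ones reported above.
struct Pixel { double u, v; };

Pixel project(double fx, double fy, double cx, double cy,
              double X, double Y, double Z) {
    Pixel p;
    p.u = fx * X / Z + cx; // horizontal pixel coordinate
    p.v = fy * Y / Z + cy; // vertical pixel coordinate
    return p;
}
```

A point on the optical axis (X = Y = 0) lands exactly on the principal point, and with f_x of about 786 a point 0.1 units off-axis at Z = 1 lands about 79 pixels from the center -- a quick check on whether a calibrated focal length is plausible for the camera's field of view.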
Note: I found this <a href="http://www.ient.rwth-aachen.de/cms/software/opencv/">precompiled private framework of OpenCV built for OS X</a> rather handy. It is only built with 32-bit support, so set your target in Xcode accordingly.<br />
<br />
<b>UPDATE:</b> The principal point obtained from this calibration was wrong! It was throwing off the pose estimates at glancing angles. I set it to (320, 240) and everything works better now...<br />
<br />
<a name='more'></a><br />
<span class="Apple-style-span" style="font-size: x-small;"><br />
</span><br />
<pre><span class="Apple-style-span" style="font-size: x-small;">// main.mm
// CameraCalibration
//
// Adapted from http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html
//
#import &lt;Cocoa/Cocoa.h&gt;
#import &lt;OpenCV/OpenCV.h&gt;
#import &lt;OpenCV/highgui.h&gt;

int main(int argc, char *argv[])
{
    const int NUMBER_OF_PICTURES = 23;
    int board_w = 5;                   // Board width in squares
    int board_h = 8;                   // Board height
    int n_boards = NUMBER_OF_PICTURES; // Maximum number of boards
    int board_n = board_w * board_h;
    CvSize board_sz = cvSize( board_w, board_h );
    cvNamedWindow( "Calibration", 1 );

    // Allocate storage
    CvMat* image_points = cvCreateMat( n_boards*board_n, 2, CV_32FC1 );
    CvMat* object_points = cvCreateMat( n_boards*board_n, 3, CV_32FC1 );
    CvMat* point_counts = cvCreateMat( n_boards, 1, CV_32SC1 );
    CvMat* intrinsic_matrix = cvCreateMat( 3, 3, CV_32FC1 );
    CvMat* distortion_coeffs = cvCreateMat( 5, 1, CV_32FC1 );
    CvPoint2D32f* corners = new CvPoint2D32f[ board_n ];
    int corner_count;
    int successes = 0;
    int step;

    const CFIndex FILENAME_LEN = 2048;
    char filename[FILENAME_LEN] = "";
    // (this is the Mac way to package application resources)
    CFBundleRef mainBundle = CFBundleGetMainBundle();
    assert( mainBundle );
    CFURLRef image_url = CFBundleCopyResourceURL( mainBundle, CFSTR("00"), CFSTR("jpg"), NULL );
    assert( image_url );
    Boolean got_it = CFURLGetFileSystemRepresentation( image_url, true,
        reinterpret_cast&lt;UInt8 *&gt;(filename), FILENAME_LEN );
    if( !got_it ) abort();

    IplImage *image = cvLoadImage( filename, 1 );
    IplImage *gray_image = cvCreateImage( cvGetSize( image ), 8, 1 );
    NSLog(@"Loaded image.");

    // Capture corner views: loop until we've got n_boards
    // successful captures (all corners on the board are found)
    int pictureNumber = 0;
    while( successes < n_boards && pictureNumber < NUMBER_OF_PICTURES-1 ){
        // Find chessboard corners:
        int found = cvFindChessboardCorners( image, board_sz, corners,
            &corner_count, CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_FILTER_QUADS );
        // Get subpixel accuracy on those corners
        cvCvtColor( image, gray_image, CV_BGR2GRAY );
        cvFindCornerSubPix( gray_image, corners, corner_count, cvSize( 11, 11 ),
            cvSize( -1, -1 ), cvTermCriteria( CV_TERMCRIT_EPS+CV_TERMCRIT_ITER, 30, 0.1 ));
        // Draw it
        cvDrawChessboardCorners( image, board_sz, corners, corner_count, found );
        cvShowImage( "Calibration", image );
        // If we got a good board, add it to our data
        if( corner_count == board_n ){
            step = successes*board_n;
            for( int i=step, j=0; j < board_n; ++i, ++j ){
                CV_MAT_ELEM( *image_points, float, i, 0 ) = corners[j].x;
                CV_MAT_ELEM( *image_points, float, i, 1 ) = corners[j].y;
                CV_MAT_ELEM( *object_points, float, i, 0 ) = j/board_w;
                CV_MAT_ELEM( *object_points, float, i, 1 ) = j%board_w;
                CV_MAT_ELEM( *object_points, float, i, 2 ) = 0.0f;
            }
            CV_MAT_ELEM( *point_counts, int, successes, 0 ) = board_n;
            successes++;
        }
        pictureNumber++;
        NSLog(@"%i chessboards found in %i pictures", successes, pictureNumber);
        CFStringRef number = CFStringCreateWithFormat( NULL, NULL, CFSTR("%02i"), pictureNumber );
        CFURLRef next_url = CFBundleCopyResourceURL( mainBundle, (CFStringRef)number, CFSTR("jpg"), NULL );
        assert( next_url );
        CFURLGetFileSystemRepresentation( next_url, true,
            reinterpret_cast&lt;UInt8 *&gt;(filename), FILENAME_LEN );
        cvReleaseImage( &image );           // release the previous frame
        image = cvLoadImage( filename, 1 ); // Get next image
    } // End collection while loop

    // Allocate matrices according to how many chessboards were found
    CvMat* object_points2 = cvCreateMat( successes*board_n, 3, CV_32FC1 );
    CvMat* image_points2 = cvCreateMat( successes*board_n, 2, CV_32FC1 );
    CvMat* point_counts2 = cvCreateMat( successes, 1, CV_32SC1 );
    // Transfer the points into the correctly sized matrices
    for( int i = 0; i < successes*board_n; ++i ){
        CV_MAT_ELEM( *image_points2, float, i, 0) = CV_MAT_ELEM( *image_points, float, i, 0 );
        CV_MAT_ELEM( *image_points2, float, i, 1) = CV_MAT_ELEM( *image_points, float, i, 1 );
        CV_MAT_ELEM( *object_points2, float, i, 0) = CV_MAT_ELEM( *object_points, float, i, 0 );
        CV_MAT_ELEM( *object_points2, float, i, 1) = CV_MAT_ELEM( *object_points, float, i, 1 );
        CV_MAT_ELEM( *object_points2, float, i, 2) = CV_MAT_ELEM( *object_points, float, i, 2 );
    }
    for( int i=0; i < successes; ++i ){
        CV_MAT_ELEM( *point_counts2, int, i, 0 ) = CV_MAT_ELEM( *point_counts, int, i, 0 );
    }
    cvReleaseMat( &object_points );
    cvReleaseMat( &image_points );
    cvReleaseMat( &point_counts );

    // At this point we have all the chessboard corners we need.
    // Initialize the intrinsic matrix such that the two focal lengths
    // have a ratio of 1.0
    CV_MAT_ELEM( *intrinsic_matrix, float, 0, 0 ) = 1.0;
    CV_MAT_ELEM( *intrinsic_matrix, float, 1, 1 ) = 1.0;

    // Calibrate the camera
    cvCalibrateCamera2( object_points2, image_points2, point_counts2, cvGetSize( image ),
        intrinsic_matrix, distortion_coeffs, NULL, NULL, CV_CALIB_FIX_ASPECT_RATIO );

    // Save the intrinsics and distortion coefficients
    cvSave( "Intrinsics.xml", intrinsic_matrix );
    cvSave( "Distortion.xml", distortion_coeffs );

    // Example of loading these matrices back in
    CvMat *intrinsic = (CvMat*)cvLoad( "Intrinsics.xml" );
    CvMat *distortion = (CvMat*)cvLoad( "Distortion.xml" );

    // Build the undistort map to use for all subsequent frames
    IplImage* mapx = cvCreateImage( cvGetSize( image ), IPL_DEPTH_32F, 1 );
    IplImage* mapy = cvCreateImage( cvGetSize( image ), IPL_DEPTH_32F, 1 );
    cvInitUndistortMap( intrinsic, distortion, mapx, mapy );
    return 0;
}
</span></pre>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com2tag:blogger.com,1999:blog-8799409120384735513.post-74704343437288082422011-04-04T01:32:00.000-07:002011-04-04T01:42:08.328-07:00The Approach So FarThis project has been under development for a few months prior to the beginning of this blog, so I might as well explain some of the approach as it stands so far.<br />
<br />
There are three "big picture" technical components to this project. The first is an efficient markerless camera-based AR system -- a keypoint detector, feature matcher, and pose estimator -- running on the mobile platform. The second is sensor fusion with the other sensors available on a modern mobile device: compass, GPS, gyroscope, and accelerometer. The third is user interface design: integrating these technologies into an easy-to-use app that can both view augmented data and contribute new user-generated data to grow the database.<br />
<br />
So far, I have focused on building the general-purpose markerless AR system. I am using <a href="http://mi.eng.cam.ac.uk/~er258/work/fast.html">FAST Corner Detection</a> to find keypoints in an image, followed by the <a href="http://cvlab.epfl.ch/alumni/oezuysal/ferns.html">Ferns classifier</a> to match the keypoints as seen through the camera with those in a reference image, and then using <a href="http://en.wikipedia.org/wiki/RANSAC">RANSAC</a> to calculate the homography mapping the reference image to the camera image. All the algorithms at this point are built into OpenCV, which I have compiled for iOS with some <a href="http://www.atinfinity.info/wiki/index.php?OpenCV/Using%20OpenCV%202.2%20on%20iOS%20SDK%204.2">outside help</a>.<br />
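Concretely, the homography RANSAC estimates is a 3&times;3 matrix H that maps reference-image coordinates to camera-image coordinates up to a perspective divide. A minimal sketch of applying it, with a plain array standing in for cv::Mat (the helper is mine, not the toolkit's actual code):

```cpp
// Map a reference-image point (x, y) through a 3x3 homography H (row-major)
// into camera-image coordinates. RANSAC's job is to estimate H from the
// Ferns keypoint matches while rejecting outlier correspondences.
void applyHomography(const double H[9], double x, double y,
                     double &u, double &v) {
    double w = H[6] * x + H[7] * y + H[8]; // perspective divide term
    u = (H[0] * x + H[1] * y + H[2]) / w;
    v = (H[3] * x + H[4] * y + H[5]) / w;
}
```

Drawing the tracked rectangle then amounts to pushing the reference image's four corners through H.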
<br />
(Speaking of iOS, I have tested this code on both an iPhone 4 and an iPad 2. The iPad 2 is significantly faster even without any multithreaded code written to take advantage of the dual-core processor. I'm not yet sure exactly why, but figuring that out might reveal some unexpected bottlenecks in my code...)<br />
<br />
The immediate next step is to convert the homography I find into an OpenGL modelview transformation matrix so I can start rendering something more interesting than a rectangle over my scene. Though rectangles are nice and satisfying after fighting compilation errors for days, and seeing new rectangles drawn at ~12fps is great after a few weeks trying out SURF descriptors. And even though I have something that remotely functions, there is plenty of room for optimization, especially in terms of compressing the Ferns so they don't hog so much memory. The ideas in <a href="http://cvlab.epfl.ch/~calonder/CalonderLFKMB09.pdf">this paper</a> look like they might be half-implemented in the OpenCV Ferns code, though they are commented out.<br />
<br />
More details as things progress...kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com0tag:blogger.com,1999:blog-8799409120384735513.post-17046172464974667082011-03-28T01:39:00.000-07:002011-03-28T01:39:42.748-07:00HelloThe goal of this project is to develop a platform for mobile augmented reality (AR) applications that enable a user to explore and manipulate the image of an urban environment in real time. Most vision-based AR tools that exist today (including ARToolkit and Qualcomm's AR SDK) assume that the objects to be tracked are not stationary and therefore no external frame of reference is used to guide object recognition and tracking. If, however, the objects to recognize are facades of buildings and signs in a city, we can use sensor data (GPS/compass/etc) to roughly estimate our position and pose (as is done with Layar) and use vision techniques to more precisely align a 3D data overlay.<br />
<div><br />
</div><div>The ultimate goal is to produce a set of tools packaged as a unified toolkit that can prove useful to the growing community of "creative coders" working in the fields of art and design.</div><div><br />
</div><div>Check out the <a href="http://newuntitledpage.com/cse190a/proposal.pdf">project proposal</a> for more.</div>kronickhttp://www.blogger.com/profile/08461863633803037226noreply@blogger.com1