Reconstructing the World With Flickr

As of September 2010, Flickr hosted 5 billion images, with 3,000 uploaded each minute.  Facebook hosts even more.

These massive photo-sharing systems provide an ever-growing imagery dataset.  In recent years, researchers have been processing images from these sites with computer vision techniques to reconstruct real-world scenes in 3D.


Recent progress in this area can be traced to the public release of Photosynth by Microsoft, in cooperation with the University of Washington.  This application processes a user’s uploaded images and creates a 3D model and point cloud from them.  The user experience is akin to viewing a slideshow of photos placed in their proper spatial context.


Photosynth introductory video:


Microsoft researcher Blaise Aguera y Arcas’ Photosynth demo at TED 2007:


Reconstruction From Web Photo Collections

With Photosynth, individual users upload their own photos and are encouraged to follow shooting guidelines to get the best results.  The researchers behind Photosynth later extended this work by developing a system that uses unstructured sets of photos uploaded by thousands of different users.  A series of computer vision techniques was used to analyze millions of Flickr photos of major tourist destinations such as Rome and Venice.  Ultimately, they were able to reconstruct 3D point clouds of the Colosseum, the Trevi Fountain, and other frequently photographed European landmarks.

For more information on this project, see Building Rome in a Day.
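To give a feel for how a 3D point can be recovered from photos, here is a minimal sketch of one generic building block, triangulation, using the simple midpoint method. This is not the method from the project above, which is not detailed in this post. The sketch assumes the camera positions and the viewing rays toward a matched feature are already known; in a real system those would have to be estimated from the images themselves. The function names and the example values are hypothetical.

```python
def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _along(origin, direction, s):
    return [origin[i] + s * direction[i] for i in range(3)]

def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Estimate a 3D point from two viewing rays.

    c1, c2: camera centers (assumed known).
    d1, d2: ray directions toward the same feature (assumed known).
    Returns the midpoint of the shortest segment joining the two rays.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel rays carry no depth information.
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = _along(c1, d1, s)
    p2 = _along(c2, d2, t)
    return [(p1[i] + p2[i]) / 2.0 for i in range(3)]

# Hypothetical example: two cameras one unit apart looking at the same point.
point = [0.5, -0.2, 5.0]
cam1, cam2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
estimate = triangulate_midpoint(cam1, _sub(point, cam1), cam2, _sub(point, cam2))
```

With noise-free rays the two lines intersect and the estimate coincides with the original point. With real photos the rays rarely meet exactly, which is why the midpoint of their closest approach is used here as a simple stand-in.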

In 2010, researchers at the University of North Carolina at Chapel Hill released a paper describing a technique for extracting 3D models from community photo databases that is significantly faster than other methods.  Their approach can analyze 3 million Flickr images of Rome and reconstruct the major landmarks in less than 24 hours on a single PC.  They also applied the technique to photos of Berlin.  If video footage of a location is available, their technique can reconstruct 3D models even faster.

Project demo video:


What’s Next?

Work in this area will accelerate in coming years as smartphone ownership increases and the imaging and geolocation capabilities of these devices improve.  For example, phones are now being equipped with gyroscopes and compasses in addition to GPS, providing richer data for 3D reconstruction algorithms.  Bubbli is a stealth startup that is apparently seeking to work in this area by coupling smartphone image and location data with computer vision algorithms.  They will be an interesting company to watch as more of their product plans are revealed.
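As a rough illustration of how phone sensor data could feed a reconstruction algorithm, the sketch below converts a photo's GPS geotag into approximate east/north offsets in meters from a reference point, and a compass heading into a viewing direction. It assumes a spherical Earth and a small area (an equirectangular approximation). The function names and the coordinates are hypothetical and are not taken from any of the projects mentioned here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation

def geotag_to_local_xy(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Approximate (east, north) offset in meters from a reference point.

    Reasonable only over short distances, such as a single landmark.
    """
    ref_lat = math.radians(ref_lat_deg)
    d_lat = math.radians(lat_deg - ref_lat_deg)
    d_lon = math.radians(lon_deg - ref_lon_deg)
    east = EARTH_RADIUS_M * d_lon * math.cos(ref_lat)
    north = EARTH_RADIUS_M * d_lat
    return east, north

def heading_to_direction(heading_deg):
    """Compass heading (degrees clockwise from north) as an (east, north) unit vector."""
    h = math.radians(heading_deg)
    return math.sin(h), math.cos(h)

# Hypothetical photo taken slightly north-east of an illustrative reference point.
ref = (41.9000, 12.4800)
photo = (41.9010, 12.4810)
position = geotag_to_local_xy(photo[0], photo[1], ref[0], ref[1])
direction = heading_to_direction(90.0)  # camera facing due east
```

A rough position and direction like this could serve as a starting guess for where each camera was, which the image-based techniques would then refine.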

These projects are laying the groundwork for the emergence of 3D online maps that can be continually updated by millions of users uploading geo-located photos and videos from around the world.  We are approaching a point where the combination of digital imaging, computer vision, and location sensor technology enables us to capture entire 3D scenes rather than just photographs.  Ultimately, users will be able to experience a form of virtual time travel in which they can see a location from various viewpoints at different points in time.


Jan-Michael Frahm, Pierre Georgel, David Gallup, Tim Johnson, Rahul Raguram, Changchang Wu, Yi-Hung Jen, Enrique Dunn, Brian Clipp, Svetlana Lazebnik, Marc Pollefeys, “Building Rome on a Cloudless Day,” ECCV 2010.

Michael Goesele, Jens Ackermann, Simon Fuhrmann, Ronny Klowsky, Fabian Langguth, Patrick Mücke, Martin Ritz, “Scene Reconstruction from Community Photo Collections,” Computer, vol. 43, no. 6, pp. 48-53, June 2010.

Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, Richard Szeliski, “Building Rome in a Day,” International Conference on Computer Vision, 2009, Kyoto, Japan.


  1. 0rison says:

    “Digitizing the world” can be crowdsourced in complementary ways.

    Simply plotting the GPS locations from which photos are taken creates surprisingly detailed maps:

    Taking the signal-to-noise ratios between different GPS satellite signals permits the estimation of occluding buildings:
