Scene completion demo

This is a toy implementation of "Scene Completion Using Millions of Photographs", written for 6.870 Object Recognition and Scene Understanding.


NEW! MATLAB code for this implementation can be downloaded here.

You will also need Antonio Torralba's GIST descriptor code, available here.

The results shown below were obtained using a GIST-matching algorithm by Michael G. Ross. The implementation above is very similar, but gives slightly different nearest neighbors.


This implementation uses a database of 54,000 256x256 images from the Oliva lab, for which gist-based nearest-neighbor data was already available. Nearly all of these images are outdoor scenes (city streets, forests, mountains, etc.). For this demo, I took 7 test images and retrieved the 10 nearest neighbors of each in gist space.
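As a rough illustration of what "nearest neighbors in gist space" means, here is a minimal Python sketch. It is not the MATLAB code from this page. It assumes each image has already been reduced to a fixed-length descriptor vector, and it assumes plain Euclidean distance, which may differ from the metric actually used. The function name `nearest_neighbors` and the tiny 3-D vectors are made up for the example.

```python
import heapq
import math


def nearest_neighbors(query, database, k=10):
    """Return the indices of the k descriptors in `database` closest to `query`.

    Assumption: Euclidean distance between descriptor vectors.
    Indices are returned in order of increasing distance.
    """
    def distance_to_query(index):
        return math.dist(query, database[index])

    return heapq.nsmallest(k, range(len(database)), key=distance_to_query)


# Made-up 3-D "descriptors" standing in for real GIST vectors.
database = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.1, 0.0, 0.1],
    [5.0, 5.0, 5.0],
]
query = [0.0, 0.1, 0.0]

neighbors = nearest_neighbors(query, database, k=2)
print(neighbors)
```

With a database of 54,000 images, a brute-force scan like this is still feasible for a handful of queries, though precomputing the neighbor lists (as was done for this database) avoids repeating the work.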

Test images with their 10 nearest neighbors

For these first two test images, some of the nearest neighbors are actually the same scene photographed from a slightly different angle.

The street and mountain are fairly common types of scenes in our database, so even though there were no exact matches, there are many very similar scenes.

These last three scenes are not very prototypical and not well represented in the database, so the nearest neighbors are not very good matches (although they still tend to have some elements in common with the test image).

For each test image, I did two scene completion tests. For each test, I cut out an object or region in the test image and replaced it with the corresponding region from a near neighbor. (The paper takes several extra steps to find the best alignment between the test image and the neighbor and to blend the two images; I skipped these in this toy version.) Click on the images to see the results, or scroll down for some examples.
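The cut-and-replace step described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB code from this page. It assumes the test image, the neighbor, and a boolean mask marking the cut-out region are all the same size and are stored as nested lists; the name `fill_region` is hypothetical. There is no alignment or blending, matching the toy version.

```python
def fill_region(test_image, neighbor_image, mask):
    """Return a copy of test_image with masked pixels taken from neighbor_image.

    All three arguments are lists of rows with identical dimensions.
    mask[r][c] is True where the test image was cut out.
    """
    if not (len(test_image) == len(neighbor_image) == len(mask)):
        raise ValueError("images and mask must have the same height")

    filled = []
    for test_row, neighbor_row, mask_row in zip(test_image, neighbor_image, mask):
        if not (len(test_row) == len(neighbor_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        # Take the neighbor's pixel inside the hole, the original pixel elsewhere.
        filled.append([
            neighbor_pixel if in_hole else test_pixel
            for test_pixel, neighbor_pixel, in_hole
            in zip(test_row, neighbor_row, mask_row)
        ])
    return filled


# Made-up 3x3 grayscale images: the test image is all 1s, the neighbor all 9s.
test_image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
neighbor_image = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
mask = [
    [False, False, False],
    [False, True, True],
    [False, True, True],
]

result = fill_region(test_image, neighbor_image, mask)
print(result)
```

Because the pixels are copied from the same coordinates in the neighbor, the quality of the fill depends entirely on how well the neighbor's layout already matches the test image.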

Results

The better the neighbor matches the original image, the better the result:

Neighbors that don't match the original tend to produce really bad results:

But sometimes the filled image looks pretty good, even though the fill is completely wrong:

(The left image shows the stairs filled in with part of a crosswalk; the right image shows some of the water filled in with grass.)

Conclusion

Even in this toy version, the results look promising. A large, representative database of images is important for getting good results. Aligning the original and fill images would also give better results, but you can do pretty well even without this step (since the near neighbors in gist space tend to be pretty well aligned anyway).