Monday, February 28, 2011

Paper Reading #12: Cosaliency: Where People Look When Comparing Images

Comments:
Vince Kocks
Wesley Konderla

Reference Information:
Title: Cosaliency: Where People Look When Comparing Images
Authors: David E. Jacobs, Dan B. Goldman, Eli Shechtman
Presentation Venue: UIST’10, October 3–6, 2010, New York, New York, USA
 

Summary:

Photographic triage is the problem of deciding, among a set of similar photos, which one should be kept and which ones should be deleted. There are certain factors that professional photographers look at during photo triage. These include image noise, motion blur, human poise, facial expressions, object orientation, parallax occlusions, disocclusions, and appearance. These factors are called local structure.


This paper discusses a learned model developed to calculate the importance of image pixels in the context of other images. This measure is called cosaliency.

Before a model could be developed, a user study was done using Amazon's Mechanical Turk. Users were asked to identify salient differences in a pair of similar photos. A map was created showing where users thought the most important differences were. These generated maps are referred to as goal maps.
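To make the goal map idea a bit more concrete, here is a rough sketch of how several users' marked regions could be combined into a single map. This is only my guess at the general idea, not the authors' actual procedure: the function name goal_map, the binary-mask input format, and the simple averaging are all assumptions made for illustration.

```python
def goal_map(user_masks):
    """Average per-user binary masks into one map with values in [0, 1].

    Hypothetical helper: each mask is a list of rows, where 1 means the
    user marked that pixel as an important difference and 0 means not.
    """
    if not user_masks:
        raise ValueError("need at least one mask")
    rows, cols = len(user_masks[0]), len(user_masks[0][0])
    total = [[0] * cols for _ in range(rows)]
    for mask in user_masks:
        for r in range(rows):
            for c in range(cols):
                total[r][c] += mask[r][c]
    n = len(user_masks)
    # Each value is the fraction of users who marked that pixel.
    return [[value / n for value in row] for row in total]


# Four made-up users marking a tiny 2x2 image.
masks = [
    [[0, 1], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
]
print(goal_map(masks))
```

Pixels that more users marked end up with higher values, which is the sense in which the map shows where people thought the important differences were.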

Several features are discussed for generating these maps without user input. To validate the generated maps, another study was done on Mechanical Turk. Users were asked to compare a set of photo crops and rank them based on how useful they were for the image triage task. The study showed the maps generated from cosaliency to be better.

This work is limited to pairs of images to make testing practical. Photos in a real collection often appear in groups, not just pairs. The type of model created may not be appropriate for all kinds of photos. This framework is applicable for general photography. One thing to note is that even though cosaliency can only be found for a pair of images, it was found to be more effective than working with a single image.

Discussion:

Before I comment on the paper, I just want to point out that Mechanical Turk seems very popular for research. I read the paper about Amazon Mechanical Turk, and ever since then many of the papers I've read have conducted their research through it.

The idea of this paper should appeal greatly to professional photographers. The ability to quickly decide which photograph to keep and which to delete could save a lot of time. I think the limitation to only two images is reasonable for the time being, but I'm interested to see what kind of other processes are created to deal with larger comparisons. I also liked how two different studies were done to validate their model.

The paper was a little confusing to follow but I think the images above (taken from the paper) will show best what the process does.

3 comments:

  1. I agree with you on the paper being technical but have to disagree about its use by professional photographers. The researchers even mention that this is more for amateurs because they just take pictures and lots of them. Professionals want more control over which images to throw away and will wait to look at them under full resolution, even if that means carrying some extra SD cards with them. I actually proposed using this algorithm with constraints to do a pass to mark images for deletion, but not delete until a final say.

  2. I like the idea of giving users more control over their photographs. I remember friends sometimes taking multiple pictures of the same thing. Having this system would allow you to know more easily which pictures to keep and which ones to trash. I would definitely consider using this.

  3. I agree with the above two comments, and also I wonder if something could be done to improve images if the user takes several. If both pictures have problem areas could this system be made to identify and rectify that situation? Because I think too often photographs do not only have one area that is important to focus on.
