Abstract

In this work, we study local feature extraction methods and evaluate their performance in detecting local features from the salient regions of images. In order to measure the detectors' performance, we compared the detected regions to gaze fixations obtained from the eye movement recordings of human participants viewing two types of images: natural images (photographs) and abstract/surreal images. The results indicate that all of the six evaluated local feature detectors perform clearly above chance level. The Hessian-Affine detector performs the best and almost reaches the performance level of state-of-the-art saliency detection methods.

Citation

If you use the images, data, or MATLAB scripts in your research, please cite our paper:

@inproceedings{KinLaiOit:2013,
    author = 	{T. Kinnunen and M. Laine-Hernandez and P. Oittinen},
    title = 	{Evaluating local feature detectors in salient region detection},
    booktitle = {18th Scandinavian Conference on Image Analysis ({SCIA2013})},
    year =      {2013},
    address =   {Espoo, Finland}
}
  

Introduction

The Human Visual System (HVS) is capable of processing tremendous amounts of information, but only a fraction of it is important. Thus, the HVS functions so that the focus of visual attention can be moved quickly to detect important things. Visual attention is controlled by both top-down and bottom-up processes. Top-down processing is determined by high-level contextual factors (e.g., the task and the semantic content of the scene). In bottom-up processing, visual attention shifts from one location to another based on low-level features which ``pop up'' from the scene. This ``pop up'' effect is called saliency (Itti & Koch, 2001), and it is based on the distinctiveness of an object from its surroundings in terms of intensity, color, and orientation. In this work, we study local feature extraction methods and evaluate their performance in detecting local features from the salient regions of images.
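To make the bottom-up notion concrete, the following is a minimal illustrative sketch (not the paper's method, and not the full Itti-Koch model) of saliency as a center-surround difference on the intensity channel: a region stands out when its fine-scale appearance differs strongly from its coarse-scale surroundings. The sigma values are arbitrary assumptions chosen for illustration.

```python
# Illustrative center-surround saliency on intensity only (a sketch,
# not the Itti-Koch model, which also uses color and orientation channels).
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(image, sigma_center=2.0, sigma_surround=8.0):
    """Return a saliency map in [0, 1] from a 2-D grayscale image."""
    image = image.astype(float)
    center = gaussian_filter(image, sigma_center)      # fine scale
    surround = gaussian_filter(image, sigma_surround)  # coarse scale
    saliency = np.abs(center - surround)
    rng = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / rng if rng > 0 else saliency

# A bright square on a dark background: saliency peaks around its boundary,
# where the center and surround responses disagree the most.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
smap = intensity_saliency(img)
```

The same center-surround idea, applied across several feature channels and scales, underlies the computational attention models reviewed by Itti & Koch (2001).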

Experiments and results

Attention maps

Figure: Saliency maps produced using local feature detectors. From the left: original image, ground truth, Harris-Laplace, Harris-Affine, Hessian-Laplace, Hessian-Affine, MSER, and SIFT (DoG). The top three images are from the Natural data set (Judd et al. 2009) and the bottom three images are from the Abstract data set (Laine-Hernandez et al. 2012).
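One simple way to turn a detector's sparse output into a dense attention map of the kind shown in the figure is to accumulate an isotropic Gaussian blob at each detected location, scaled by the region size. This is a hypothetical sketch under that assumption; the keypoint format `(x, y, scale)` and the scale-to-sigma mapping are illustrative choices, not the paper's exact procedure.

```python
# Hypothetical sketch: convert sparse detector keypoints into a dense
# attention map by summing one isotropic Gaussian per detected region.
import numpy as np

def keypoints_to_attention_map(shape, keypoints, blob_sigma_scale=1.0):
    """keypoints: iterable of (x, y, scale). Returns a map normalized to [0, 1]."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    amap = np.zeros(shape)
    for x, y, scale in keypoints:
        sigma = max(blob_sigma_scale * scale, 1e-6)
        amap += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    m = amap.max()
    return amap / m if m > 0 else amap

# Two fictitious keypoints with different region sizes.
kps = [(16, 16, 3.0), (48, 48, 6.0)]
amap = keypoints_to_attention_map((64, 64), kps)
```

Affine-covariant detectors (Harris-Affine, Hessian-Affine, MSER) return elliptical rather than circular regions, so a faithful map would use anisotropic Gaussians instead of the isotropic ones sketched here.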

Saliency detection

Figure: Recall curves for "predicting" salient regions for Natural images (Judd et al. 2009).

Figure: Recall curves for "predicting" salient regions for Abstract images (Laine-Hernandez et al. 2012).
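A recall measure of the kind plotted above can be sketched as follows: threshold the attention map at its top-p% most salient pixels and count the fraction of recorded gaze fixations that fall inside the thresholded region. This is a hedged sketch of that idea only; the paper's exact evaluation protocol may differ.

```python
# Sketch of fixation recall: the fraction of fixations that land inside
# the top-p% most salient pixels of an attention map (an assumed protocol).
import numpy as np

def fixation_recall(attention_map, fixations, top_percent):
    """fixations: list of (row, col) pixel coordinates; top_percent in (0, 100]."""
    thresh = np.percentile(attention_map, 100 - top_percent)
    inside = attention_map >= thresh
    hits = sum(1 for r, c in fixations if inside[r, c])
    return hits / len(fixations)

# Toy example: a single salient pixel, one fixation on it and one off it.
amap = np.zeros((10, 10))
amap[2, 2] = 1.0
r = fixation_recall(amap, [(2, 2), (8, 8)], top_percent=1)
```

Sweeping `top_percent` over a range of values yields a full recall curve, which is how detectors can be compared across operating points rather than at a single threshold.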

Downloads

MATLAB code and data will be made available shortly after the conference.

References

  1. T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In ICCV, 2009.
  2. M. Laine-Hernandez, T. Kinnunen, J.-K. Kamarainen, L. Lensu, H. Kälviäinen, and P. Oittinen. Visual saliency and categorisation of abstract images. In ICPR, 2012.
  3. L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, 2001.