Automatic Scale Selection as a Pre-Processing Stage for Interpreting the Visual World

Tony Lindeberg

In: D. Chetverikov and T. Sziranyi (eds) Proc. Fundamental Structural Properties in Image and Pattern Analysis FSPIPA'99 (Budapest, Hungary), September 6-7, 1999. Schriftenreihen der Österreichischen Computer Gesellschaft, volume 130, pp 9--23.

Abstract

This paper reviews a systematic methodology for formulating mechanisms for automatic scale selection when performing feature detection in scale-space. An important property of the proposed approach is that the notion of scale is included already in the definition of image features.

Introduction

Computer vision algorithms for interpreting image data usually involve a feature detection step. The need for early feature detection is usually motivated by the desire to condense the rich intensity pattern into a more compact representation for further processing. If a proper abstraction of shape primitives can be computed, certain invariance properties can also be expected with respect to changes in view direction and illumination variations.

The earliest works in this direction were concerned with edge detection (Roberts 1965, Prewitt 1970). While edge detection may at first appear to be a rather simple task, it was empirically observed that it can be very hard to extract edge descriptors reliably. Usually, this was explained as a noise sensitivity that could be reduced by pre-smoothing the image data before applying the edge detector (Torre and Poggio 1980). Later, a deeper understanding was developed that these difficulties originate from a more fundamental aspect of image structure, namely that real-world objects (in contrast to idealized mathematical entities such as points and lines) usually consist of different types of structures at different scales (Witkin 1983, Koenderink 1984). Motivated by the multi-scale nature of real-world images, multi-scale representations such as pyramids (Burt and Adelson 1983) and scale-space representation (Witkin 1983, Koenderink 1984, Lindeberg 1994) were constructed. Theories were also formed concerning what types of image features should be extracted from any scale level in a multi-scale representation (Koenderink and van Doorn 1992, Florack et al. 1992, Lindeberg 1994, Florack 1997).

The most common way of applying multi-scale representations in practice has been to select one or a few scale levels in advance, and then extract image features at each scale level more or less independently. This approach can be sufficient under simplified conditions, where only a few natural scale levels are involved and provided that the image features are stable over large ranges of scales. Typically, this is the case when extracting edges of man-made objects viewed under controlled imaging conditions. In other cases, however, there may be a need for adapting scale levels individually to each image feature, or even for adapting the scale levels along an extended image feature, such as a connected edge. Typically, this occurs when detecting ridges (which turn out to be much more scale sensitive than edges) and when applying an edge detector to a diffuse edge for which the degree of diffuseness varies along the edge.

To handle these effects in general cases, we argue that it is natural to complement feature detection modules by explicit mechanisms for automatic scale selection, so as to automatically adapt the scale levels to the image features under study. The purpose of this article is to present such a framework for automatic scale selection, which is generally applicable to a rich variety of image features, and has been successfully tested by integration with other visual modules. For references to the original sources, see (Lindeberg 1998ab, Lindeberg 1999) and the references therein.
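As an illustration, one widely used form of automatic scale selection picks, at each image point, the scale at which a scale-normalized derivative response is strongest. The summary above does not spell out the mechanism, so the following is only a minimal sketch under that assumption: it uses the scale-normalized Laplacian t * |L_xx + L_yy| as the response, with t denoting the variance of the Gaussian smoothing kernel. All function names (gaussian_kernel, smooth, laplacian, select_scale, gaussian_blob) are hypothetical and chosen for this example.

```python
import numpy as np


def gaussian_kernel(t):
    """Sampled 1-D Gaussian with variance t, truncated at 4 standard deviations."""
    radius = int(np.ceil(4.0 * np.sqrt(t)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * t))
    return k / k.sum()


def smooth(image, t):
    """Separable Gaussian smoothing of a 2-D array to scale t (variance)."""
    k = gaussian_kernel(t)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


def laplacian(L):
    """Five-point finite-difference Laplacian; border pixels are left at zero."""
    out = np.zeros_like(L)
    out[1:-1, 1:-1] = (L[2:, 1:-1] + L[:-2, 1:-1]
                       + L[1:-1, 2:] + L[1:-1, :-2]
                       - 4.0 * L[1:-1, 1:-1])
    return out


def select_scale(image, point, scales):
    """Return the scale in `scales` that maximizes the scale-normalized
    Laplacian magnitude t * |Laplacian(L(.; t))| at the given (row, col)."""
    responses = [t * abs(laplacian(smooth(image, t))[point]) for t in scales]
    return scales[int(np.argmax(responses))]


def gaussian_blob(size, t0):
    """Synthetic test image: a centered 2-D Gaussian blob with variance t0."""
    c = size // 2
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * t0))


if __name__ == "__main__":
    blob = gaussian_blob(129, t0=16.0)      # assumed test pattern
    scales = 2.0 ** np.arange(1, 7)         # assumed scale grid: 2, 4, ..., 64
    print(select_scale(blob, (64, 64), scales))
```

For an ideal Gaussian blob of variance t0, the continuous-domain response at the blob center is proportional to t / (t0 + t)^2, which peaks at t = t0, so the selected scale reflects the size of the image structure. A fixed, pre-chosen scale would carry no such size information.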

An attractive property of the proposed scale selection mechanism is that in addition to automatic tuning of the scale parameter, it induces the computation of natural abstractions (groupings) of image shape. In this respect, the proposed methodology constitutes a natural pre-processing stage for subsequent interpretation of visual scenes.
