Scale-space representation: Definition and basic ideas

Figure 1: A multi-scale representation of a signal is an ordered set of derived signals intended to represent the original signal at different levels of scale.

Scale-space theory is a framework for early visual operations, which has been developed by the computer vision community (in particular by Witkin [], Koenderink [], Yuille and Poggio [], Lindeberg [] and Florack []) to handle the above-mentioned multi-scale nature of image data. A main argument behind its construction is that if no prior information is available about which scales are appropriate for a given data set, then the only reasonable approach for an uncommitted vision system is to represent the input data at multiple scales. This means that the original signal should be embedded into a one-parameter family of derived signals, in which fine-scale structures are successively suppressed (see figure 1).

How should such an idea be carried out in practice? A crucial requirement is that structures at coarse scales in the multi-scale representation should constitute simplifications of corresponding structures at finer scales; they should not be accidental phenomena created by the method for suppressing fine-scale structures. This idea has been formalized in a variety of ways by different authors. A noteworthy coincidence is that similar conclusions can be obtained from several different starting points. A main result is that if rather general conditions are imposed on the types of computations that are to be performed, then convolution by the Gaussian kernel and its derivatives is singled out as a canonical class of smoothing transformations. The requirements (scale-space axioms) that specify the uniqueness are essentially linearity and spatial shift invariance, combined with different ways of formalizing the notion that new structures should not be created in the transformation from fine to coarse scales.

In summary, for any N-dimensional signal $f \colon \mathbb{R}^N \to \mathbb{R}$, its scale-space representation $L \colon \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ is defined by

$$L(\cdot;\, t) = g(\cdot;\, t) * f,$$

where $g \colon \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ denotes the Gaussian kernel

$$g(x;\, t) = \frac{1}{(2 \pi t)^{N/2}} \, e^{-(x_1^2 + \dots + x_N^2)/(2t)},$$

and the variance $t$ of this kernel is referred to as the scale parameter. Equivalently, the scale-space family can be obtained as the solution to the (linear) diffusion equation

$$\partial_t L = \frac{1}{2} \nabla^2 L$$

with initial condition $L(\cdot;\, 0) = f$. Then, based on this representation, scale-space derivatives at any scale $t$ are defined by

$$L_{x^\alpha}(\cdot;\, t) = \partial_{x^\alpha} L(\cdot;\, t) = \left( \partial_{x^\alpha} g(\cdot;\, t) \right) * f,$$

where $\alpha$ denotes the order of differentiation (a multi-index when $N > 1$).
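The equivalence between Gaussian convolution and the diffusion equation stated above can be checked numerically in one dimension. The following is a minimal sketch, not the author's implementation: it assumes a sampled and truncated Gaussian kernel (only an approximation of the continuous kernel), mirrored signal borders, and an explicit finite-difference scheme with a step `dt` chosen for illustration. All function names (`gaussian_kernel`, `convolve_reflect`, `scale_space`, `diffuse`) are hypothetical.

```python
import math

def gaussian_kernel(t):
    """Sampled 1-D Gaussian of variance t, truncated at 4 standard
    deviations and renormalised to unit sum (an approximation only)."""
    if t <= 0:
        return [1.0]
    radius = max(1, math.ceil(4.0 * math.sqrt(t)))
    weights = [math.exp(-(x * x) / (2.0 * t)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_reflect(signal, kernel):
    """Convolve with an odd-length symmetric kernel, mirroring the signal at its ends."""
    n, radius = len(signal), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = (i + k - radius) % (2 * n)  # fold the index into one mirrored period
            if j >= n:
                j = 2 * n - 1 - j
            acc += w * signal[j]
        out.append(acc)
    return out

def scale_space(signal, t):
    """Approximate L(.; t) = g(.; t) * f by direct convolution."""
    return convolve_reflect(signal, gaussian_kernel(t))

def diffuse(signal, t, dt=0.5):
    """Explicit finite-difference solution of dL/dt = 0.5 * d2L/dx2 up to time t.
    Each step is L[i] + 0.5 * dt * (L[i-1] - 2 L[i] + L[i+1]); stable for dt <= 1."""
    steps = round(t / dt)
    step_kernel = [dt / 2.0, 1.0 - dt, dt / 2.0]
    current = list(signal)
    for _ in range(steps):
        current = convolve_reflect(current, step_kernel)
    return current

# A unit impulse: its scale-space representation is the kernel itself.
f = [0.0] * 101
f[50] = 1.0
by_convolution = scale_space(f, 4.0)
by_diffusion = diffuse(f, 4.0)
max_gap = max(abs(a - b) for a, b in zip(by_convolution, by_diffusion))
print(f"largest pointwise difference at t = 4: {max_gap:.4f}")
```

The two results should agree closely but not exactly, since both the sampled kernel and the finite-difference scheme are discretizations of the continuous theory. Note that the scale parameter `t` is the variance, so the standard deviation of the kernel is `sqrt(t)`.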

Figure 2: (a) The main idea of a scale-space representation is to generate a one-parameter family of derived signals in which the fine-scale information is successively suppressed. This figure shows a signal which has been successively smoothed by convolution with Gaussian kernels of increasing width. (b) Since new zero-crossings cannot be created by the diffusion equation in the one-dimensional case, the trajectories of zero-crossings in scale-space (here, zero-crossings of the second derivative) form paths across scales that are never closed from below.
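The behaviour of zero-crossings described in figure 2(b) can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not the procedure used to produce the figure: it uses a sampled, truncated Gaussian with mirrored borders, approximates the second derivative by a second difference, and applies this to a synthetic test signal chosen here. The helper names are hypothetical. In this discrete approximation the count is not guaranteed to decrease strictly monotonically between every pair of scales, although it should fall markedly from the finest scale to the coarsest.

```python
import math

def smooth(signal, t):
    """Smooth with a sampled Gaussian of variance t (truncated at 4 standard
    deviations, renormalised, mirrored borders). An approximation only."""
    if t <= 0:
        return list(signal)
    radius = max(1, math.ceil(4.0 * math.sqrt(t)))
    weights = [math.exp(-(x * x) / (2.0 * t)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = (i + k - radius) % (2 * n)  # fold the index into one mirrored period
            if j >= n:
                j = 2 * n - 1 - j
            acc += (w / total) * signal[j]
        out.append(acc)
    return out

def second_difference(values):
    """Discrete stand-in for the second derivative at interior samples."""
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]

def count_zero_crossings(values, eps=1e-12):
    """Count sign changes, ignoring samples that are numerically zero."""
    signs = [1 if v > 0 else -1 for v in values if abs(v) > eps]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Synthetic signal: a slow oscillation plus a fast, fine-scale one.
f = [math.sin(2 * math.pi * x / 64) + 0.5 * math.sin(2 * math.pi * x / 5)
     for x in range(256)]

scales = [0, 1, 4, 16, 64]
counts = [count_zero_crossings(second_difference(smooth(f, t))) for t in scales]
for t, c in zip(scales, counts):
    print(f"t = {t:3d}: {c} zero-crossings of the second difference")
```

Tracking where the sign changes occur at each scale, rather than only counting them, would give the zero-crossing trajectories shown in the figure.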

Figure 3: Different levels in the scale-space representation of a two-dimensional image at scale levels t = 0, 2, 8, 32, 128 and 512 together with grey-level blobs indicating local minima at each scale.

Figure 2(a) shows the result of applying Gaussian smoothing to a one-dimensional signal in this way. Notice how this successive smoothing captures the intuitive notion of fine-scale information being suppressed, and the signals becoming successively smoother. Figure 3 gives a corresponding example for a two-dimensional image. Here, to emphasize the local variations in the grey-level landscape, local minima in the grey-level images at each scale have been indicated by dark blobs (grey-level blobs with spatial extent determined from a certain watershed analogy, which essentially describes how large a region associated with a local minimum can be filled with water without the water flooding over to regions associated with other local minima). As can be seen, mainly small blobs due to noise and texture are detected at fine scales. After a small amount of smoothing, the buttons on the keyboard manifest themselves as distinct minima, whereas at even coarser scales they merge into one unit (the keyboard). Other dominant dark image structures (such as the calculator, the cord and the receiver) also appear as single blobs at coarser scales. This example gives one illustration of the types of hierarchical shape decompositions that can be obtained by varying the scale parameter in the scale-space representation. The relations between image structures at different scales that are induced in this way are referred to as deep structure [, ].


Tony Lindeberg
Tue Jul 1 14:57:47 MET DST 1997