Invariant receptive fields under natural image transformations

When a visual agent observes three-dimensional objects in the world with a two-dimensional light sensor (retina), the image data will be subject to basic image transformations in terms of:
We have developed a general framework for handling the inherent variability in visual data caused by natural image transformations and for computing invariant (stable) visual representations under these:
There are very close similarities between the receptive fields predicted by our theory and receptive fields found by cell recordings in mammalian vision, including (i) spatial on-center-off-surround and off-center-on-surround receptive fields in the fovea and the LGN, (ii) simple cells with spatial directional preference in V1, (iii) spatio-chromatic double-opponent cells in V1, (iv) space-time separable spatio-temporal receptive fields in the LGN and V1 and (v) non-separable space-time tilted receptive fields in V1. Our theory thereby shows that it is possible to predict properties of visual neurons from a principled axiomatic theory. The receptive field families generated by this theory can also constitute a general basis for expressing visual operations for computational modelling of visual processes and for computer vision algorithms. Specifically, the theory offers two kinds of mechanisms: scale selection based on local extrema over scale of scale-normalized derivative responses, and affine or Galilean normalization, obtained either by affine shape adaptation or Galilean velocity adaptation or by detecting affine invariant or Galilean invariant fixed points over filter families in affine or spatio-temporal scale space. Together, these provide a general framework for computing scale invariant, affine invariant and Galilean invariant image features and image descriptors, both for generic purposes in computer vision and as plausible mechanisms for achieving invariance to natural image transformations in computational models of biological vision.
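The scale selection idea mentioned above can be illustrated with a minimal sketch. It is not the implementation behind this work; it is a toy example under stated assumptions: the input is an ideal, unit-peak two-dimensional Gaussian blob of variance `t0`, scale is parameterized by the variance `t` of the smoothing Gaussian, and the derivative response is the scale-normalized Laplacian evaluated at the blob centre. Under these assumptions the response has a closed form, so no image convolution is needed. The function names `normalized_laplacian_at_center` and `select_scale` and the scale grid are hypothetical choices made for this illustration.

```python
def normalized_laplacian_at_center(t, t0):
    """Magnitude of t * (L_xx + L_yy) at the centre of a Gaussian blob.

    Assumed input image: f(x, y) = exp(-(x^2 + y^2) / (2 * t0)).
    Smoothing f with a Gaussian of variance t yields a Gaussian of
    variance t0 + t, which gives the closed form 2*t0*t / (t0 + t)^2.
    """
    s = t0 + t
    return 2.0 * t0 * t / (s * s)


def select_scale(t0, t_min=0.25, t_max=256.0, n_scales=161):
    """Return (scale, magnitude) at the strongest local maximum over scale.

    The scale grid is geometric (hypothetical choice: 16 levels per octave).
    """
    ratio = t_max / t_min
    scales = [t_min * ratio ** (k / (n_scales - 1)) for k in range(n_scales)]
    mags = [normalized_laplacian_at_center(t, t0) for t in scales]

    # Interior local maxima of the scale-normalized response over scale.
    peaks = [k for k in range(1, n_scales - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    if not peaks:
        raise ValueError("no local extremum over scale inside the scale grid")

    best = max(peaks, key=lambda k: mags[k])
    return scales[best], mags[best]


if __name__ == "__main__":
    # Two blobs related by a spatial scaling factor of 2 (variance factor 4).
    for t0 in (4.0, 16.0):
        t_hat, mag = select_scale(t0)
        print(f"blob variance {t0}: selected scale {t_hat:.3f}, response {mag:.3f}")
```

In this toy setting the selected scale follows the size of the blob: enlarging the blob by a factor of 2 multiplies the selected variance by 4, while the magnitude of the scale-normalized response at the selected scale stays the same. This is the sense in which local extrema over scale of scale-normalized derivatives give scale-covariant scale estimates and scale-invariant feature responses.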