# Axiomatic theory of receptive fields

Receptive field profiles registered by cell recordings have shown that mammalian vision has developed receptive fields tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time.[1][2][3][4][5] Corresponding cell recordings in the auditory system have shown that mammals have developed receptive fields tuned to different frequencies as well as to temporal transients.[6][7][8][9] This article describes normative theories that have been developed to explain these properties of sensory receptive fields based on structural properties of the environment. Beyond theoretical explanation of biological phenomena, these theories can also be used for computational modelling of biological receptive fields and for building algorithms for artificial perception based on sensory data.

## Computational theory of visual receptive fields

Idealized models of visual receptive fields similar to those found in the retina, the lateral geniculate nucleus and the primary visual cortex of higher mammals can be derived in an axiomatic way from structural requirements on the first stages of visual processing that reflect symmetry properties of the surrounding world in combination with additional assumptions to ensure internally consistent image representations at multiple spatial and temporal scales.[10][11] Specifically, idealized functional models for linear spatio-temporal receptive fields can be derived in a principled manner to constitute a combination of Gaussian derivatives over the spatial domain and either non-causal Gaussian derivatives or truly time-causal temporal scale-space kernels over the temporal domain: [10][11][12]

${\displaystyle T(x_{1},x_{2},t;\;s,\tau ;\;v,\Sigma )=\partial _{\varphi }^{m_{1}}\partial _{\bot \varphi }^{m_{2}}\partial _{\bar {t}}^{n}\left(g(x_{1}-v_{1}t,x_{2}-v_{2}t;\;s,\Sigma )\,h(t;\;\tau )\right)}$

where

• ${\displaystyle x=(x_{1},x_{2})^{T}}$ denotes the image coordinates,
• ${\displaystyle t}$ denotes time,
• ${\displaystyle s}$ denotes the spatial scale,
• ${\displaystyle \tau }$ denotes the temporal scale,
• ${\displaystyle v=(v_{1},v_{2})^{T}}$ denotes a local image velocity,
• ${\displaystyle \Sigma }$ denotes a spatial covariance matrix determining the spatial shape of an affine Gaussian kernel,
• ${\displaystyle m_{1}}$ and ${\displaystyle m_{2}}$ denote the orders of spatial differentiation,
• ${\displaystyle n}$ denotes the order of temporal differentiation,
• ${\displaystyle \partial _{\varphi }=\cos \varphi \,\partial _{x_{1}}+\sin \varphi \,\partial _{x_{2}}}$ and ${\displaystyle \partial _{\bot \varphi }=\sin \varphi \,\partial _{x_{1}}-\cos \varphi \,\partial _{x_{2}}}$ denote spatial directional derivative operators in two orthogonal directions ${\displaystyle \varphi }$ and ${\displaystyle \bot \varphi }$,
• ${\displaystyle g(x;\;s,\Sigma )={\frac {1}{2\pi s{\sqrt {\det \Sigma }}}}e^{-x^{T}\Sigma ^{-1}x/2s}}$ is an affine Gaussian kernel with its size determined by the spatial scale parameter ${\displaystyle s}$ and its shape by the spatial covariance matrix ${\displaystyle \Sigma }$,
• ${\displaystyle g(x_{1}-v_{1}t,x_{2}-v_{2}t;\;s,\Sigma )}$ denotes a spatial affine Gaussian kernel that moves with image velocity ${\displaystyle v=(v_{1},v_{2})}$ in space-time and
• ${\displaystyle h(t;\;\tau )}$ is a temporal smoothing kernel over time corresponding to a Gaussian kernel in the case of non-causal time or a cascade of first-order integrators or equivalently truncated exponential kernels coupled in cascade over a time-causal temporal domain.
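The two choices of temporal smoothing kernel ${\displaystyle h(t;\;\tau )}$ in the last item can be sketched numerically. The following is a minimal illustration, not the implementation used in the cited works: it assumes that ${\displaystyle \tau }$ is read as the variance of the kernel, that the cascade is approximated by discrete convolution on a uniform time grid, and that the time constants in `mus` are hypothetical values chosen only for the example.

```python
import numpy as np

def gaussian_temporal_kernel(t, tau):
    """Non-causal Gaussian kernel over time, with tau taken as the variance."""
    return np.exp(-t**2 / (2.0 * tau)) / np.sqrt(2.0 * np.pi * tau)

def truncated_exponential(t, mu):
    """First-order integrator: (1/mu) exp(-t/mu) for t >= 0, zero for t < 0."""
    return np.where(t >= 0.0, np.exp(-np.maximum(t, 0.0) / mu) / mu, 0.0)

def time_causal_kernel(t, mus):
    """Truncated exponential kernels coupled in cascade.

    Assumes t is a uniform grid starting at t = 0; the continuous
    convolution is approximated by a discrete one scaled by dt.
    """
    dt = t[1] - t[0]
    h = truncated_exponential(t, mus[0])
    for mu in mus[1:]:
        # Convolving causal kernels keeps the result causal;
        # keep only the samples that fall on the original grid.
        h = np.convolve(h, truncated_exponential(t, mu))[: len(t)] * dt
    return h

t = np.arange(0.0, 40.0, 0.01)
mus = [0.5, 1.0, 1.5]            # hypothetical time constants
h = time_causal_kernel(t, mus)

dt = t[1] - t[0]
mass = h.sum() * dt
mean = (t * h).sum() * dt / mass
variance = ((t - mean) ** 2 * h).sum() * dt / mass
```

Under these assumptions the composed kernel is zero for negative time by construction, and its variance is close to the sum of the squared time constants (here 3.5), which plays the role of the temporal scale.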

Correspondingly, and with similar notation, idealized functional models for spatial receptive fields can be expressed in the form

${\displaystyle T(x_{1},x_{2};\;s,\Sigma )=\partial _{\varphi }^{m_{1}}\partial _{\bot \varphi }^{m_{2}}\left(g(x_{1},x_{2};\;s,\Sigma )\right).}$

This model specifically generalizes the receptive field model in terms of Gaussian derivatives[13][14][15][16][17]

${\displaystyle T(x_{1},x_{2};\;s)=\partial _{\varphi }^{m_{1}}\partial _{\bot \varphi }^{m_{2}}\left(g(x_{1},x_{2};\;s)\right)}$

from directional derivatives of rotationally symmetric Gaussian kernels ${\displaystyle g(x_{1},x_{2};\;s)}$ to directional derivatives of affine Gaussian kernels ${\displaystyle g(x_{1},x_{2};\;s,\Sigma )}$.
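As a rough illustration of the spatial model, the sketch below evaluates the affine Gaussian kernel and its first-order directional derivative (${\displaystyle m_{1}=1}$, ${\displaystyle m_{2}=0}$) on a sampled grid. The grid extent, the covariance matrix and the orientation are hypothetical example values, and the derivative is obtained from the analytic gradient of the Gaussian rather than from any procedure described in the cited works.

```python
import numpy as np

def affine_gaussian(x1, x2, s, Sigma):
    """g(x; s, Sigma) = exp(-x^T Sigma^{-1} x / (2 s)) / (2 pi s sqrt(det Sigma))."""
    P = np.linalg.inv(Sigma)
    q = P[0, 0] * x1**2 + (P[0, 1] + P[1, 0]) * x1 * x2 + P[1, 1] * x2**2
    return np.exp(-q / (2.0 * s)) / (2.0 * np.pi * s * np.sqrt(np.linalg.det(Sigma)))

def directional_derivative_rf(x1, x2, s, Sigma, phi):
    """First-order receptive field: cos(phi) dg/dx1 + sin(phi) dg/dx2."""
    P = np.linalg.inv(Sigma)
    g = affine_gaussian(x1, x2, s, Sigma)
    # Analytic gradient of the Gaussian: grad g = -(Sigma^{-1} x / s) g
    g1 = -(P[0, 0] * x1 + P[0, 1] * x2) / s * g
    g2 = -(P[1, 0] * x1 + P[1, 1] * x2) / s * g
    return np.cos(phi) * g1 + np.sin(phi) * g2

step = 0.1
axis = np.arange(-12.0, 12.0 + step / 2, step)
x1, x2 = np.meshgrid(axis, axis, indexing="ij")

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical shape matrix
s = 2.0                                       # hypothetical spatial scale
g = affine_gaussian(x1, x2, s, Sigma)
T = directional_derivative_rf(x1, x2, s, Sigma, phi=np.pi / 4)

kernel_mass = g.sum() * step**2       # close to 1: the kernel is normalized
derivative_mass = T.sum() * step**2   # close to 0: the derivative has zero mean
```

Setting `Sigma` to the identity matrix recovers the rotationally symmetric Gaussian derivative model as a special case.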

Idealized functional models of receptive fields of these forms have been shown to reproduce quite well the shape of spatial and spatio-temporal receptive fields measured by cell recordings of neurons in the LGN and of simple cells in the primary visual cortex (V1).[10][11][12][3][4]

Theoretical arguments have been presented for preferring this generalized Gaussian model of receptive fields over a Gabor model of receptive fields, because of the better theoretical properties of the generalized Gaussian model under natural image transformations.[10][18] Specifically, these generalized Gaussian receptive fields can be shown to enable computation of invariant visual representations under natural image transformations.[18] By these results, the different shapes of receptive field profiles found in biological vision, which are tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time, can be seen as well adapted to the structure of the physical world and can be explained from the requirement that the visual system should have the possibility of being invariant to the natural types of image transformations that occur in its environment.[10][11][18]
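One elementary property that fits this line of argument can be checked numerically: the affine Gaussian family is closed under linear deformations of the image domain. If ${\displaystyle x'=Ax}$, then ${\displaystyle g(x';\;s,A\Sigma A^{T})=g(x;\;s,\Sigma )/|\det A|}$, so a deformed kernel is again an affine Gaussian with a transformed covariance matrix. The sketch below verifies this identity; it is plain Gaussian algebra offered as an illustration, not a summary of the proofs in the cited papers, and the matrix `A` is a hypothetical deformation.

```python
import numpy as np

def affine_gaussian(x, s, Sigma):
    """Affine Gaussian evaluated at points x of shape (N, 2)."""
    P = np.linalg.inv(Sigma)
    q = np.einsum("ni,ij,nj->n", x, P, x)   # x^T Sigma^{-1} x for each point
    return np.exp(-q / (2.0 * s)) / (2.0 * np.pi * s * np.sqrt(np.linalg.det(Sigma)))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))              # arbitrary test points

A = np.array([[1.5, 0.4], [-0.2, 0.8]])     # hypothetical local affine deformation
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
s = 1.5

# Kernel evaluated in the deformed domain, with transformed covariance A Sigma A^T
lhs = affine_gaussian(x @ A.T, s, A @ Sigma @ A.T)
# Original kernel, rescaled by the area change of the deformation
rhs = affine_gaussian(x, s, Sigma) / abs(np.linalg.det(A))

max_error = np.max(np.abs(lhs - rhs))
```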

## Computational theory of auditory receptive fields

A computational theory for auditory receptive fields can be expressed in a structurally similar way, permitting the derivation of auditory receptive fields in two stages:[19][20]

• a first stage of temporal receptive fields corresponding to an idealized cochlea model, expressed as a windowed Fourier transform
${\displaystyle S(t,\omega ;\;\tau )=\int _{t'=-\infty }^{\infty }f(t')\,e^{-i\omega t'}\,w(t-t';\;\tau )\,dt'}$

where ${\displaystyle t}$ denotes time, ${\displaystyle \omega }$ denotes the angular frequency and ${\displaystyle \tau }$ denotes the temporal scale of the window function ${\displaystyle w}$, which can be chosen as either Gabor functions in the case of non-causal time or Gammatone functions, alternatively generalized Gammatone functions, for a truly time-causal model in which the future cannot be accessed,

• a second layer of spectro-temporal receptive fields
${\displaystyle A_{\alpha ,\beta }(t,\nu ;\;\Sigma )=\partial _{t}^{\alpha }\partial _{\nu }^{\beta }\left(g(\nu -vt;\;s)\,T(t;\;\tau )\right)}$

applied to the logarithmically transformed magnitude of the spectrogram

${\displaystyle S_{dB}=20\log _{10}\left({\frac {|S|}{S_{0}}}\right)}$

where

• ${\displaystyle \nu }$ denotes the logarithmic frequency,
• ${\displaystyle \Sigma }$ is a spectro-temporal covariance matrix determining the shape of the second-layer receptive field over the spectro-temporal domain,
• ${\displaystyle \alpha }$ is the order of temporal differentiation,
• ${\displaystyle \beta }$ is the order of logspectral differentiation,
• the smoothing over the logspectral domain is modeled as a Gaussian function ${\displaystyle g(\nu -vt;\;s)}$ extended with glissando adaptation, with a glissando parameter ${\displaystyle v}$ to account for frequency variations over time

and with the temporal smoothing kernels ${\displaystyle T(t;\;\tau )}$ chosen as either Gaussian kernels over time in the case of non-causal time or first-order integrators (truncated exponential kernels) coupled in cascade in the case of truly time-causal operations.
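The two stages can be sketched as follows for the non-causal case. This is a minimal illustration under stated assumptions: the window ${\displaystyle w}$ and the temporal kernel ${\displaystyle T}$ are both taken as Gaussians with variance ${\displaystyle \tau }$, ${\displaystyle s}$ is read as the variance of the logspectral Gaussian, the integral is replaced by a Riemann sum, and derivatives are approximated by finite differences. The function names, the `floor` guard against taking the logarithm of zero, and all parameter values are hypothetical.

```python
import numpy as np

def windowed_fourier(f, t, omegas, tau):
    """First stage: S(t, omega; tau) with a Gaussian window of variance tau."""
    dt = t[1] - t[0]
    diff = t[:, None] - t[None, :]                       # t - t'
    w = np.exp(-diff**2 / (2.0 * tau)) / np.sqrt(2.0 * np.pi * tau)
    carrier = f[None, :] * np.exp(-1j * omegas[:, None] * t[None, :])
    return (carrier @ w.T) * dt                          # shape (len(omegas), len(t))

def to_decibel(S, S0=1.0, floor=1e-12):
    """S_dB = 20 log10(|S| / S0), with a small floor to avoid log(0)."""
    return 20.0 * np.log10(np.maximum(np.abs(S), floor) / S0)

def second_layer_kernel(t, nu, s, tau, v, alpha, beta):
    """Second stage: finite-difference derivatives of g(nu - v t; s) T(t; tau)."""
    tt, nn = np.meshgrid(t, nu, indexing="ij")
    g = np.exp(-(nn - v * tt) ** 2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)
    T = np.exp(-tt**2 / (2.0 * tau)) / np.sqrt(2.0 * np.pi * tau)
    A = g * T
    for _ in range(alpha):
        A = np.gradient(A, t, axis=0)    # temporal differentiation
    for _ in range(beta):
        A = np.gradient(A, nu, axis=1)   # logspectral differentiation
    return A

# First stage applied to a 200 Hz test tone
t = np.arange(0.0, 0.5, 0.001)
f = np.sin(2.0 * np.pi * 200.0 * t)
freqs = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
S = windowed_fourier(f, t, 2.0 * np.pi * freqs, tau=0.01**2)
S_dB = to_decibel(S)
peak_freq = freqs[np.argmax(np.abs(S[:, len(t) // 2]))]

# Second-stage kernels on a small (time, log-frequency) grid
t_k = np.linspace(-0.2, 0.2, 81)
nu_k = np.linspace(-2.0, 2.0, 81)
A0 = second_layer_kernel(t_k, nu_k, s=0.1, tau=0.05**2, v=2.0, alpha=0, beta=0)
A1 = second_layer_kernel(t_k, nu_k, s=0.1, tau=0.05**2, v=2.0, alpha=1, beta=0)
ridge = nu_k[np.argmax(A0, axis=1)]      # log-frequency of the maximum at each time
```

With a non-zero glissando parameter the ridge of the smoothing kernel follows the line ${\displaystyle \nu =vt}$, which is how the model accounts for frequency variations over time. A time-causal variant would replace the Gaussian window and temporal kernel with the kernels named in the text.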

The shapes of the receptive field functions in these models can be determined by necessity from structural properties of the environment combined with requirements about the internal structure of the auditory system to enable theoretically well-founded processing of sound signals at different temporal and log-spectral scales. Specifically, the resulting spectro-temporal fields in this model obey invariance or covariance properties over natural sound transformations including: (i) temporal shifts, (ii) variations in sound pressure, (iii) the distance between the sound source and the observer, (iv) a shift in the frequencies of auditory stimuli and (v) glissando transformations.[19][20]
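Property (ii) can be illustrated with a small numerical check. Because the windowed Fourier transform is linear, rescaling the sound pressure by a constant gain multiplies ${\displaystyle |S|}$ by that gain, which the logarithmic transformation turns into a constant additive offset in ${\displaystyle S_{dB}}$. Any derivative-type operator then removes the offset. The sketch below uses a synthetic magnitude array and a first-order temporal difference as a stand-in for a temporal derivative; the data and the gain are hypothetical.

```python
import numpy as np

def to_decibel(mag, S0=1.0):
    """S_dB = 20 log10(|S| / S0) for a strictly positive magnitude array."""
    return 20.0 * np.log10(mag / S0)

rng = np.random.default_rng(1)
magnitude = rng.uniform(0.1, 1.0, size=(64, 200))   # synthetic |S| over (log-frequency, time)
gain = 3.0                                          # hypothetical change in sound pressure

S_dB = to_decibel(magnitude)
S_dB_louder = to_decibel(gain * magnitude)

offset = S_dB_louder - S_dB            # the same constant at every point
dt_quiet = np.diff(S_dB, axis=1)       # first-order temporal difference
dt_loud = np.diff(S_dB_louder, axis=1)
responses_match = np.allclose(dt_quiet, dt_loud)
```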

Idealized receptive fields of this form can be shown to model well the qualitative shape of spectro-temporal receptive fields as measured by cell recordings in the inferior colliculus (ICC) as well as the linear component of some receptive fields measured in the primary auditory cortex.[19][20]

## References

1. D. Hubel and T. N. Wiesel (1959) "Receptive field of single neurons in the cat’s striate cortex", J Physiol 147, 226–238.
2. D. Hubel and T. N. Wiesel (2005) Brain and Visual Perception: The Story of a 25-Year Collaboration. Oxford University Press.
3. G. C. DeAngelis, I. Ohzawa and R. D. Freeman (1995) "Receptive field dynamics in the central visual pathways". Trends Neurosci. 18(10), 451–457.
4. G. C. DeAngelis and A. Anzai (2004) "A modern view of the classical receptive field: linear and non-linear spatio-temporal processing by V1 neurons". In: Chalupa, L.M., Werner, J.S. (eds.) The Visual Neurosciences, vol. 1, pp. 704–719. MIT Press, Cambridge.
5. B. R. Conway and M. S. Livingstone (2006) "Spatial and temporal properties of cone signals in alert macaque primary visual cortex", The Journal of Neuroscience 26(42): 10826-10846.
6. L. M. Miller, M. A. Escabi, H. L. Read and C. E. Schreiner (2001) "Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex". J. Neurophys. 87:516-527.
7. A. Qiu, C. E. Schreiner and M. A. Escabi (2003) "Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition", Journal of Neurophysiology 90: 456-476.
8. M. Elhilali, J. Fritz, T. S. Chi and S. Shamma (2007) "Auditory cortical receptive fields: Stable entities with plastic abilities", Journal of Neuroscience 27: 10372-10382.
9. C. A. Atencio and C. E. Schreiner (2012) "Spectrotemporal processing in spectral tuning modules of cat primary auditory cortex", PLOS ONE 7:e31537.
10. T. Lindeberg (2013) "A computational theory of visual receptive fields", Biological Cybernetics, 107(6): 589-635.
11. T. Lindeberg (2016) "Time-causal and time-recursive spatio-temporal receptive fields", Journal of Mathematical Imaging and Vision 55(1): 50-88.
12. T. Lindeberg (2011) "Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space", Journal of Mathematical Imaging and Vision, 40(1): 36-81.
13. J. J. Koenderink and A. J. van Doorn (1987) "Representation of local geometry in the visual system", Biological Cybernetics 55:367–375.
14. R. A. Young (1987) "The Gaussian derivative model for spatial vision: I. Retinal mechanisms", Spatial Vision 2(4): 273-293.
15. J. J. Koenderink and A. J. van Doorn (1992) "Generic neighbourhood operators", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14: 597-605.
16. T. Lindeberg (1994) "Scale-space theory: A basic tool for analysing structures at different scales", Journal of Applied Statistics 21(2): 224–270. doi:10.1080/757582976.
17. T. Lindeberg (2013) "Invariance of visual operations at the level of receptive fields", PLOS ONE 8(7): e66990, pages 1-33.
18. T. Lindeberg and A. Friberg (2015) "Idealized computational models of auditory receptive fields", PLOS ONE, 10(3): e0119032, pages 1-58.
19. T. Lindeberg and A. Friberg (2015) "Scale-space theory for auditory signals", Proc. SSVM 2015: Scale-Space and Variational Methods in Computer Vision, Springer LNCS 9087: 3-15.