Deep convolutional neural networks can be trained to estimate gaze directions
from eye images. However, such networks do not provide any information about
the reliability of their predictions. As uncertainty estimates could enable more
accurate and reliable gaze tracking applications, a method for confidence
calculation was examined in this project.
This method had to be computationally efficient enough for the gaze tracker to
run in real time, and it could not reduce the quality of the gaze predictions.
Thus, several
state-of-the-art methods were abandoned in favor of Mean-Variance Estimation,
which uses an additional neural network for estimating uncertainties. This
confidence network is trained on the accuracy of the gaze rays that the primary
network, i.e., the prediction network, generates for different eye images.
Two datasets were used to evaluate the confidence network and the effect of
different design choices.
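
As a rough illustration of how a confidence network can be trained from the errors of a prediction network, the sketch below evaluates a Gaussian negative log-likelihood, which is a common objective for Mean-Variance Estimation. The abstract does not state the exact loss used in the project, so the loss form, the function name gaussian_nll, and the example error values are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(error, log_var):
    """Per-sample Gaussian negative log-likelihood, up to an additive constant.

    error   : error of the prediction network (here: hypothetical angular error)
    log_var : log-variance predicted by the confidence network (assumed output)
    """
    error = np.asarray(error, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * (log_var + error**2 / np.exp(log_var))

# Hypothetical angular errors (degrees) of the prediction network on three images.
errors = np.array([0.5, 2.0, 4.0])

# Three candidate confidence outputs: variance equal to the squared error,
# a variance that is too small (overconfident), and one that is too large.
matched = np.log(errors**2)
overconfident = matched - 2.0
underconfident = matched + 2.0

loss_matched = gaussian_nll(errors, matched).mean()
loss_over = gaussian_nll(errors, overconfident).mean()
loss_under = gaussian_nll(errors, underconfident).mean()
```

Under this assumed loss, the predicted variance that matches the squared error gives the lowest value, so the confidence network is penalized both for claiming too much and too little certainty.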
A main conclusion was that the uncertainty associated with a predicted gaze
direction
depends on more factors than just the visual appearance of the eye image. Thus,
a confidence network taking only this image as input can never model the
regression problem perfectly.
Despite this, the results show that the network learns useful information. In
fact, its
confidence estimates outperform those from an established Monte Carlo method,
where the uncertainty is estimated from the spread of gaze directions from
several prediction networks in an ensemble.
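
The ensemble baseline mentioned above can be sketched as follows: each network in the ensemble predicts a gaze direction for the same eye image, and the spread of those directions serves as the uncertainty. The abstract does not specify how the spread was measured, so the root-mean-square angular deviation from the mean direction, the function name ensemble_spread_deg, and the example vectors are assumptions.

```python
import numpy as np

def ensemble_spread_deg(gaze_dirs):
    """Return the mean gaze direction and the RMS angular spread in degrees.

    gaze_dirs : array of shape (M, 3), one gaze vector per ensemble member.
    Assumes the vectors do not cancel out, so the mean has non-zero length.
    """
    g = np.asarray(gaze_dirs, dtype=float)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)   # normalize each member
    mean_dir = g.mean(axis=0)
    mean_dir = mean_dir / np.linalg.norm(mean_dir)     # unit mean direction
    cos = np.clip(g @ mean_dir, -1.0, 1.0)             # guard against rounding
    angles = np.degrees(np.arccos(cos))                # deviation per member
    return mean_dir, float(np.sqrt(np.mean(angles**2)))

# Hypothetical predictions from a three-member ensemble for two eye images.
agreeing = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
disagreeing = np.array([[0.1, 0.0, 1.0], [-0.1, 0.0, 1.0], [0.0, 0.1, 1.0]])

_, spread_agreeing = ensemble_spread_deg(agreeing)
mean_disagreeing, spread_disagreeing = ensemble_spread_deg(disagreeing)
```

A larger spread indicates that the ensemble members disagree more, which is read as lower confidence in the predicted gaze direction.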