20 Apr 2021

Alaa El-Nouby: Training Vision Transformers for Image Retrieval

Title: Training Vision Transformers for Image Retrieval

Speaker: Alaa El-Nouby, Facebook AI Research and Inria Paris

Date and Time: Tuesday, April 20, 1-2 pm

Meeting ID: 699 6421 6598, Passcode: 600145

Abstract: Transformers have shown outstanding results for natural language understanding and, more recently, for image classification. We here extend this work and propose a transformer-based approach for image retrieval: we adopt vision transformers for generating image descriptors and train the resulting model with a metric learning objective, which combines a contrastive loss with a differential entropy regularizer. Our results show consistent and significant improvements of transformers over convolution-based approaches. In particular, our method outperforms the state of the art on several public benchmarks for category-level retrieval. Furthermore, our experiments show that, in comparable settings, transformers are competitive for particular object retrieval, especially in the regime of short vector representations and low-resolution images.
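As a rough illustration of the objective the abstract describes, the sketch below combines a pairwise contrastive loss with an entropy regularizer on L2-normalized descriptors. It is a minimal NumPy sketch under stated assumptions, not the speaker's implementation: the function names, the cosine-similarity form of the contrastive term, the nearest-neighbor (Kozachenko-Leonenko style) estimate of differential entropy, and the values of `margin` and `reg_weight` are all illustrative choices that the announcement does not specify.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit sphere."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def contrastive_loss(z, labels, margin=0.5):
    """Pull same-label pairs together, push different-label pairs
    apart until their cosine similarity falls below `margin`.
    Assumes the rows of `z` are already L2-normalized."""
    sim = z @ z.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(z), dtype=bool)
    pos = (1.0 - sim) * (same & off_diag)        # positives: want sim -> 1
    neg = np.maximum(sim - margin, 0.0) * (~same)  # negatives: want sim <= margin
    return (pos.sum() + neg.sum()) / len(z)


def entropy_regularizer(z, eps=1e-8):
    """Nearest-neighbor entropy surrogate (an assumed form):
    minimizing it enlarges each point's distance to its closest
    neighbor, which spreads the descriptors over the sphere."""
    dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distances
    return -np.mean(np.log(dists.min(axis=1) + eps))


def retrieval_loss(embeddings, labels, margin=0.5, reg_weight=0.7):
    """Contrastive term plus weighted entropy regularizer.
    `margin` and `reg_weight` are placeholder values."""
    z = l2_normalize(embeddings)
    return contrastive_loss(z, labels, margin) + reg_weight * entropy_regularizer(z)
```

In a training setup, `embeddings` would be the descriptors produced by the vision transformer for a batch of images and `labels` their category identifiers; an automatic-differentiation framework would replace NumPy so the loss can be backpropagated.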

Bio: Alaa El-Nouby is a PhD student jointly at Facebook AI Research and Inria Paris, advised by Hervé Jégou, Natalia Neverova and Ivan Laptev. His research interests are metric learning, image retrieval and, more recently, transformers for computer vision. Prior to his PhD, Alaa received his MSc from the University of Guelph and the Vector Institute, advised by Graham Taylor, where he conducted research in spatio-temporal representation learning and text-to-image synthesis with generative models.

Organizer: Hossein Azizpour

KTH Machine Learning Seminars
