Strong Supervision for Part-Based Models


In this project we investigate how effective strong supervision is for state-of-the-art part-based models, focusing mainly on the task of object detection. Finer-grained annotations can help several aspects of a model, from initialization through training to testing.

What we have done:

- Dataset of animal parts [1]

In the first version of the dataset we defined and annotated up to 9 parts for the 6 animal classes of the PASCAL VOC 2007 and 2010 datasets. The parts are chosen so that they roughly cover the whole animal body; see [1] for the complete list of annotated parts. The dataset is publicly available for scientific studies and can be downloaded from the "Resources" section (bottom of the page).


- Strong Supervision for Deformable Parts Model [1]

Deformable part-based models achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training because they optimize a non-convex cost function. This work investigates the limitations of such initialization and extends earlier methods using part-level supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for only a fraction of the positive examples. We demonstrate significant improvements in detection performance compared to the Felzenszwalb et al. DPM and the Berkeley Poselet object detectors.




Professor: Stefan Carlsson
Professor: Ivan Laptev
PhD Student: Hossein Azizpour


[1] Hossein Azizpour, Ivan Laptev. "Object Detection Using Strongly-Supervised Deformable Part Models". European Conference on Computer Vision (ECCV), Florence, Italy, 2012.

Our Results:

VOC 2007 object detection

Per-class results (Average Precision) for animal detection in VOC 2007, compared to Felzenszwalb et al. (LSVM).
Method     bird   cat    cow    dog    horse  sheep  mAP
LSVM       10.0   19.3   25.2   11.1   56.8   17.8   23.4
[1]        12.7   26.3   34.6   19.1   62.9   23.6   29.9

VOC 2010 object detection

Per-class results (Average Precision) for animal detection in VOC 2010, compared to Felzenszwalb et al. (LSVM) and Berkeley Poselets.
Method     bird   cat    cow    dog    horse  sheep  mAP
LSVM       9.2    22.8   21.2   10.4   40.8   27.0   21.9
Poselets   8.5    22.2   20.6   18.5   48.2   28.0   24.3
[1]        11.3   27.2   25.8   23.7   46.1   28.0   27.0
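The mAP column in the detection tables is the mean of the six per-class Average Precision values. As a small sanity check, the sketch below recomputes it from the numbers listed above; the dictionary layout and the names `AP`, `mean_ap` and `MAP` are illustrative choices and not part of the released code.

```python
from statistics import mean

# Per-class AP copied from the tables above,
# in the order: bird, cat, cow, dog, horse, sheep.
AP = {
    ("VOC 2007", "LSVM"):     [10.0, 19.3, 25.2, 11.1, 56.8, 17.8],
    ("VOC 2007", "[1]"):      [12.7, 26.3, 34.6, 19.1, 62.9, 23.6],
    ("VOC 2010", "LSVM"):     [9.2, 22.8, 21.2, 10.4, 40.8, 27.0],
    ("VOC 2010", "Poselets"): [8.5, 22.2, 20.6, 18.5, 48.2, 28.0],
    ("VOC 2010", "[1]"):      [11.3, 27.2, 25.8, 23.7, 46.1, 28.0],
}


def mean_ap(per_class_ap):
    """Mean Average Precision over classes, rounded to one decimal."""
    return round(mean(per_class_ap), 1)


# Recomputed mAP for every (dataset, method) pair.
MAP = {key: mean_ap(values) for key, values in AP.items()}

if __name__ == "__main__":
    for (dataset, method), value in MAP.items():
        print(f"{dataset:9s} {method:9s} mAP = {value}")
```

The recomputed values agree with the mAP column of both tables.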

VOC 2007 Part Localization

Part localization results in terms of OPCP (see [1] for the definition of the measure).
Class   head   front legs   fore legs   torso/back   tail
bird    25.3   -            18.2        -            19.9
cat     59.7   34.9         -           29.1         53.4
cow     35.8   34.6         42.5        66.5         -
dog     40.2   37.1         -           20.1         61.9
horse   65.7   35.8         46.7        66.0         39.6
sheep   28.9   34.3         33.7        65.1         61.4



Note 1. All annotations are in XML format (the same structure as in PASCAL VOC) and can be read by the PASCAL VOC toolkits.
Note 2. The corresponding images should be downloaded from the PASCAL VOC web page of the same year.
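Since the annotations follow the PASCAL VOC XML structure, they can also be read without the toolkit. The sketch below uses Python's standard `xml.etree.ElementTree`. The sample XML is made up for illustration, and the assumption that parts appear as `<part>` elements nested inside each `<object>` (as in the PASCAL VOC layout annotations) should be checked against the downloaded files; `SAMPLE`, `read_box` and `parse_annotation` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented example in PASCAL VOC style. The <part> nesting is an
# assumption about the dataset files, not taken from them.
SAMPLE = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    <part>
      <name>head</name>
      <bndbox><xmin>60</xmin><ymin>245</ymin><xmax>110</xmax><ymax>300</ymax></bndbox>
    </part>
  </object>
</annotation>"""


def read_box(node):
    """Return (xmin, ymin, xmax, ymax) from the <bndbox> child of node."""
    box = node.find("bndbox")
    return tuple(int(float(box.findtext(tag)))
                 for tag in ("xmin", "ymin", "xmax", "ymax"))


def parse_annotation(xml_text):
    """Parse one annotation file into a list of objects with their parts."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        # Keep parts as a list so repeated part names are not lost.
        parts = [(p.findtext("name"), read_box(p)) for p in obj.findall("part")]
        objects.append({"name": obj.findtext("name"),
                        "box": read_box(obj),
                        "parts": parts})
    return objects


if __name__ == "__main__":
    for obj in parse_annotation(SAMPLE):
        print(obj["name"], obj["box"], obj["parts"])
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.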


Trained models for VOC 2007 and 2010 animal classes

Detection Software:

Detection and visualization code

Training Code

Full training and detection code with a demo


Please contact Hossein Azizpour by e-mail if you notice any inconsistencies in the web page or dataset, or if you have questions or remarks.


Computer Vision and Active Perception Laboratory