© 2015 Elsevier Inc. Abstract Deformable object models capture variations in an object's appearance that can be represented as image deformations. Other effects such as out-of-plane rotations, three-dimensional articulations, and self-occlusions are often captured by considering mixture of deformable models, one per object aspect. A more scalable approach is representing instead the variations at the level of the object parts, applying the concept of a mixture locally. Combining a few part variations can in fact cheaply generate a large number of global appearances. A limited version of this idea was proposed by Yang and Ramanan , for human pose dectection. In this paper we apply it to the task of generic object category detection and extend it in several ways. First, we propose a model for the relationship between part appearances more general than the tree of Yang and Ramanan , which is more suitable for generic categories. Second, we treat part locations as well as their appearance as latent variables so that training does not need part annotations but only the object bounding boxes. Third, we modify the weakly-supervised learning of Felzenszwalb et al. and Girshick et al. ,  to handle a significantly more complex latent structure. Our model is evaluated on standard object detection benchmarks and is found to improve over existing approaches, yielding state-of-the-art results for several object categories.
- Deformable part models
- Discovering discriminative parts
- Learning and sharing parts
- Object recognition