OCG project: Object Centered Grids for contextual modeling

Histogram-based methods, in their standard form, lack spatial information. One simple yet effective way to incorporate spatial information into histograms, and into the BoF (bag-of-features) framework in particular, is to place a fixed grid over the image frame (Figure 1). One then builds one histogram per grid cell and concatenates them into a single long histogram. Intuitively, in a street image, say, the histograms from the top part of the image tend to model sky, those from the center tend to model people, buildings, and cars, and those from the bottom tend to model the ground. This decomposition improves the discriminative power of the concatenated histogram compared with using no spatial grid.
Figure 1. Incorporating spatial information into BoF histograms using fixed-grids.
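The fixed-grid construction described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's code: the function name, the 3x3 default grid, and the input layout (pixel positions plus precomputed visual-word indices) are assumptions made for the example.

```python
import numpy as np

def fixed_grid_histogram(xy, words, image_size, vocab_size, grid=(3, 3)):
    """Concatenate one visual-word histogram per cell of a fixed grid.

    xy         : (N, 2) feature positions (x, y) in pixels
    words      : (N,) visual-word index of each feature, in [0, vocab_size)
    image_size : (width, height) of the image
    grid       : (rows, cols); 3x3 is an assumption, not taken from the text
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    width, height = image_size
    rows, cols = grid
    # Grid cell of each feature; clipping keeps border points inside the grid.
    col = np.clip((xy[:, 0] / width * cols).astype(int), 0, cols - 1)
    row = np.clip((xy[:, 1] / height * rows).astype(int), 0, rows - 1)
    cell = row * cols + col
    # One histogram row per cell, accumulated in place.
    hist = np.zeros((rows * cols, vocab_size))
    np.add.at(hist, (cell, words), 1)
    # Concatenating the per-cell histograms gives the long descriptor.
    return hist.ravel()
```

The resulting vector has `rows * cols * vocab_size` entries, so each cell contributes its own block of bins and the same visual word is counted separately depending on where it occurs.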
Our interpretation of spatial location differs from the fixed-grid idea: we believe location information is more meaningful when it is expressed relative to the location of the salient object(s) in the scene. We achieve this by adjusting the spatial grid so that its center cell fits the salient object. Figure 2 illustrates the difference between the two approaches. Comparing each image in the top row with its counterpart in the bottom row shows that, with OCG, the content of each grid cell keeps the same interpretation across different images.
Figure 2. Top row: Fixed-grid approach. Bottom row: OCG approach.
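As a rough sketch of the object-centered variant, the grid lines can be taken from the edges of the salient object's bounding box, so that the center cell of a 3x3 grid coincides with the box. The function name, the `(x0, y0, x1, y1)` box format, the single-object restriction, and the `skip_center` flag are assumptions made for this example.

```python
import numpy as np

def ocg_histogram(xy, words, bbox, vocab_size, skip_center=False):
    """Concatenate per-cell histograms of a 3x3 grid centered on one box.

    xy    : (N, 2) feature positions (x, y) in pixels
    words : (N,) visual-word index of each feature, in [0, vocab_size)
    bbox  : (x0, y0, x1, y1) of the salient object (assumed format)
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    x0, y0, x1, y1 = bbox
    # 0 = left of / above the box, 1 = within its extent, 2 = right of / below.
    col = np.searchsorted([x0, x1], xy[:, 0], side="right")
    row = np.searchsorted([y0, y1], xy[:, 1], side="right")
    cell = row * 3 + col
    hist = np.zeros((9, vocab_size))
    np.add.at(hist, (cell, words), 1)
    if skip_center:
        # Drop the center cell (index 4), i.e. ignore features on the object.
        hist = np.delete(hist, 4, axis=0)
    return hist.ravel()
```

Unlike the fixed grid, the eight outer cells here change size from image to image, but each one always covers the same region relative to the object (for example, "directly above" or "to the lower right").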
Our evaluations are based on the image classification challenge of the PASCAL'07 database. The database contains 20 object classes, and the goal is to predict whether an object of a given class is present in the image, regardless of its location.
The performance of the OCG method depends on how accurately the grids are localized. When we have perfect information about the bounding box(es) of the salient object(s), the OCG method performs far better than the fixed-grid approach (Figure 3).

Note: In our evaluations we chose the salient object to be the target (classification) object, so to localize OCGs we use the object bounding box annotations available in the database. To be fair, however, we do not extract any features from the center cell, which contains the salient object. Our goal in this work is only to show how much improvement one could get by ideally localizing spatial grids. We do not claim to have beaten the state-of-the-art method for this problem, although our evaluation shows superior performance.
Figure 3. When the location of a salient object in the image is already known, the OCG approach works far better than the fixed-grid approach.
To see how the OCG method is affected by localization noise, we simulated an object detector by replacing some of the annotated object bounding boxes with randomly generated bounding boxes. The outcome of this analysis is shown in Figure 4. The vertical axis shows classification performance, and the horizontal axis shows the percentage of annotated bounding boxes used in the simulation (the true positives of the simulated detector). Curves in different colors correspond to different numbers of randomly generated bounding boxes per image (the false positives of the simulated detector).
The dashed horizontal line in the figure shows the accuracy of the baseline method (fixed-grid) on the same problem.

Note: The red curve, which represents 10 random bounding boxes per image, can reasonably be regarded as a typical off-the-shelf object detector.
Figure 4. Performance of the OCG method when only noisy information about the location of the salient object(s) is available.
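One possible reading of the simulated detector is sketched below: each annotated box is kept with a given probability (the true-positive rate), and a fixed number of random boxes is added per image (the false positives). The exact sampling procedure is not spelled out here, so the function name, its parameters, and the `sample_random_box` callable are hypothetical.

```python
import numpy as np

def simulate_detections(annotated_boxes, keep_fraction, num_random,
                        sample_random_box, rng):
    """Return the boxes a simulated detector would output for one image.

    annotated_boxes   : list of ground-truth boxes for the image
    keep_fraction     : probability of keeping each annotated box
    num_random        : number of random boxes added per image
    sample_random_box : callable rng -> box (hypothetical helper)
    rng               : numpy.random.Generator
    """
    # True positives: each annotated box survives with probability keep_fraction.
    boxes = [b for b in annotated_boxes if rng.random() < keep_fraction]
    # False positives: boxes drawn from some prior over location and size.
    boxes += [sample_random_box(rng) for _ in range(num_random)]
    return boxes
```

With `keep_fraction=1.0` and `num_random=0` this reduces to the ideal case of perfectly localized grids; lowering the first value or raising the second moves toward a noisier detector.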
To generate random bounding boxes for the salient object(s), we sample from the prior distribution of the location and size of the object bounding box in a normalized image frame. This prior is modeled as a 4-dimensional Gaussian distribution estimated from the bounding box annotations of the training set. Figure 5 shows this prior distribution for two object classes. The images in Figure 5 are generated by sampling 100 bounding boxes from the trained prior distribution and increasing by one the intensity of every pixel covered by each bounding box.
Figure 5. Prior distribution of bounding box location and dimensions for car (left) and bottle (right).
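The prior and the visualization described above can be sketched as follows. The parameterization of a box as (center x, center y, width, height) in a unit-square frame is an assumption, since the four Gaussian dimensions are not named; the function names and the heat-map resolution are likewise illustrative.

```python
import numpy as np

def fit_box_prior(boxes):
    """Fit a 4-D Gaussian to boxes given as rows (cx, cy, w, h) in [0, 1]."""
    boxes = np.asarray(boxes, dtype=float)
    return boxes.mean(axis=0), np.cov(boxes, rowvar=False)

def sample_boxes(mean, cov, n, rng):
    """Draw n boxes (cx, cy, w, h) from the fitted Gaussian prior."""
    return rng.multivariate_normal(mean, cov, size=n)

def prior_heatmap(samples, size=100):
    """Add one to every pixel covered by each sampled box."""
    heat = np.zeros((size, size), dtype=int)
    for cx, cy, w, h in samples:
        if w <= 0 or h <= 0:
            continue  # a Gaussian can produce degenerate boxes; skip them
        # Box corners in pixels, clipped to the normalized frame.
        x0, x1 = np.clip([cx - w / 2, cx + w / 2], 0, 1) * size
        y0, y1 = np.clip([cy - h / 2, cy + h / 2], 0, 1) * size
        heat[int(y0):int(np.ceil(y1)), int(x0):int(np.ceil(x1))] += 1
    return heat
```

Calling `prior_heatmap(sample_boxes(mean, cov, 100, rng))` with a prior fitted on one class would give an image of the kind shown in Figure 5, where brighter pixels are covered by more sampled boxes.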

Related publications:

S. Naderi Parizi, I. Laptev, A. Tavakoli Targhi
Modeling Image Context using Object Centered Grids
Int. Conf. on Digital Image Computing: Techniques and Applications (DICTA'09), Melbourne, Australia, Dec. 2009 (PDF) (PPT)