Camshift Algorithm

G.R. Bradski, Computer vision face tracking for use in a perceptual user interface, Intel Technology Journal, Q2 1998.

Camshift stands for "Continuously Adaptive Mean Shift." This is the basis for the face-tracking algorithm in OpenCV. It combines the basic Mean Shift algorithm with an adaptive region-sizing step. The kernel is a simple step function applied to a skin-probability map. The skin probability of each image pixel is based on color using a method called histogram backprojection. Color is represented as Hue from the HSV color model.
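
As a rough illustration of histogram backprojection, the sketch below builds a hue histogram from sample skin pixels and then looks up each image pixel's bin to get a skin-probability map. It uses plain NumPy rather than OpenCV. The function names, the 16-bin default, and the 0 to 180 hue range are assumptions made for this example, not values taken from the paper.

```python
import numpy as np

def hue_bins(hue, n_bins, hue_max):
    """Map hue values in [0, hue_max) to integer bin indices."""
    idx = (np.asarray(hue, dtype=float) * n_bins / hue_max).astype(int)
    return np.clip(idx, 0, n_bins - 1)

def hue_histogram(sample_hue, n_bins=16, hue_max=180.0):
    """Histogram of sample hues, scaled so the fullest bin equals 1."""
    idx = hue_bins(sample_hue, n_bins, hue_max).ravel()
    counts = np.bincount(idx, minlength=n_bins).astype(float)
    peak = counts.max()
    return counts / peak if peak > 0 else counts

def backproject(hue, hist, hue_max=180.0):
    """Replace each pixel's hue with the histogram value of its bin."""
    return hist[hue_bins(hue, len(hist), hue_max)]
```

Calling `backproject(image_hue, hue_histogram(skin_hue))` returns an array with the same shape as `image_hue`, with values between 0 and 1. Scaling by the largest bin is one common normalization; dividing by the total count would work as well.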

Since the kernel is a step function, the mean shift at each iteration is simply the probability-weighted average of the x and y coordinates within the current region. It is computed by dividing the region's first moments by its zeroth moment, and the region is then shifted so that it is centered on this probability centroid.
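
The moment computation can be sketched in a few lines of NumPy. This is a minimal illustration, assuming `prob` is a 2-D skin-probability array and the window is an axis-aligned rectangle given by its top-left corner and size. The function names, the iteration cap, and the stopping rule (stop when the window no longer moves) are choices made for this example.

```python
import numpy as np

def window_centroid(prob, x0, y0, w, h):
    """Return (xc, yc, m00) for the window, or None if it holds no probability."""
    roi = prob[y0:y0 + h, x0:x0 + w]
    m00 = float(roi.sum())                      # zeroth moment
    if m00 <= 0.0:
        return None
    ys, xs = np.mgrid[y0:y0 + roi.shape[0], x0:x0 + roi.shape[1]]
    m10 = float((xs * roi).sum())               # first moment in x
    m01 = float((ys * roi).sum())               # first moment in y
    return m10 / m00, m01 / m00, m00

def mean_shift(prob, x0, y0, w, h, max_iter=20):
    """Re-center a fixed-size window on the centroid until it stops moving.

    Assumes the window fits inside the image (w <= width, h <= height).
    """
    img_h, img_w = prob.shape
    for _ in range(max_iter):
        result = window_centroid(prob, x0, y0, w, h)
        if result is None:
            break
        xc, yc, _ = result
        # Top-left corner that centers the window on the centroid.
        new_x0 = int(np.floor(xc - (w - 1) / 2.0 + 0.5))
        new_y0 = int(np.floor(yc - (h - 1) / 2.0 + 0.5))
        # Keep the window inside the image.
        new_x0 = min(max(new_x0, 0), img_w - w)
        new_y0 = min(max(new_y0, 0), img_h - h)
        if (new_x0, new_y0) == (x0, y0):
            break
        x0, y0 = new_x0, new_y0
    return x0, y0
```

The window size stays fixed inside `mean_shift`; the adaptive resizing that distinguishes Camshift is a separate step applied after convergence.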

After Mean Shift converges to an (x,y) location, scale is updated based on the current value for the zeroth moment. This update is heuristic. Linear scale is assumed to be proportional to the square root of the summed probability contributions within the current region (i.e., the zeroth moment). Height and width are assumed to have a predetermined, consistent ratio.
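
A minimal sketch of this scale update follows. The proportionality constant `k` and the `aspect` ratio are placeholder values chosen for illustration and would need tuning; the function name is also hypothetical. It assumes the probability map is scaled to the range 0 to 1, so the zeroth moment is roughly the skin-colored area in pixels.

```python
import math

def resize_window(m00, k=2.0, aspect=1.2):
    """Window size from the zeroth moment: width ~ sqrt(m00), height = aspect * width.

    k and aspect are illustrative assumptions, not values from this page.
    """
    side = k * math.sqrt(max(m00, 0.0))
    width = max(1, int(round(side)))
    height = max(1, int(round(aspect * side)))
    return width, height
```

Because the width grows with the square root of the zeroth moment, quadrupling the summed probability doubles the linear size of the window, which matches the intuition that area scales with the square of width.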

Since Hue is unstable at low saturation, the color histograms don't include pixels with saturation below some threshold. Similarly, minimum and maximum intensity values can be applied to skip over pixels that are very bright or very dark.
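
One way to express these thresholds is a boolean mask over the saturation and value (intensity) channels, applied before the hue histogram is built. The sketch below assumes 8-bit channels in the range 0 to 255; the function names and the default threshold values are placeholders for this example.

```python
import numpy as np

def valid_pixel_mask(sat, val, s_min=30, v_min=10, v_max=240):
    """True where saturation is high enough and intensity is neither too dark nor too bright."""
    sat = np.asarray(sat)
    val = np.asarray(val)
    return (sat >= s_min) & (val >= v_min) & (val <= v_max)

def masked_hue_samples(hue, sat, val, **thresholds):
    """Hue values of the pixels that pass the thresholds, as a 1-D array."""
    mask = valid_pixel_mask(sat, val, **thresholds)
    return np.asarray(hue)[mask]
```

Only the hues returned by `masked_hue_samples` would then be counted in the color histogram.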

Pro: This method is fast and appears on initial testing to be moderately accurate. It may be possible to improve accuracy by using a different color representation.

Con: There are quite a few parameters: the number of histogram bins, the minimum saturation, minimum and maximum intensity, and the width-to-height ratio for faces. There's also a parameter for enlarging the face region while doing Mean Shift to increase the chances of finding the maximum for skin-probability density.
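
To make the parameter list concrete, the sketch below gathers the parameters into one structure and shows one possible way to enlarge the search region. All names and default values are hypothetical placeholders, and the enlargement rule (grow each side by a fraction of the window size, clipped to the image) is an assumption for this example rather than a description of the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CamshiftParams:
    # Every default here is a placeholder chosen for illustration.
    n_bins: int = 16            # number of hue histogram bins
    s_min: int = 30             # minimum saturation
    v_min: int = 10             # minimum intensity
    v_max: int = 240            # maximum intensity
    aspect: float = 1.2         # face height divided by face width
    search_margin: float = 0.25 # fractional enlargement of the search region

def enlarge_window(x0, y0, w, h, margin, img_w, img_h):
    """Grow the window by `margin` times its size on every side, clipped to the image."""
    dx = int(w * margin)
    dy = int(h * margin)
    new_x0 = max(0, x0 - dx)
    new_y0 = max(0, y0 - dy)
    new_x1 = min(img_w, x0 + w + dx)
    new_y1 = min(img_h, y0 + h + dy)
    return new_x0, new_y0, new_x1 - new_x0, new_y1 - new_y0
```

Keeping the parameters in one place makes it easier to see how many values need tuning, which is the drawback noted above.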

 
