 What is the problem?
Accurate vision-based people detection and tracking algorithms have been of interest for decades in applications such as sports game analysis and video surveillance (e.g. behavior analysis, automatic pedestrian counting). Isolated people in an uncluttered scene are successfully detected with a single static or moving camera using pattern recognition techniques. However, those algorithms fail to detect people in a group because of their mutual occlusions. For instance, in sports such as basketball, players can strongly occlude each other and change behavior abruptly. To handle the occlusion problem, the views of several cameras should be fused to correctly detect and track all the people present in the scene.
 What is our solution?
We propose a novel approach to locate a dense set of people with a network of heterogeneous cameras by recasting the problem as a linear inverse problem. The proposed framework is generic to any scene, scalable in the number of cameras, and versatile with respect to their geometry, e.g. planar or omnidirectional. It relies on deducing an occupancy vector, i.e. the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera.
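As a minimal sketch of this linear model (not the authors' implementation), the toy example below builds a matrix `D` whose column j holds the stacked binary silhouette that a person standing at ground cell j would produce in all cameras, and maps a sparse occupancy vector through it. The grid size, pixel count, random silhouettes, and all variable names are assumptions made for illustration only. It also shows why the model is only approximately linear: binary silhouettes saturate where people overlap.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 50     # ground-plane grid locations (hypothetical size)
n_pixels = 200   # stacked foreground pixels of all cameras (hypothetical size)

# Hypothetical silhouette matrix: column j is the binary silhouette produced
# by one person standing at grid cell j, concatenated over all camera views.
# Real columns would come from camera calibration; here they are random.
D = (rng.random((n_pixels, n_cells)) < 0.1).astype(float)

# Sparse occupancy vector: only three grid cells are occupied.
x_true = np.zeros(n_cells)
x_true[[4, 17, 33]] = 1.0

# Linear model: overlapping silhouettes simply add up.
y_linear = D @ x_true

# Observed binary silhouettes cannot exceed 1 where people overlap.
y_observed = np.minimum(y_linear, 1.0)

# Mismatch between the linear model and the binary observation.
occlusion_residual = y_linear - y_observed
print("pixels affected by overlap:", int(np.count_nonzero(occlusion_residual)))
```

Recovering `x_true` from `y_observed` and `D` is the inverse problem the text refers to.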
This inverse problem is regularized by imposing a sparse occupancy vector, i.e. one made of few nonzero elements, while a particular dictionary of silhouettes linearly maps these nonempty grid locations to the multiple silhouettes viewed by the camera network. This constitutes a linearization of the problem, where nonlinearities, such as occlusions, are treated as additional noise on the observed silhouettes. Mathematically, we express the final inverse problem as either a Basis Pursuit DeNoise or a Lasso convex optimization program. The sparsity measure is reinforced by iteratively reweighting the ℓ1-norm of the occupancy vector to better approximate its ℓ0 “norm”, and a new kind of “repulsive” sparsity is used to further adapt the Lasso procedure to the occupancy reconstruction.
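The reweighting idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published solver: it minimizes a weighted Lasso objective, 0.5*||y - D x||^2 + lam * sum(w * x) with x >= 0, using plain proximal gradient iterations (ISTA), and updates the weights as w = 1 / (x + eps), a common reweighting rule. The nonnegativity constraint, the choice of ISTA, the parameter values, and the random dictionary are all assumptions, and the "repulsive" sparsity term is not included.

```python
import numpy as np

def weighted_lasso_ista(D, y, lam, w, n_iter=500):
    """Minimise 0.5*||y - D x||^2 + lam * sum(w * x) subject to x >= 0."""
    # Step size 1/L, where L is the Lipschitz constant of the data-term gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        # Proximal step: weighted soft-threshold combined with projection on x >= 0.
        x = np.maximum(x - step * (grad + lam * w), 0.0)
    return x

def reweighted_l1(D, y, lam=0.1, n_reweight=4, eps=1e-2):
    """Solve a sequence of weighted Lasso problems with updated weights."""
    w = np.ones(D.shape[1])
    for _ in range(n_reweight):
        x = weighted_lasso_ista(D, y, lam, w)
        # Small entries get large weights and are pushed further toward zero,
        # which brings the weighted l1 penalty closer to counting nonzeros.
        w = 1.0 / (x + eps)
    return x

# Toy data (hypothetical): random binary silhouettes, three occupied cells.
rng = np.random.default_rng(0)
D = (rng.random((200, 50)) < 0.1).astype(float)
x_true = np.zeros(50)
x_true[[4, 17, 33]] = 1.0
y = np.minimum(D @ x_true, 1.0)   # binary observation, overlaps saturate

x_hat = reweighted_l1(D, y)
print("cells with estimated occupancy above 0.5:", np.flatnonzero(x_hat > 0.5))
```

Thresholding `x_hat` gives candidate occupied grid cells. The threshold of 0.5 is an arbitrary choice for this toy example.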
In practice, an adaptive sampling process is proposed to reduce the computational cost and to monitor a large occupancy area. Qualitative and quantitative results are presented on a basketball game.
 Why is our solution proposed?
The proposed algorithm successfully detects people occluding each other, even from severely degraded extracted features, and outperforms state-of-the-art people localization techniques.
Some examples on the PETS 2009 dataset.
Related publications:

Alexandre Alahi, Albert Haque, and Li Fei-Fei.
"RGB-W: When Vision Meets Wireless."
IEEE International Conference on Computer Vision (ICCV), 2015.

Mohammad Golbabaee, Alexandre Alahi, and Pierre Vandergheynst.
"Scoop: A real-time sparsity driven people localization algorithm."
Journal of Mathematical Imaging and Vision, 2014.

Alexandre Alahi, Mohammad Golbabaee, and Pierre Vandergheynst.
"Method and system for automatic objects localization."
U.S. Patent No. 8,749,630. 10 Jun. 2014.

Alexandre Alahi, Laurent Jacques, Yannick Boursier, and Pierre Vandergheynst.
"Sparsity driven people localization with a heterogeneous network of cameras."
Journal of Mathematical Imaging and Vision, 2011.

Alexandre Alahi, Laurent Jacques, Yannick Boursier, and Pierre Vandergheynst.
"Sparsity-driven people localization algorithm: Evaluation in crowded scenes environments."
Performance Evaluation of Tracking and Surveillance (PETS-Winter), 2009.

Alexandre Alahi, Yannick Boursier, Laurent Jacques, and Pierre Vandergheynst.
"Sport players detection and tracking with a mixed network of planar and omnidirectional cameras."
ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), 2009.
(Challenge prize winner)

Alexandre Alahi, Yannick Boursier, Laurent Jacques, and Pierre Vandergheynst.
"A sparsity constrained inverse problem to locate people in a network of cameras."
IEEE International Conference on Digital Signal Processing, 2009.
