TY - CONF
T1 - Multiview Random Forest of Local Experts Combining RGB and LIDAR Data for Pedestrian Detection
AU - González, Alejandro
AU - Villalonga, Gabriel
AU - Xu, Jiaolong
AU - Vázquez, David
AU - Amores, Jaume
AU - López, Antonio M.
PY - 2015
AB - Despite significant recent advances, pedestrian detection remains an extremely challenging problem in real scenarios. To develop a detector that operates successfully under these conditions, it is critical to leverage multiple cues, multiple imaging modalities, and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multimodality, and a strong multi-view classifier) affects performance, both individually and when integrated together. For the multimodality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a modality that has only recently started to receive attention. As our analysis reveals, although all the aforementioned aspects significantly improve performance, the fusion of visible-spectrum and depth information boosts accuracy by a much larger margin. The resulting detector not only ranks among the top performers on the challenging KITTI benchmark, but is also built upon very simple blocks that are easy to implement and computationally efficient. These simple blocks can easily be replaced with more sophisticated, recently proposed ones, such as convolutional neural networks for feature representation, to further improve accuracy.
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=uab_pure&SrcAuth=WosAPI&KeyUT=WOS:000380565800059&DestLinkType=FullRecord&DestApp=WOS
M3 - Conference paper
SN - 1931-0587
SP - 356
EP - 361
JO - 2015 IEEE Intelligent Vehicles Symposium (IV)
JF - 2015 IEEE Intelligent Vehicles Symposium (IV)
ER -