A local 3-D motion descriptor for multi-view human action recognition from 4-D spatio-temporal interest points

Michael B. Holte, Bhaskar Chakraborty, Jordi Gonzàlez, Thomas B. Moeslund

Research output: Contribution to journal › Article › Research › Peer-reviewed

57 Citations (Scopus)

Abstract

In this paper, we address the problem of human action recognition in reconstructed 3-D data acquired by multi-camera systems. We contribute to this field by introducing a novel 3-D action recognition approach based on detection of 4-D (3-D space + time) spatio-temporal interest points (STIPs) and local description of 3-D motion features. STIPs are detected in multi-view images and extended to 4-D using 3-D reconstructions of the actors and pixel-to-vertex correspondences of the multi-camera setup. Local 3-D motion descriptors, histogram of optical 3-D flow (HOF3D), are extracted from estimated 3-D optical flow in the neighborhood of each 4-D STIP and made view-invariant. The local HOF3D descriptors are divided using 3-D spatial pyramids to capture and improve the discrimination between arm- and leg-based actions. Based on these pyramids of HOF3D descriptors we build a bag-of-words (BoW) vocabulary of human actions, which is compressed and classified using agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively. Experiments on the publicly available i3DPost and IXMAS datasets show promising state-of-the-art results and validate the performance and view-invariance of the approach. © 2012 IEEE.
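The core of the descriptor is a magnitude-weighted orientation histogram over 3-D optical-flow vectors in a STIP neighborhood. The sketch below illustrates that idea only: the spherical binning scheme (8 azimuth × 4 elevation bins), the magnitude weighting, and the normalization are assumptions for illustration, not the paper's exact HOF3D construction or bin counts.

```python
import numpy as np

def hof3d_histogram(flow, n_azimuth=8, n_elevation=4):
    """Bin 3-D flow vectors into a spherical orientation histogram.

    flow: (N, 3) array of 3-D optical-flow vectors around one interest point.
    Returns an L1-normalized histogram of length n_azimuth * n_elevation.
    Illustrative sketch; bin layout and weighting are assumptions.
    """
    fx, fy, fz = flow[:, 0], flow[:, 1], flow[:, 2]
    mag = np.linalg.norm(flow, axis=1)

    # Direction of each flow vector in spherical coordinates.
    azimuth = np.arctan2(fy, fx)                            # in [-pi, pi]
    elevation = np.arcsin(np.clip(fz / np.maximum(mag, 1e-12), -1.0, 1.0))

    # Quantize directions into a fixed grid of orientation bins.
    a_bin = np.minimum(((azimuth + np.pi) / (2 * np.pi) * n_azimuth).astype(int),
                       n_azimuth - 1)
    e_bin = np.minimum(((elevation + np.pi / 2) / np.pi * n_elevation).astype(int),
                       n_elevation - 1)

    # Magnitude-weighted voting (np.add.at handles repeated bin indices).
    hist = np.zeros(n_azimuth * n_elevation)
    np.add.at(hist, a_bin * n_elevation + e_bin, mag)

    total = hist.sum()
    return hist / total if total > 0 else hist
```

Histograms like this, pooled over a 3-D spatial pyramid, would then be quantized into a BoW vocabulary and fed to an SVM, as the abstract describes.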
Original language: English
Article number: 6178760
Pages (from-to): 553-565
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 6
DOIs
Publication status: Published - 30 Aug 2012
