A local 3-D motion descriptor for multi-view human action recognition from 4-D spatio-temporal interest points

Michael B. Holte, Bhaskar Chakraborty, Jordi Gonzàlez, Thomas B. Moeslund

Research output: Contribution to journal › Article › Research › peer-review

49 Citations (Scopus)

Abstract

In this paper, we address the problem of human action recognition in reconstructed 3-D data acquired by multi-camera systems. We contribute to this field by introducing a novel 3-D action recognition approach based on detection of 4-D (3-D space + time) spatio-temporal interest points (STIPs) and local description of 3-D motion features. STIPs are detected in multi-view images and extended to 4-D using 3-D reconstructions of the actors and pixel-to-vertex correspondences of the multi-camera setup. Local 3-D motion descriptors, histograms of optical 3-D flow (HOF3D), are extracted from estimated 3-D optical flow in the neighborhood of each 4-D STIP and made view-invariant. The local HOF3D descriptors are divided using 3-D spatial pyramids to capture and improve the discrimination between arm- and leg-based actions. Based on these pyramids of HOF3D descriptors we build a bag-of-words (BoW) vocabulary of human actions, which is compressed and classified using agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively. Experiments on the publicly available i3DPost and IXMAS datasets show promising state-of-the-art results and validate the performance and view-invariance of the approach. © 2012 IEEE.
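As an illustration of the kind of local motion description the abstract refers to, the core idea of a magnitude-weighted directional histogram over 3-D flow vectors can be sketched as follows. This is a minimal, hypothetical simplification: the actual HOF3D descriptor's binning scheme, spatial-pyramid division, and view-invariant alignment are defined in the paper, and the function name and parameters below are illustrative only.

```python
import numpy as np

def flow_direction_histogram(flow, n_azimuth=8, n_elevation=4):
    """Illustrative histogram of 3-D flow directions (a simplified
    sketch in the spirit of HOF3D, not the authors' exact method):
    each flow vector votes into an (azimuth, elevation) bin on the
    sphere, weighted by its magnitude; the result is L1-normalized."""
    flow = np.asarray(flow, dtype=float)            # shape (N, 3)
    mag = np.linalg.norm(flow, axis=1)
    valid = mag > 1e-12                             # drop zero-flow vectors
    f, m = flow[valid], mag[valid]

    azimuth = np.arctan2(f[:, 1], f[:, 0])          # in [-pi, pi)
    elevation = np.arcsin(np.clip(f[:, 2] / m, -1.0, 1.0))  # in [-pi/2, pi/2]

    # Quantize directions into bins, clamping the upper boundary.
    a_bin = np.minimum(((azimuth + np.pi) / (2 * np.pi) * n_azimuth).astype(int),
                       n_azimuth - 1)
    e_bin = np.minimum(((elevation + np.pi / 2) / np.pi * n_elevation).astype(int),
                       n_elevation - 1)

    hist = np.zeros(n_azimuth * n_elevation)
    np.add.at(hist, a_bin * n_elevation + e_bin, m)  # magnitude-weighted votes

    total = hist.sum()
    return hist / total if total > 0 else hist
```

Such per-STIP histograms would then be concatenated over the cells of a spatial pyramid and quantized into a BoW vocabulary, as described above.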
Original language: English
Article number: 6178760
Pages (from-to): 553-565
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 6
DOIs
Publication status: Published - 30 Aug 2012

Keywords

  • 3-D
  • 4-D spatio-temporal interest points (STIPs)
  • IXMAS
  • human action recognition
  • i3DPost
  • local motion description
  • multi-view
  • view-invariance

