Multimodal Human Action Recognition Based on a Fusion of Dynamic Images Using CNN Descriptors

Edwin Jonathan Escobedo Cardenas, Guillermo Camara Chavez

Research output: Chapter in Book/Report/Conference proceeding › Paper (Conference contribution) › peer-review

2 Scopus citations

Abstract

In this paper, we propose a dynamic-image-based approach for action recognition. Specifically, we exploit the multimodal information recorded by a Kinect sensor (RGB-D and skeleton joint data). We combine ideas from rank pooling and skeleton optical spectra to generate dynamic images that summarize an action sequence into single flow images. We organize the dynamic images into five groups: a dynamic color group (DC), a dynamic depth group (DD), and three dynamic skeleton groups (DXY, DYZ, DXZ). Because an action is composed of different postures over time, we generate N different dynamic images containing the main postures for each dynamic group. Next, we apply a pre-trained flow-CNN to extract spatiotemporal features with a max-mean aggregation. The proposed method was evaluated on a public benchmark dataset, UTD-MHAD, and achieved state-of-the-art results.
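The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplified linear weights alpha_t = 2t - T - 1 from the approximate rank pooling literature, a uniform temporal split into N segments (the abstract does not say how the main postures are selected), and a max-mean aggregation read as the concatenation of the element-wise maximum and mean over the N feature vectors. All function names are hypothetical, frames are treated as flattened lists of floats, and the CNN feature extractor is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def rank_pooling_weights(num_frames: int) -> List[float]:
    """Linear weights alpha_t = 2t - T - 1 for t = 1..T (assumed simplification)."""
    return [2.0 * t - num_frames - 1.0 for t in range(1, num_frames + 1)]


def dynamic_image(frames: Sequence[Vector]) -> Vector:
    """Collapse a sequence of flattened frames into one weighted sum."""
    if not frames:
        raise ValueError("need at least one frame")
    weights = rank_pooling_weights(len(frames))
    size = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(size)]


def split_into_segments(frames: Sequence, n: int) -> List[list]:
    """Split frames into n contiguous, nearly equal segments (assumed strategy)."""
    if not 1 <= n <= len(frames):
        raise ValueError("n must be between 1 and the number of frames")
    total = len(frames)
    return [list(frames[i * total // n:(i + 1) * total // n]) for i in range(n)]


def max_mean_aggregate(features: Sequence[Vector]) -> Vector:
    """Concatenate the element-wise max and mean over N feature vectors (assumed reading)."""
    if not features:
        raise ValueError("need at least one feature vector")
    columns = list(zip(*features))
    return [max(c) for c in columns] + [sum(c) / len(c) for c in columns]


def summarize_sequence(frames: Sequence[Vector], n: int) -> Vector:
    """Build n dynamic images and aggregate them; a CNN would normally sit in between."""
    images = [dynamic_image(segment) for segment in split_into_segments(frames, n)]
    return max_mean_aggregate(images)
```

In this sketch, later frames receive larger positive weights and earlier frames larger negative ones, so the weighted sum encodes the temporal order of appearance changes. In a full pipeline, each dynamic image would be passed through the pre-trained network before aggregation, once per dynamic group (DC, DD, DXY, DYZ, DXZ).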

Original language: English
Title of host publication: Proceedings - 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 95-102
Number of pages: 8
ISBN (Electronic): 9781538692646
State: Published - 15 Jan 2019
Externally published: Yes
Event: 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018 - Foz do Iguacu, Brazil
Duration: 29 Oct 2018 to 1 Nov 2018

Publication series

Name: Proceedings - 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018

Conference

Conference: 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018
Country/Territory: Brazil
City: Foz do Iguacu
Period: 29/10/18 to 1/11/18

Keywords

  • action recognition
  • CNN
  • dynamic images
  • RGB-D data
