Human Action Modeling (Recognition, Prediction, Transfer, and Reconstruction)

Project Overview

We study human action (motion) modeling, including recognition, encoding, reconstruction, and transfer.

Recognition: to identify and classify motions performed by people in a given video.

Encoding: to convert the extracted motion into a canonical representation, for storage, retrieval, or transmission.

Reconstruction: to reproduce the action on a digital avatar from encoded action features/representations.

Transfer: to carry a motion performed by one character over to another character and re-animate it there.

Publications

Semantics-enhanced Early Action Detection using Dynamic Dilated Convolution

M. Korban and X. Li

Pattern Recognition (PR), 2023.

We propose a pipeline for early action detection in skeleton-based untrimmed videos. The pipeline includes two new technical components: (1) a Dynamic Dilated Convolutional Network (DDCN), which supports dynamic temporal sampling and makes feature learning more robust against temporal scale variance in action sequences; and (2) a semantic referencing module, which uses objects identified in the scene and their co-existence relationships with actions to adjust the probabilities of inferred actions. Such semantic guidance helps distinguish many ambiguous actions, and resolving that ambiguity is a core challenge in the early detection of incomplete actions.
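The two ideas in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names `dilated_conv1d` and `reweight_by_objects`, the co-occurrence table, and the `floor` value are all hypothetical, and the dilation is simply passed in by the caller rather than chosen dynamically as in the DDCN. The first function shows how a dilation rate widens the temporal span a small kernel covers; the second shows one plausible way to scale action probabilities by object co-occurrence and renormalize.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D dilated convolution (cross-correlation form) over a list of numbers.

    Kernel taps are spaced `dilation` frames apart, so a larger dilation
    covers a longer temporal span with the same number of weights.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[t + k * dilation] for k in range(len(kernel)))
        for t in range(len(signal) - span)
    ]


def reweight_by_objects(action_probs, detected_objects, cooccurrence, floor=0.05):
    """Scale action probabilities by object co-occurrence, then renormalize.

    `cooccurrence` maps (action, object) pairs to a weight in (0, 1].
    Pairs missing from the table get the small `floor` weight
    (an assumption made for this sketch).
    """
    scores = {}
    for action, prob in action_probs.items():
        weight = 1.0
        for obj in detected_objects:
            weight *= cooccurrence.get((action, obj), floor)
        scores[action] = prob * weight
    total = sum(scores.values())
    if total == 0:
        return dict(action_probs)  # nothing to renormalize; keep the input
    return {action: score / total for action, score in scores.items()}


# Taps two frames apart: each output sums signal[t] and signal[t + 2].
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # -> [4, 6, 8]

# Two actions that look alike early on; a detected cup favors "drinking".
probs = {"drinking": 0.5, "phoning": 0.5}
table = {("drinking", "cup"): 0.9, ("phoning", "cup"): 0.1}
print(reweight_by_objects(probs, ["cup"], table))
```

In this toy setup the detected object breaks a tie that the motion features alone cannot, which is the role the abstract assigns to semantic guidance.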

[Paper] [Code]

DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition

M. Korban and X. Li

European Conference on Computer Vision (ECCV), 2020.

We propose a Dynamic Directed Graph Convolutional Network (DDGCN) to model the spatial and temporal features of human actions from their skeletal representations. The DDGCN consists of three new feature modeling modules: (1) Dynamic Convolutional Sampling (DCS), (2) Dynamic Convolutional Weight (DCW) assignment, and (3) Directed Graph Spatial-Temporal (DGST) feature extraction. Comprehensive experiments show that the DDGCN outperforms existing state-of-the-art action recognition approaches on various test datasets.
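To make the notion of a directed skeleton graph concrete, here is a minimal sketch of a single feature-aggregation step in which information flows from parent joints to child joints. It does not reproduce the DCS, DCW, or DGST modules: the function name `directed_graph_step`, the joint names, and the fixed scalar weights `w_self` and `w_parent` (standing in for learned weights) are assumptions made for illustration.

```python
def directed_graph_step(features, edges, w_self=0.5, w_parent=0.5):
    """One aggregation step on a directed skeleton graph.

    features: dict mapping joint name -> list of floats.
    edges: list of (parent, child) pairs; information flows parent -> child.
    Each joint mixes its own feature with the mean feature of its parents.
    """
    parents = {joint: [] for joint in features}
    for parent, child in edges:
        parents[child].append(parent)

    out = {}
    for joint, feat in features.items():
        incoming = parents[joint]
        if not incoming:
            out[joint] = list(feat)  # root joint: no incoming edges
            continue
        mean_parent = [
            sum(features[p][d] for p in incoming) / len(incoming)
            for d in range(len(feat))
        ]
        out[joint] = [
            w_self * own + w_parent * par for own, par in zip(feat, mean_parent)
        ]
    return out


# A three-joint leg chain: hip -> knee -> ankle.
leg = {"hip": [0.0, 0.0], "knee": [2.0, 4.0], "ankle": [4.0, 8.0]}
bones = [("hip", "knee"), ("knee", "ankle")]
print(directed_graph_step(leg, bones))
```

Because the edges are directed, the hip feature is unchanged while the knee and ankle each absorb information from the joint above them; an undirected graph would mix in both directions.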

[Paper]

Real-time Avatar Pose Transfer and Motion Generation Using Locally Encoded Laplacian Offsets

M. Lifkooee, C. Liu, Y. Liang, Y. Zhu, and X. Li

Journal of Computer Science and Technology (JCST), 34(2), 1–16, 2019.

We propose a human avatar representation scheme based on intrinsic coordinates, which are invariant to isometry and insensitive to human pose changes, and an efficient pose transfer algorithm that uses this representation to reconstruct human body geometry following a given pose. Such a pose transfer algorithm can be used to control the movement of an avatar model in VR environments, following a user's motion in real time. Our proposed algorithm consists of three main steps. First, we recognize the user’s pose and select a template model from the database that has a similar pose. Second, the intrinsic Laplacian offsets encoded in local coordinates are used to reconstruct the human body geometry following the template pose. Finally, the morphing between the two poses is generated using linear interpolation.
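Two of the ingredients named above, Laplacian offsets and linear interpolation between poses, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's method: it uses uniform neighbor weights and global coordinates, whereas the paper encodes the offsets in local coordinates, and the names `laplacian_offsets` and `interpolate_pose` are hypothetical. Offsets computed this way are unchanged by translation but not by rotation, which is one reason a local encoding is attractive.

```python
def laplacian_offsets(vertices, neighbors):
    """Uniform Laplacian offset per vertex: position minus the mean of its neighbors.

    vertices: list of [x, y, z] positions.
    neighbors: dict mapping vertex index -> list of neighbor indices.
    """
    offsets = []
    for i, v in enumerate(vertices):
        nbrs = neighbors[i]
        mean = [sum(vertices[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
        offsets.append([v[d] - mean[d] for d in range(3)])
    return offsets


def interpolate_pose(pose_a, pose_b, t):
    """Linearly blend two poses (lists of [x, y, z] vertices) with t in [0, 1]."""
    return [
        [(1.0 - t) * a[d] + t * b[d] for d in range(3)]
        for a, b in zip(pose_a, pose_b)
    ]


# A single triangle stands in for a body mesh.
triangle = [[0, 0, 0], [2, 0, 0], [1, 3, 0]]
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(laplacian_offsets(triangle, ring))

# Halfway between a rest pose and a target pose.
print(interpolate_pose([[0, 0, 0]], [[2, 4, 6]], 0.5))  # -> [[1.0, 2.0, 3.0]]
```

Stepping `t` from 0 to 1 over successive frames yields the morph between the current pose and the reconstructed template pose.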

[Paper] [Bibtex]

Dataset and Pre-trained Model

Data for DDGCN (ECCV2020): A set of recorded actions performed by a volunteer will be released soon.

Pretrained Model for DDGCN (ECCV2020): The pre-trained DDGCN model will be released soon.
