Dynamic Graph Transformers for Temporal Human Activity Recognition
Abstract
Human activity recognition (HAR) from sequential sensor or video data is a fundamental problem in machine perception, with applications in surveillance, robotics, healthcare monitoring, and smart environments. Traditional models rely on static graph structures or recurrent architectures that struggle to capture dynamic spatial-temporal dependencies. In this paper, we propose a novel architecture—Dynamic Graph Transformer (DGT)—that integrates graph construction and temporal attention within a unified transformer framework. Unlike prior works that use pre-defined or fixed adjacency matrices, our model learns time-varying interaction graphs among human joints or entities through self-attention, enabling adaptive modeling of pose, motion, and contextual correlations. We introduce a dynamic graph encoder that computes attention-weighted edge strengths at each frame and a temporal transformer that aggregates node-level information across time. The model is fully end-to-end trainable and requires no manual graph design. Evaluations on three benchmark datasets—NTU RGB+D 60, Kinetics Skeleton, and SHREC—demonstrate that our approach significantly outperforms conventional graph convolutional networks and RNN-based models.
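To make the two components described above concrete, the following is a minimal, hypothetical sketch of a per-frame dynamic graph encoder and a temporal transformer over frame embeddings. All module names, dimensions, and hyperparameters are illustrative assumptions and do not reflect the authors' reference implementation.

```python
# Hypothetical sketch: self-attention over joints yields frame-specific,
# attention-weighted edge strengths (a learned adjacency), and a temporal
# transformer aggregates node-level summaries across time.
import torch
import torch.nn as nn


class DynamicGraphEncoder(nn.Module):
    """Per-frame self-attention over joints; attention weights act as a time-varying adjacency."""

    def __init__(self, feat_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch * time, joints, feat_dim) -- joint features for one frame
        h = self.proj(x)
        # Attention map serves as a frame-specific weighted adjacency matrix.
        out, adjacency = self.attn(h, h, h, need_weights=True)
        return out, adjacency


class DynamicGraphTransformer(nn.Module):
    """Stacks the dynamic graph encoder with a temporal transformer over frame embeddings."""

    def __init__(self, feat_dim: int, embed_dim: int = 64, num_classes: int = 60):
        super().__init__()
        self.graph_encoder = DynamicGraphEncoder(feat_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, joints, feat_dim), e.g. 3D joint coordinates per frame
        b, t, j, d = x.shape
        nodes, _ = self.graph_encoder(x.reshape(b * t, j, d))
        # Pool joints into a per-frame embedding, then attend across time.
        frames = nodes.mean(dim=1).reshape(b, t, -1)
        frames = self.temporal(frames)
        return self.head(frames.mean(dim=1))


if __name__ == "__main__":
    model = DynamicGraphTransformer(feat_dim=3)   # 3D joint coordinates
    clip = torch.randn(2, 32, 25, 3)              # 2 clips, 32 frames, 25 joints
    print(model(clip).shape)                      # torch.Size([2, 60])
```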
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
Mind forge Academia also operates under the Creative Commons CC-BY 4.0 licence. This allows you to copy and redistribute the material in any medium or format for any purpose, even commercially, provided that you give appropriate attribution.