A Skeleton-Based Deep Learning Approach for Recognizing Violent Actions in Surveillance Scenarios

Rabia Jafri, Rodrigo Louzada Campos, Hamid R. Arabnia. In: 24th International Conference on Human-Computer Interaction (HCII 2022) – Late Breaking Posters, vol. 1655, pp. 624-631. Springer, 2022.

Abstract: A novel skeleton-based approach is presented for recognizing specific violent actions (VAs), such as kicking and punching, that are highly relevant in surveillance scenarios. The method uses a depth sensor for more efficient and accurate depth data acquisition and classifies an action by utilizing the forecasts of an ensemble of Long Short-Term Memory (LSTM) networks, each trained to predict a specific VA. The proposed method offers two advantages: it requires a smaller training dataset (since data is needed only for a few specific VAs and not for non-VAs), and it carries a lower risk of misclassification (since a separate LSTM network is trained for each VA). The compact skeletal representation and the distributed architecture allow the system to operate efficiently, bolstering its potential for practical use in real-world scenarios.
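
The abstract does not spell out how the per-action forecasts are combined into a decision, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each network forecasts the next skeleton frame from a short history, that a clip is assigned to the action whose forecaster has the lowest forecast error, and that a clip is rejected as "no violent action" when even the best error exceeds a threshold. The names `forecast_error`, `classify_clip`, `make_drift_forecaster`, and `reject_above` are hypothetical, and the toy drift forecasters merely stand in for trained LSTM networks.

```python
from typing import Callable, Dict, List, Optional, Sequence

Frame = List[float]  # flattened joint coordinates of one skeleton frame
Forecaster = Callable[[Sequence[Frame]], Frame]  # history -> predicted next frame


def forecast_error(forecaster: Forecaster, clip: Sequence[Frame], history: int) -> float:
    """Mean squared error between forecast and observed frames over a clip."""
    total, count = 0.0, 0
    for t in range(history, len(clip)):
        predicted = forecaster(clip[t - history:t])
        observed = clip[t]
        total += sum((p - o) ** 2 for p, o in zip(predicted, observed))
        count += len(observed)
    if count == 0:
        raise ValueError("clip must be longer than the history window")
    return total / count


def classify_clip(
    forecasters: Dict[str, Forecaster],
    clip: Sequence[Frame],
    history: int = 2,
    reject_above: float = 0.005,  # assumed threshold, would need tuning
) -> Optional[str]:
    """Return the action whose forecaster fits best, or None if none fits."""
    errors = {name: forecast_error(f, clip, history) for name, f in forecasters.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= reject_above else None


def make_drift_forecaster(step: float) -> Forecaster:
    """Toy stand-in for a trained LSTM: predicts last frame shifted by `step`."""
    return lambda past: [x + step for x in past[-1]]


# Hypothetical ensemble: one forecaster per violent action.
ensemble = {
    "kick": make_drift_forecaster(+0.1),
    "punch": make_drift_forecaster(-0.1),
}

kick_clip = [[0.1 * t, 0.1 * t] for t in range(6)]
punch_clip = [[1.0 - 0.1 * t, 1.0 - 0.1 * t] for t in range(6)]
idle_clip = [[0.5, 0.5] for _ in range(6)]

print(classify_clip(ensemble, kick_clip))
print(classify_clip(ensemble, punch_clip))
print(classify_clip(ensemble, idle_clip))
```

Under these assumptions, the rejection threshold is what would let such a system be trained without non-VA data: a clip that no forecaster predicts well is simply left unlabeled. In a real system each entry of `ensemble` would wrap a trained LSTM network, and the networks could be evaluated in parallel.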