arXiv:2512.06306v2 Announce Type: replace-cross
Abstract: Human pose estimation predicts body keypoints in order to analyze human motion. Most current pose estimation methods rely on conventional RGB cameras. Event cameras, in contrast, offer high temporal resolution and low latency, which enables robust estimation under challenging conditions and opens up new possibilities for pose estimation. However, most existing methods convert event streams into dense event frames, which adds computation and sacrifices the high temporal resolution of the event signal. In this work, we exploit the spatiotemporal properties of event streams with a point cloud-based framework designed to improve human pose estimation performance while maintaining computational efficiency. We design an Event Temporal Slicing Convolution module to capture short-term dependencies across event slices, and combine it with an Event Slice Sequencing module for structured temporal modeling. We further propose an edge-enhanced point cloud-based event representation that strengthens spatial edge information under sparse event conditions. Experiments on the DHP19 dataset show that the proposed method consistently improves performance across three representative point cloud backbones (PointNet, DGCNN, and Point Transformer), with an average reduction of 4% in mean per-joint position error (MPJPE).
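To make two ideas from the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes events are stored as (x, y, t, polarity) tuples and that slices have equal duration; the abstract specifies neither. The helper names `slice_events` and `mpjpe` are hypothetical. The first function keeps events as raw points grouped by time instead of accumulating them into a dense frame, and the second computes MPJPE in its standard form, the mean Euclidean distance between predicted and ground-truth joints.

```python
import math
from typing import List, Sequence, Tuple

# Assumed event layout (not given in the abstract): (x, y, timestamp, polarity)
Event = Tuple[float, float, float, int]


def slice_events(events: Sequence[Event], num_slices: int) -> List[List[Event]]:
    """Group events into equal-duration temporal slices.

    Each slice remains a list of raw points, so per-event timestamps are kept
    rather than being collapsed into a dense frame.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    slices: List[List[Event]] = [[] for _ in range(num_slices)]
    if not events:
        return slices
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for a single timestamp
    for e in events:
        # Map the timestamp to a slice index; the last timestamp goes in the last slice.
        idx = min(int((e[2] - t_min) / span * num_slices), num_slices - 1)
        slices[idx].append(e)
    return slices


def mpjpe(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


if __name__ == "__main__":
    stream = [(0, 0, 0.0, 1), (1, 1, 0.4, 0), (2, 2, 0.6, 1), (3, 3, 1.0, 0)]
    print([len(s) for s in slice_events(stream, 2)])
    print(mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 3, 4)]))
```

Under this reading, a 4% MPJPE reduction means the value returned by `mpjpe` drops to 0.96 times the baseline value on average; whether the reported average is taken over backbones, subjects, or views is not stated in the abstract.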
