EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

Songpengcheng Xia1, Yu Zhang1, Zhuo Su2,†, Xiaozheng Zheng2, Zheng Lv2, Guidong Wang2,
Yongjie Zhang2, Qi Wu1, Lei Chu1, Ling Pei1,‡

1Shanghai Jiao Tong University    2ByteDance
†Project Leader
‡Corresponding author

Estimating full-body motion from the head and hand tracking signals of VR devices holds great potential for various applications. However, the sparsity and unique distribution of these observations pose a significant challenge, resulting in an ill-posed problem with multiple feasible solutions (i.e., hypotheses). This amplifies uncertainty and ambiguity in full-body motion estimation, especially for the lower-body joints. We therefore propose EnvPoser, a two-stage framework that performs full-body motion estimation using sparse tracking signals and pre-scanned environments from VR devices. In the first stage, EnvPoser models the multi-hypothesis nature of human motion through an uncertainty-aware estimation module. In the second stage, it refines these multi-hypothesis estimates by integrating semantic and geometric environmental constraints, ensuring that the final motion estimate aligns realistically with both the environmental context and physical interactions. Qualitative and quantitative experiments on two public datasets demonstrate that our method achieves state-of-the-art performance, with significant improvements in human motion estimation in motion-environment interaction scenarios. Our code will be released on our project page.
Pipeline

Overview of EnvPoser: A Two-Stage Motion Estimation Model. Stage I involves training the uncertainty-aware initial estimation module on the AMASS dataset to produce initial motion estimates with uncertainty quantification. Stage II refines these estimates by training on motion-environment datasets, incorporating semantic and geometric environmental constraints.
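The two-stage idea above can be sketched in a few lines of code. Everything here is a hypothetical placeholder to convey the structure (the function names, joint count, and toy floor-penetration constraint are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def stage1_multi_hypothesis(sparse_obs, n_hyp=8, rng=None):
    """Stage I (placeholder): produce several full-body pose hypotheses
    with a per-hypothesis uncertainty score from sparse head/hand signals.
    Here the hypotheses are random; the real module is a learned network."""
    rng = rng or np.random.default_rng(0)
    n_joints = 22  # illustrative body-joint count
    hypotheses = rng.normal(size=(n_hyp, n_joints, 3))
    uncertainty = rng.uniform(0.1, 1.0, size=n_hyp)  # lower = more confident
    return hypotheses, uncertainty

def stage2_env_refine(hypotheses, uncertainty, penetration_fn):
    """Stage II (placeholder): score each hypothesis against an environmental
    constraint and fuse that cost with the Stage-I uncertainty."""
    env_cost = np.array([penetration_fn(h) for h in hypotheses])
    score = env_cost + uncertainty  # lower is better
    best = int(np.argmin(score))
    return hypotheses[best]

def floor_penetration(pose):
    """Toy geometric constraint: penalize joints below the floor (z < 0)."""
    return float(np.clip(-pose[:, 2], 0.0, None).sum())

hyps, unc = stage1_multi_hypothesis(sparse_obs=None)
pose = stage2_env_refine(hyps, unc, floor_penetration)
print(pose.shape)  # one refined full-body pose, shape (22, 3)
```

In the actual method, both stages are trained networks (Stage I on AMASS, Stage II on motion-environment datasets) and the environmental term combines semantic as well as geometric constraints; the sketch only shows how multi-hypothesis estimates and an environment cost compose.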

Long Narrated Video

EnvPoser is validated on the GIMO and EgoBody datasets and demonstrated on a real-world PICO device.

Citation

@inproceedings{xia2024envposer,
  title={EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling},
  author={Xia, Songpengcheng and Zhang, Yu and Su, Zhuo and Zheng, Xiaozheng and Lv, Zheng and Wang, Guidong and Zhang, Yongjie and Wu, Qi and Chu, Lei and Pei, Ling},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
  publisher={IEEE}
}

Thanks to Lior Yariv and Jianfeng Xiang for the website template