Human body motion prediction is a fundamental part of many human-robot applications. Despite the recent progress in the area, most studies predict human body motion relative to a fixed joint, only limit their model to predict one possible future motion, or both. However, due to the complex nature of human motion, a single prediction cannot adequately reflect the many possible movements one can make. Also, for any robotics application, prediction of the full human body motion including the absolute 3D trajectory – not just a 3D body pose relative to the hip joint – is needed.
In this paper, we try to address these two shortcomings by proposing a transformer-based generative model for forecasting multiple diverse human motions. Our model generates N future possible body motions given the human motion history. This is achieved by first predicting the pose of the body relative to the hip joint as was done in prior work. Then, our proposed Hip Prediction Module predicts the trajectory of the hip position relative to a global reference frame for each predicted pose frame, an aspect of human body motion neglected by previous work. To obtain a set of diverse predicted motions, we introduce a similarity loss that penalizes the pairwise sample distance. Our system not only outperforms the state-of-the-art in human motion prediction, but also is able to predict a diverse set of future human body motions including the hip trajectory.