Occlusion Handler Density Networks for 3D Multimodal Joint Location of Hand Pose Hypotheses
Abstract
Predicting pose parameters during hand pose estimation (HPE) is an ill-posed problem because of severe self-occlusion among the joints of the hand. Existing approaches learn a single-valued mapping from an input image to a final pose output, which makes occlusion difficult to handle, especially when several pose hypotheses are plausible. This paper introduces an effective method for handling multimodal joint occlusion by minimizing the negative log-likelihood of a multimodal mixture of Gaussians through a hybrid hierarchical mixture density network (HHMDN). The proposed approach generates multiple feasible 3D pose hypotheses and uses visibility, unimodal, and multimodal distribution units to estimate joint visibility. Visible features are extracted and fed into the convolutional neural network (CNN) layers of the HHMDN for feature learning. Finally, the effectiveness of the proposed method is demonstrated on the ICVL, NYU, and BigHand public hand pose datasets. The results show that the proposed method is effective, achieving a visibility error of 30.3 mm, lower than that of many state-of-the-art approaches that model visible and occluded joints with different distributions.
Keywords: Deep learning, Convolutional neural networks, Self-occluded joints, Unimodal Gaussian distribution, Multiple feasible hypotheses
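The training objective named in the abstract is the negative log-likelihood of a multimodal mixture of Gaussians over 3D joint locations. The paper's exact HHMDN architecture is not reproduced here; the following is a minimal PyTorch sketch of such a mixture-density objective, in which the class and function names (MixtureDensityHead, mixture_nll), the diagonal covariances, and the layer sizes are illustrative assumptions rather than the authors' implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureDensityHead(nn.Module):
    """Hypothetical head predicting a K-component Gaussian mixture over J 3D joints."""

    def __init__(self, in_features: int, num_joints: int, num_components: int):
        super().__init__()
        self.num_joints = num_joints
        self.num_components = num_components
        out = num_components * num_joints * 3
        self.pi_layer = nn.Linear(in_features, num_components)  # mixture weights
        self.mu_layer = nn.Linear(in_features, out)              # component means
        self.log_sigma_layer = nn.Linear(in_features, out)       # per-coordinate log std

    def forward(self, feats: torch.Tensor):
        b = feats.shape[0]
        log_pi = F.log_softmax(self.pi_layer(feats), dim=-1)  # (B, K)
        mu = self.mu_layer(feats).view(b, self.num_components, self.num_joints, 3)
        log_sigma = self.log_sigma_layer(feats).view(b, self.num_components, self.num_joints, 3)
        return log_pi, mu, log_sigma


def mixture_nll(log_pi, mu, log_sigma, target):
    """Negative log-likelihood of ground-truth joints under a diagonal Gaussian mixture.

    target has shape (B, J, 3); mu and log_sigma have shape (B, K, J, 3).
    """
    target = target.unsqueeze(1)  # broadcast against the K mixture components
    # Per-component Gaussian log-density, summed over joints and x/y/z coordinates.
    log_prob = (
        -0.5 * ((target - mu) / log_sigma.exp()) ** 2
        - log_sigma
        - 0.5 * math.log(2.0 * math.pi)
    ).sum(dim=(2, 3))  # (B, K)
    # Log-sum-exp over components yields the mixture log-likelihood per sample.
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()


# Usage sketch: 21 joints and 5 mixture components are assumed values.
head = MixtureDensityHead(in_features=512, num_joints=21, num_components=5)
feats = torch.randn(8, 512)  # stand-in for CNN features of a batch of 8 images
log_pi, mu, log_sigma = head(feats)
loss = mixture_nll(log_pi, mu, log_sigma, torch.randn(8, 21, 3))
```

At inference time, each mixture component supplies one candidate pose, which is one plausible way to realize the multiple feasible 3D pose hypotheses the abstract describes.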