Research News

[Prof. Lee, Dongjun] Visual-inertial hand motion tracking with robustness against occlusion, interference, and contact

Author
김민아
Date
2024-04-23

INTRODUCTION



Dexterous use of the hands (with fingers) is a defining characteristic of humans. Replicating the dexterity of the human hand would markedly improve the efficiency, intuitiveness, and richness of many real-world human-robot interaction (HRI) applications, including (i) robotic hand teleoperation (Fig. 1B), particularly that of anthropomorphic robotic hands (1, 2), where a remote user can fully use their hand and fingers with haptic feedback for complex manipulation tasks, instead of relying on conventional haptic devices, which typically offer only up to 6 DOFs (degrees of freedom) (3); (ii) collaborative robot interaction (Fig. 1C and Movie 1), where a user can quickly and intuitively provide rich commands and cues to the robot using their hands and fingers, thereby making the interaction safer and smoother than conventional pendant programming (4); and (iii) 3D (three-dimensional) drone swarm control (Fig. 1D and Movie 2), where a user in the field can efficiently control complex 3D swarm behavior by simply nudging the formation or quickly defining 3D virtual walls to avoid dangerous regions. All the tasks mentioned above are difficult with conventional 2D tablet interfaces (5). This use of the hand would also greatly improve the user experience of virtual reality (VR) and augmented reality (AR), which is currently dominated by 6-DOF “fist-based” controllers (6, 7).

DISCUSSION

Through quantitative and qualitative evaluations, we demonstrate that our VIST framework operates robustly and with high performance in challenging real-world scenarios. In particular, VIST enables interaction with diverse objects despite hand size/shape variability. In contrast, vision-based systems are not robust to untrained objects/hands (19) or object occlusions (16, 48), soft sensor wearable systems are susceptible to mechanical contact with objects (26, 28, 29), and IMU/compass or magnetic wearable systems (20, 22, 23, 32, 35) are fragile to ferromagnetic objects or electrical currents. Both the soft and IMU/compass wearable systems are also well known for tip tracking error that grows with skeleton size.

Human hands interact with a myriad of objects in daily life, in many different hand configurations. The robust hand tracking of our VIST framework may therefore broaden its applicability to a wide variety of real-world applications that have so far eluded existing approaches (e.g., daily monitoring for rehabilitation and tool operation skill assessment). The accurate tracking of VIST with CHDs (22) could also enable applications that require extra wearable devices/attachments (e.g., VR/AR, telepicking with CHDs, and soft prostheses) (45). We further verify that the VIST system can robustly track hand motion outdoors, which is difficult for most existing systems, because sunlight interferes with many types of IR sensors [e.g., the RGB-D cameras (48, 67) and external IR trackers (25) required by some wearable tracking systems (21, 23)], whereas outdoor hand tracking datasets for machine learning are fairly scarce. Our outdoor experiments verify not only the complete portability of the VIST system in terms of hardware and algorithm but also its feasibility for promising outdoor applications (e.g., an intuitive interface for 3D drone swarm control).

The key reason for the superior performance of our VIST framework is that visual-inertial TC fusion alleviates the inherent issues of each sensor. The VIST framework circumvents the fundamental issues of vision-based systems [occlusion, generalization, and slow update (19, 17, 48)], because the motion of occluded parts can still be accurately estimated at a high rate (about 100 Hz) from the IMU information together with the real-time/autocalibrated hand/sensor-related parameters, anatomical constraints, and still-visible markers. Our VIST system also overcomes the drift and magnetic-interference issues of IMU/compass wearable systems by exploiting the visual information in conjunction with the anatomical constraints, as well as the unmodeled-contact issues of soft sensor wearable systems, because the camera and IMUs are immune to contact. Moreover, the integrated autocalibration endows our VIST framework with improved accuracy and convenience compared to existing IMU/compass or soft sensor wearable systems, where those parameters are calibrated once before operation while the user holds several indicated poses, a procedure that inevitably introduces human error (52, 54, 71).
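To illustrate the complementary character of this fusion, the following is a minimal, hypothetical sketch (not the paper's actual estimator): a one-state Kalman-style filter for a single joint angle that propagates a biased gyro measurement at 100 Hz and corrects with marker observations at 10 Hz while the marker is visible, then coasts on the IMU alone during an occlusion. All class/variable names, rates, and noise parameters here are illustrative assumptions.

```python
# Hypothetical minimal sketch of tightly coupled (TC) visual-inertial fusion
# for a single joint angle; NOT the paper's estimator. All rates and noise
# values are illustrative assumptions.

class TinyTCFilter:
    """1-DOF joint-angle Kalman filter: IMU propagation at ~100 Hz,
    marker (vision) corrections whenever the marker is visible."""

    def __init__(self, q_gyro=1e-2, r_marker=1e-2):
        self.theta = 0.0   # joint-angle estimate (rad)
        self.P = 1.0       # estimate variance
        self.q = q_gyro    # process noise (gyro bias/noise)
        self.r = r_marker  # marker measurement noise

    def propagate(self, gyro_rate, dt):
        # High-rate IMU prediction: integrate the angular rate.
        self.theta += gyro_rate * dt
        self.P += self.q * dt  # uncertainty (drift) grows without vision

    def correct(self, marker_angle):
        # Vision correction: a standard Kalman update bounds the IMU drift.
        K = self.P / (self.P + self.r)
        self.theta += K * (marker_angle - self.theta)
        self.P *= 1.0 - K

# Demo: constant joint motion, a biased gyro, vision at 10 Hz for the
# first second, then a full occlusion for the second one.
f, true_theta, dt = TinyTCFilter(), 0.0, 0.01
for step in range(200):                 # 2 s at 100 Hz
    true_theta += 0.5 * dt              # true rate: 0.5 rad/s
    f.propagate(0.5 + 0.05, dt)         # gyro with a +0.05 rad/s bias
    if step % 10 == 0 and step < 100:   # marker visible only in first 1 s
        f.correct(true_theta)
print(f"final angle error: {abs(f.theta - true_theta):.3f} rad")
```

Even after a full second of occlusion, the error stays bounded by the bias accumulated since the last correction rather than growing over the whole trajectory; the actual VIST estimator works in the same spirit but fuses full 6-DOF IMU states, anatomical constraints, and the autocalibrated parameters.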



In conclusion, our VIST framework overcomes these fundamental limitations of existing hand tracking systems. This improved performance is achieved by fusing the complementary aspects of visual and inertial sensors in TC fusion, which proves crucial for properly addressing the peculiarities of hand (and finger) tracking. With its ruggedness, portability, and affordable cost, our VIST system could enable many promising real-world applications based on hand motion tracking.


Fig. 1. System configuration and possible applications of VIST. (A) Hardware (i.e., sensor glove with IMUs/markers and stereo camera) and working principle of VIST. (B) Robot hand teleoperation (left image courtesy of DYROS/Seoul National University). (C) Collaborative robot interaction (Movie 1). (D) 3D drone swarm control (Movie 2).

More information: Visual-inertial hand motion tracking with robustness against occlusion, interference, and contact | Science Robotics