publications | Abhineet Jain

2023

MS-Thesis

Towards Safe and Efficient Learning for Dexterous Manipulation

Abhineet Jain

Georgia Institute of Technology, 2023

Imitation learning (IL) is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully-designed reward or significant computational effort. However, existing IL approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. Other approaches like reinforcement learning interact with the environment to train black-box neural networks, providing little control over how the robot learns and performs the skills. Such approaches are especially challenging on physical platforms, where a robot’s erratic behavior can harm not only the robot itself but also the entities around it. Since training uses a large number of interactions with the environment, there is also room for improving sample efficiency. In this work, we address these challenges. First, we demonstrate that we can learn an object relocation task using demonstrations from LeapMotion, an inexpensive vision-based sensor. Policies using these demonstrations show similar success to policies from wearable sensors, at the cost of sample efficiency. Using LeapMotion reduces the setup cost to collect demonstrations by approximately 140x. Second, we investigate the importance of collecting additional data that better represents the full operating conditions. We compare the performance of corrective additional demonstrations and randomly-sampled additional demonstrations for an object relocation task. When there are more additional demonstrations from the full task distribution than original demonstrations from a restrictive training distribution, the corrective demonstrations considerably outperform the randomly-sampled ones. Otherwise, there are no significant differences between the two. Third, we introduce a simple geometric constraint to guide the robot when learning object relocation. Using Constrained Policy Optimization, the robot quickly learns to move towards the object and uses a similar number of samples to learn the skill as the unconstrained approach. We show how simple constraints can help robots achieve sensible and safe behavior quickly and ease concerns surrounding hardware deployment. We also provide insights into how different degrees of strictness of these constraints affect learning. Finally, we curate a library of constraints generalizable across multiple dexterous manipulation tasks, and introduce a hierarchical approach that prioritizes these constraints across different phases of the task. We train two policies to learn an object relocation task: a low-level policy that learns how to perform the task given a set of constraints, and a high-level policy that decides what constraints to activate during different stages of training the low-level policy. With this hierarchical approach, the agent learns to perform the task while reducing the duration of unsafe behaviors during training. Our findings indicate that prioritizing the right constraints addresses physical and environmental safety concerns, as the hierarchical policy can both train and perform the tasks safely.

2022

SafeRL-IJCAI’22

Constrained Reinforcement Learning for Dexterous Manipulation

Abhineet Jain, Jack Kolb, and Harish Ravichandar

International Workshop on Safe Reinforcement Learning at the International Joint Conference on Artificial Intelligence, 2022

Existing learning approaches to dexterous manipulation use demonstrations or interactions with the environment to train black-box neural networks that provide little control over how the robot learns the skills or how it would perform after training. These approaches pose significant challenges when implemented on physical platforms given that, during initial stages of training, the robot’s behavior could be erratic and potentially harmful to its own hardware, the environment, or any humans in the vicinity. A potential way to address these limitations is to add constraints during learning that restrict and guide the robot’s behavior during training as well as rollouts. Inspired by the success of constrained approaches in other domains, we investigate the effects of adding position-based constraints to a 24-DOF robot hand learning to perform object relocation using Constrained Policy Optimization. We find that a simple geometric constraint can ensure the robot learns to move towards the object sooner than without constraints. Further, training with this constraint requires a similar number of samples as its unconstrained counterpart to master the skill. These findings shed light on how simple constraints can help robots achieve sensible and safe behavior quickly and ease concerns surrounding hardware deployment. We also investigate the effects of the strictness of these constraints and report findings that provide insights into how different degrees of strictness affect learning outcomes.

MLHRC-HRI’22

Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation

Abhineet Jain*, Jack Kolb*, J. M. Abbess IV, and Harish Ravichandar

Machine Learning in Human-Robot Collaboration Workshop at ACM/IEEE International Conference on Human-Robot Interaction, 2022

Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully-designed reward or significant computational effort. However, existing imitation learning approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. One way to address this limitation is to gather additional data that better represents the full operating conditions. In this work, we investigate characteristics of such additional demonstrations and their impact on performance. Specifically, we study the effects of corrective and randomly-sampled additional demonstrations on learning a policy that guides a five-fingered robot hand through a pick-and-place task. Our results suggest that corrective demonstrations considerably outperform randomly-sampled demonstrations when the number of additional demonstrations sampled from the full task distribution is larger than the number of original demonstrations sampled from a restrictive training distribution. Conversely, when the number of original demonstrations is higher than that of additional demonstrations, we find no significant differences between corrective and randomly-sampled additional demonstrations. These results provide insights into the inherent trade-off between the effort required to collect corrective demonstrations and their relative benefits over randomly-sampled demonstrations. Additionally, we show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations for dexterous manipulation tasks.

PATENT

Multiple Pane Web Display with Dynamic Content

Vidhem Chhabra, Abhineet Jain, Christian Johannessen, Aditya Pandya, David Park, and Luis Tadeo

U.S. Patent and Trademark Office, Patent No. 11,283,807, Application No. 16/711299, 2022

An inter-frame and webpage generation and communication system and method provide a mechanism by which a parent webpage or frame can allocate a child frame of the displayed page for content generated by a web application other than that generating the parent webpage or frame and certain information can be exchanged between the child frame and the parent webpage or frame. Multiple frames of a webpage not only react to interactive events occurring within that frame, but also react to events occurring in child frames. In at least one embodiment, a mechanism for inter-frame communication is provided to enable communication between child and parent frames, and the inter-frame communication mechanism is agnostic as to whether the code displayed in the child and parent frames is provided by one or more web application servers.