Shikhar Bahl

Hi there! I am a fourth-year PhD student at the Robotics Institute, within the School of Computer Science at Carnegie Mellon University. I am interested in the intersection of robot control and machine learning. My goal is to build robotic agents that can operate in the wild. I am advised by Deepak Pathak and Abhinav Gupta. I am also a visiting researcher at FAIR, working with Aravind Rajeswaran. I have spent time at NVIDIA Robotics as a research intern.

Prior to CMU, I did my undergrad at UC Berkeley in Applied Math and Computer Science, where I was affiliated with Berkeley Artificial Intelligence Research (BAIR) and worked under Sergey Levine on problems in deep reinforcement learning and robotics.

Feel free to contact me via email! You can reach me at sbahl2 -at- cs dot cmu dot edu

email  /  CV  /  Google Scholar  /  Twitter  /  GitHub

Research

I am broadly interested in creating robust autonomous agents that operate in the wild with minimal or no human supervision. My research focuses on learning for perception and robot control. Here is some of my work (representative papers are highlighted):

Structured World Models from Human Videos
Russell Mendonca*, Shikhar Bahl*, Deepak Pathak
RSS 2023  (Invited to IJRR Special Issue)

webpage | abstract | bibtex | arXiv

@article{mendonca23swim,
  title={Structured World Models from Human Videos},
  author={Mendonca, Russell and Bahl, Shikhar and Pathak, Deepak},
  journal={RSS},
  year={2023}
}

Efficient RL via Disentangled Environment and Agent Representations

Kevin Gmelin*, Shikhar Bahl*, Russell Mendonca, Deepak Pathak
ICML 2023  (Oral)

webpage | abstract | bibtex | pdf

@article{Gmelin2023sear,
  title={Efficient RL via Disentangled Environment and Agent Representations},
  author={Gmelin, Kevin and Bahl, Shikhar and Mendonca, Russell and Pathak, Deepak},
  journal={ICML},
  year={2023}
}

Affordances from Human Videos as a Versatile Representation for Robotics
Shikhar Bahl*, Russell Mendonca*, Lili Chen, Unnat Jain, Deepak Pathak
CVPR 2023

webpage | abstract | bibtex | arXiv | code

@inproceedings{bahl2023affordances,
  title={Affordances from Human Videos as a Versatile Representation for Robotics},
  author={Bahl, Shikhar and Mendonca, Russell and Chen, Lili and Jain, Unnat and Pathak, Deepak},
  booktitle={CVPR},
  year={2023}
}

ALAN: Autonomously Exploring Robotic Agents in the Real World
Russell Mendonca, Shikhar Bahl, Deepak Pathak
ICRA 2023

webpage | abstract | bibtex | arXiv

@article{mendonca2023alan,
  title={ALAN: Autonomously Exploring Robotic Agents in the Real World},
  author={Mendonca, Russell and Bahl, Shikhar and Pathak, Deepak},
  journal={ICRA},
  year={2023}
}

VideoDex: Learning Dexterity from Internet Videos
Kenneth Shaw*, Shikhar Bahl*, Deepak Pathak
CoRL 2022

webpage | abstract | bibtex | arXiv | demo

@article{videodex,
  title={VideoDex: Learning Dexterity from Internet Videos},
  author={Shaw, Kenneth and Bahl, Shikhar and Pathak, Deepak},
  journal={CoRL},
  year={2022}
}

Human-to-Robot Imitation in the Wild
Shikhar Bahl, Abhinav Gupta*, Deepak Pathak*
RSS 2022

webpage | pdf | abstract | bibtex | arXiv | videos | talk

We approach the problem of learning by watching humans in the wild. While traditional approaches in Imitation and Reinforcement Learning are promising for learning in the real world, they are either sample inefficient or constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In the Wild Human-Imitated Robot Learning. In WHIRL, we use human videos to extract a prior over the intent of the demonstrator and use it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves over the human prior using interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including on 20 different manipulation tasks in the wild.
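
The loop sketched in the abstract (a prior extracted from a human video, then sampling-based improvement on the robot) can be caricatured as a cross-entropy-method-style optimizer. The snippet below is a minimal illustrative sketch, not WHIRL's actual implementation; rollout_cost stands in for executing a candidate on the robot and scoring it with the human-robot video alignment objective.

    import numpy as np

    def improve_from_prior(prior_mean, rollout_cost, n_iters=10, pop=32, n_elite=8, init_noise=0.1):
        # Start the search at the trajectory parameters suggested by the human video.
        mean = np.asarray(prior_mean, dtype=float)
        std = init_noise * np.ones_like(mean)
        for _ in range(n_iters):
            # Sample candidate trajectory parameters around the current belief.
            samples = mean + std * np.random.randn(pop, mean.size)
            # Execute each candidate and score it (hypothetical alignment cost).
            costs = np.array([rollout_cost(s) for s in samples])
            # Refit the sampling distribution to the lowest-cost candidates.
            elites = samples[np.argsort(costs)[:n_elite]]
            mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        return mean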

@article{bahl2022human,
  title={Human-to-Robot Imitation in the Wild},
  author={Bahl, Shikhar and Gupta, Abhinav and Pathak, Deepak},
  journal={RSS},
  year={2022}
}

RB2: Robotic Manipulation Benchmarking with a Twist
Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Abitha Thankaraj, Karanbir Chahal, Berk Calli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, Abhinav Gupta
NeurIPS 2021
(Datasets and Benchmarks)

webpage | pdf | abstract | bibtex | code

Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. the YCB object set), but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these 'local rankings' could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called the Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks inspired by the clinically validated Southampton Hand Assessment Procedure. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning outperform more complicated models (e.g. closed-loop, RNN, offline RL) that are preferred by the field. We hope our fellow researchers will use RB2 to improve the quality and rigor of their research.
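
RB2's actual pooling mechanism is more involved, but a minimal stand-in for turning per-lab results into a single global ranking is to average each method's within-lab rank. All names and numbers below are made up for illustration.

    from collections import defaultdict

    def pooled_ranking(local_scores):
        # local_scores: {lab: {method: success_rate}}
        rank_sum, n_labs = defaultdict(float), defaultdict(int)
        for scores in local_scores.values():
            # Rank methods within each lab; rank 0 is the best local result.
            for rank, method in enumerate(sorted(scores, key=scores.get, reverse=True)):
                rank_sum[method] += rank
                n_labs[method] += 1
        # Global order: lowest mean rank across labs first.
        return sorted(rank_sum, key=lambda m: rank_sum[m] / n_labs[m])

    toy = {"lab_a": {"open_loop_bc": 0.8, "offline_rl": 0.6},
           "lab_b": {"open_loop_bc": 0.7, "offline_rl": 0.65}}
    print(pooled_ranking(toy))  # ['open_loop_bc', 'offline_rl']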

@inproceedings{dasari2021rb2,
  title={RB2: Robotic Manipulation Benchmarking with a Twist},
  author={Dasari, Sudeep and Wang, Jianren and Hong, Joyce and Bahl, Shikhar and Lin, Yixin and Wang, Austin S and Thankaraj, Abitha and Chahal, Karanbir Singh and Calli, Berk and Gupta, Saurabh and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year={2021}
}

Hierarchical Neural Dynamic Policies
Shikhar Bahl, Abhinav Gupta, Deepak Pathak
RSS 2021  (Invited to Autonomous Robots Special Issue)

webpage | pdf | abstract | bibtex | arXiv | talk video

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty generalizing to unseen configurations and learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but they still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamic Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks in both the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation and reinforcement learning setups and achieve state-of-the-art results.
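
The local-to-global curriculum can be sketched in a few lines: fit experts on small regions of state space, then distill their relabeled state-action pairs into one global policy. The toy below uses linear experts and a least-squares fit purely for illustration; the paper's policies are deep dynamical systems operating on images.

    import numpy as np

    # Two "local" experts, each reliable only on its own slice of a 1-D state space.
    local_experts = [(lambda s: 2.0 * s, (0.0, 0.5)),
                     (lambda s: 1.0 - s, (0.5, 1.0))]

    # Collect (state, action) supervision from each expert on its own region...
    states, actions = [], []
    for expert, (lo, hi) in local_experts:
        s = np.random.uniform(lo, hi, size=200)
        states.append(s)
        actions.append(expert(s))

    # ...then distill everything into a single global policy (here: an affine fit).
    X = np.concatenate(states)
    A = np.stack([X, np.ones_like(X)], axis=1)
    w, *_ = np.linalg.lstsq(A, np.concatenate(actions), rcond=None)
    global_policy = lambda s: w[0] * s + w[1]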

@article{bahl2021hndp,
  title={Hierarchical Neural Dynamic Policies},
  author={Bahl, Shikhar and Gupta, Abhinav and Pathak, Deepak},
  journal={RSS},
  year={2021}
}

Neural Dynamic Policies for End-to-End Sensorimotor Learning
Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
NeurIPS 2020  (Spotlight)

webpage | pdf | abstract | bibtex | arXiv | code | demo | spotlight talk

The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make a decision at each point in training and hence limits scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has long exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning and have remained under-explored in such settings. In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space, as opposed to prior policy learning methods where actions represent the raw control space. The embedded structure allows end-to-end policy learning under both reinforcement and imitation learning setups. We show that NDPs achieve better or comparable performance to state-of-the-art approaches on many robotic control tasks, using both reward-based training and demonstrations.
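
Concretely, the policy network outputs the parameters of a dynamical system (e.g. a goal and forcing-function weights), and the trajectory is obtained by integrating that system. The integrator below is a schematic DMP-style decoder with hand-picked constants, a sketch of the reparameterization idea rather than the paper's exact formulation.

    import numpy as np

    def dmp_rollout(y0, goal, weights, dt=0.01, steps=100, alpha=25.0, beta=6.25, ax=1.0):
        centers = np.linspace(0.0, 1.0, len(weights))
        widths = np.full(len(weights), len(weights) ** 1.5)
        y, yd, x = float(y0), 0.0, 1.0
        traj = []
        for _ in range(steps):
            psi = np.exp(-widths * (x - centers) ** 2)     # basis functions over the phase
            f = x * (psi @ weights) / (psi.sum() + 1e-8)   # learned forcing term
            ydd = alpha * (beta * (goal - y) - yd) + f     # spring-damper pulls toward the goal
            yd += ydd * dt
            y += yd * dt
            x -= ax * x * dt                               # phase decays, so forcing fades out
            traj.append(y)
        return np.array(traj)

    # The policy network would predict (goal, weights) from observations; here they are fixed.
    traj = dmp_rollout(y0=0.0, goal=1.0, weights=np.zeros(10))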

@inproceedings{bahl2020ndp,
  title={Neural Dynamic Policies for End-to-End Sensorimotor Learning},
  author={Bahl, Shikhar and Mukadam, Mustafa and Gupta, Abhinav and Pathak, Deepak},
  booktitle={NeurIPS},
  year={2020}
}

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine
ICML 2020

webpage | pdf | abstract | bibtex | arXiv | code | video

Reinforcement learning can enable an agent to acquire a large repertoire of skills. However, each new skill requires a manually designed reward function, which typically demands considerable effort and engineering. Self-supervised goal setting has the potential to automate this process, enabling an agent to propose its own goals and acquire skills that achieve them. However, such methods typically rely on manually designed goal distributions or heuristics to encourage the agent to explore a wide range of states. In this work, we propose a formal objective for exploration when training an autonomous goal-reaching policy that maximizes state coverage, and show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal-reaching performance. We present an algorithm called Skew-Fit for learning such a maximum-entropy goal distribution, and show that our method converges to a uniform distribution over the set of possible states, even when we do not know this set beforehand. When combined with existing goal-conditioned reinforcement learning algorithms, Skew-Fit allows self-supervised agents to autonomously explore their entire state space faster than prior work, across a variety of simulated and real robotic tasks.
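
The central computation is easy to sketch: given a density model over visited states, weight each candidate goal by p(s)^alpha with alpha in [-1, 0), so rare states are oversampled. The snippet below assumes log-densities from some fitted model and shows only the skewing step, not the full algorithm.

    import numpy as np

    def skewed_goal_probs(log_densities, alpha=-1.0):
        # w(s) ~ p(s)^alpha, computed in log space for numerical stability.
        logw = alpha * np.asarray(log_densities, dtype=float)
        logw -= logw.max()
        w = np.exp(logw)
        return w / w.sum()

    log_p = np.log([0.50, 0.30, 0.15, 0.05])   # toy densities over four replay states
    probs = skewed_goal_probs(log_p)           # the rarest state gets the largest weight
    goal_index = np.random.choice(len(probs), p=probs)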

@inproceedings{pong2020skewfit,
  title={Skew-Fit: State-Covering Self-Supervised Reinforcement Learning},
  author={Pong, Vitchyr H and Dalal, Murtaza and Lin, Steven and Nair, Ashvin and Bahl, Shikhar and Levine, Sergey},
  booktitle={ICML},
  year={2020}
}

Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards
Gerrit Schoettler*, Ashvin Nair*, Jianlan Luo, Shikhar Bahl, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
IROS 2020

webpage | pdf | abstract | bibtex | arXiv | video

We consider a variety of difficult industrial insertion tasks with visual inputs and different natural reward specifications, namely sparse rewards and goal images. We show that methods that combine RL with prior information, such as classical controllers or demonstrations, can solve these tasks from a reasonable amount of real-world interaction.

@inproceedings{schoettler2020insertion,
  title={Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards},
  author={Schoettler, Gerrit and Nair, Ashvin and Luo, Jianlan and Bahl, Shikhar and Ojea, Juan Aparicio and Solowjow, Eugen and Levine, Sergey},
  booktitle={IROS},
  year={2020}
}

Contextual Imagined Goals for Self-Supervised Robotic Learning
Ashvin Nair*, Shikhar Bahl*, Alexander Khazatsky*, Vitchyr H. Pong, Glen Berseth, Sergey Levine
CoRL 2019

webpage | pdf | abstract | bibtex | arXiv | code | data | video

We propose a conditional goal-setting model that aims to propose only goals that are feasibly reachable from the robot's current state, and demonstrate that this enables self-supervised goal-conditioned learning with raw image observations, both in varied simulated environments and on a real-world pushing task.
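
Schematically, the goal-setting model conditions on the current observation so that imagined goals match the scene at hand. The sketch below assumes encoder/decoder callables from some conditional generative model (e.g. a conditional VAE); all names are placeholders, not the paper's API.

    import numpy as np

    def propose_goal(encode_context, decode, current_obs, latent_dim=8):
        # Context from the current observation keeps the goal consistent with the scene.
        context = encode_context(current_obs)
        # Sample a latent from the prior and decode it together with the context.
        z = np.random.randn(latent_dim)
        return decode(z, context)

    # Toy stand-ins, just to make the sketch executable:
    encode_context = lambda obs: obs.mean(keepdims=True)
    decode = lambda z, c: np.concatenate([z, c])
    goal = propose_goal(encode_context, decode, np.zeros(4))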

@inproceedings{nair2020contextual,
  title={Contextual Imagined Goals for Self-Supervised Robotic Learning},
  author={Nair, Ashvin and Bahl, Shikhar and Khazatsky, Alexander and Pong, Vitchyr and Berseth, Glen and Levine, Sergey},
  booktitle={CoRL},
  year={2019}
}

Residual Reinforcement Learning for Robot Control
Tobias Johannink*, Shikhar Bahl*, Ashvin Nair*, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
ICRA 2019

webpage | pdf | abstract | bibtex | arXiv | video

Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid-body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods and a residual that is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.
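
The "superposition of both control signals" is literal: the applied command is the hand-designed controller's output plus a learned correction. A minimal sketch follows; the clipping is an illustrative safety choice, not from the paper.

    import numpy as np

    def residual_control(s, pi_hand, pi_rl, limit=1.0):
        # Final command = conventional feedback controller + learned residual.
        u = pi_hand(s) + pi_rl(s)
        return np.clip(u, -limit, limit)   # keep the summed command within actuator limits

    pi_hand = lambda s: -0.5 * s           # e.g. a simple proportional controller
    pi_rl = lambda s: np.zeros_like(s)     # the residual policy, before training
    u = residual_control(np.array([0.2, -0.1]), pi_hand, pi_rl)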

@inproceedings{johannink2019residual,
  title={Residual Reinforcement Learning for Robot Control},
  author={Johannink, Tobias and Bahl, Shikhar and Nair, Ashvin and Luo, Jianlan and Kumar, Avinash and Loskyll, Matthias and Ojea, Juan Aparicio and Solowjow, Eugen and Levine, Sergey},
  booktitle={ICRA},
  year={2019}
}

Visual Reinforcement Learning with Imagined Goals
Ashvin Nair*, Vitchyr H. Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
NeurIPS 2018  (Spotlight)

webpage | pdf | abstract | bibtex | arXiv | code | blog | videos

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
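
Two of the representation's three roles reduce to one-liners: sample goals from the learned latent prior during the self-supervised practice phase, and compute rewards as negative distance in latent space. The sketch below assumes a trained VAE encoder-mean function; the names are placeholders, not the paper's code.

    import numpy as np

    def imagine_goal(latent_dim=16):
        # Self-supervised practice: draw a goal from the VAE prior N(0, I).
        return np.random.randn(latent_dim)

    def latent_reward(encode_mean, obs, goal_latent):
        # Reward for goal reaching: negative distance between latent codes.
        return -np.linalg.norm(encode_mean(obs) - goal_latent)

    goal_z = imagine_goal()
    r = latent_reward(lambda obs: obs, np.zeros(16), goal_z)  # identity "encoder" for the toy call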

@article{nair2018visual,
  title={Visual Reinforcement Learning with Imagined Goals},
  author={Nair, Ashvin V and Pong, Vitchyr and Dalal, Murtaza and Bahl, Shikhar and Lin, Steven and Levine, Sergey},
  journal={NeurIPS},
  year={2018}
}
Teaching

EECS 127 (Optimization Models in Engineering), UC Berkeley - Fall 2018 (uGSI)


Website template from this repo and this webpage!