
Articulated Objects? I’ll let my robot figure it out!


Authors: Claire Chen (Stanford), Nick Heppert (Stanford), Yijia Weng (Stanford), Kaichun Mo (Stanford), Brent Yi (Stanford), Toki Migimatsu (Stanford), Jeannette Bohg (Stanford), Leonidas Guibas (Stanford), Jeremy Ma (TRI)

Figure 1: Left - In a kitchen with many articulated objects, a household robot estimates grasp locations (blue), joint types (green arrows) and joint axes (red dotted lines) of a refrigerator, oven, and drawer. Right - The robot opens a refrigerator door to retrieve an item. Photo credit: TRI

In our ongoing collaboration with Toyota Research Institute, we are focused on developing solutions that bring us closer to deploying assistive household robots in any home. Our goal is to design robots that help users with everyday tasks such as cleaning, retrieving items from cabinets, and unloading dishwashers. Performing such tasks requires robots to interact with common household objects like doorways and cabinets, as well as appliances such as coffee makers, dishwashers, and refrigerators, as shown in Figure 1. All of these objects contain movable parts essential to their function, such as doors or drawers, and often have knobs, buttons, or handles to facilitate operation. In the robotics community, such objects are referred to as articulated objects, as they have at least one internal degree of freedom enabled by movable joints.

While articulated objects are commonplace in every household, their appearance, geometry, and kinematics vary greatly, making each home a new and challenging environment for robots. For deployed robots to successfully interact with articulated objects in these unstructured environments, we must develop methods that require as little prior knowledge as possible and are robust to the wide variation across objects.

Interacting with articulated objects is a complex task, which we break down into the following three components:

  1. Grasp generation: First, upon approaching an object, a robot must decide where and how it should grasp the object. For example, in Figure 2a, to open the cabinet door, a robot must realize it needs to grasp the protruding handle, highlighted in green. (A minimal grasp-selection sketch appears after this list.)

  2. Compliant control: Once the robot has successfully grasped the object, it must execute actions to move the object. In Figure 2b, when opening the door, the robot must conform to the way the door moves (compliance) to avoid breaking itself or the door. (A simple compliance sketch appears after this list.)

  3. Object pose tracking and estimation: Finally, to move the object to some desired configuration, the robot must use visual data to estimate the joint type and current state of the object. In Figure 2c, as the robot opens the door, it must use images to estimate the joint type and angle of the door. (A joint-estimation sketch appears after this list.) We tackle this problem under two different settings:

    1. Category-level: A single perception model generalizes to all object instances from a known category.

    2. Category-independent: A single perception model applies to articulated objects across different categories.  
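
To give a flavor of (a), below is a minimal, purely illustrative sketch of grasp selection: it turns a segmented handle (like the green region in Figure 2a) into a grasp position and approach direction. The segmentation and surface normals are assumed to come from some perception model; this heuristic is not the grasp-generation method used in our work.

```python
import numpy as np

def select_handle_grasp(handle_points, handle_normals):
    """Toy heuristic: grasp the center of a detected handle, approaching
    against its average outward surface normal.

    handle_points:  (N, 3) points segmented as "handle" by a perception model.
    handle_normals: (N, 3) outward surface normals at those points.
    """
    grasp_position = handle_points.mean(axis=0)   # center of the handle
    outward = handle_normals.mean(axis=0)
    outward /= np.linalg.norm(outward)            # average outward direction
    approach_direction = -outward                 # move the gripper toward the surface
    return grasp_position, approach_direction
```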
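
For (b), one standard way to realize compliance is an admittance-style law that blends the nominal task motion with the measured contact force, so the gripper yields to the constraint forces the door imposes rather than fighting them. The sketch below is illustrative only; the gain and the force input are assumptions, not the controller running on the robot.

```python
import numpy as np

def admittance_velocity(v_task, f_measured, compliance_gain=0.002):
    """Blend a nominal Cartesian velocity with the sensed wrist force.

    v_task:          (3,) desired end-effector velocity, e.g. pulling on the handle.
    f_measured:      (3,) contact force measured at the wrist, in newtons.
    compliance_gain: maps force (N) to corrective velocity (m/s); illustrative value.
    """
    # When the commanded motion fights the door's kinematics, contact forces grow;
    # the force-proportional term bends the motion toward what the joint allows.
    return np.asarray(v_task) + compliance_gain * np.asarray(f_measured)
```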
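
For (c), here is a simple geometric illustration of joint estimation: if a perception model provides the pose of the movable part (e.g., the door) at two points in time, the relative transform between those poses determines the revolute axis and opening angle. The numpy sketch below assumes a single, purely revolute joint and reasonably clean pose estimates; it conveys the underlying geometry rather than our actual category-level or category-independent estimators.

```python
import numpy as np

def estimate_revolute_joint(T_init, T_cur):
    """Recover the joint axis, a point on the axis, and the opening angle from
    two observed 4x4 poses of the movable part, assuming a revolute joint.
    (A prismatic joint would instead show up as near-zero rotation plus a
    translation along the sliding direction.)
    """
    T_rel = np.linalg.inv(T_init) @ T_cur          # motion of the part between observations
    R, t = T_rel[:3, :3], T_rel[:3, 3]

    # Opening angle from the rotation matrix (axis-angle magnitude).
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

    # Axis direction: the eigenvector of R with eigenvalue 1 (sign is ambiguous).
    eigvals, eigvecs = np.linalg.eig(R)
    axis = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    axis /= np.linalg.norm(axis)

    # A point on the axis: least-squares solution of (I - R) p = t.
    point_on_axis, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return axis, point_on_axis, angle
```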

These components can be connected to close the loop between control and estimation, forming the perception-action loop shown in Figure 2, which gives the robot real-time feedback on how well the task is going. After grasping the object, the robot continuously estimates the object state during the interaction and updates the controller based on this feedback to move the object to a goal state. Finally, to aid the design of generalizable computer vision and manipulation methods, we develop the PartNet-Mobility Dataset, a collection of 2,000 common articulated objects with motion annotations and rendering assets [1,2]. For more details on each of these components, see the full post.

Figure 2: Diagram of perception-action loop, with (a) grasp generation, (b) compliant control, and (c) object tracking and estimation. After obtaining a grasp location and pose, the robot continuously updates its actions and estimations as it interacts with the object.
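
To make Figure 2 concrete, the schematic below strings the three sketches above into a single loop: grasp once, then repeatedly re-estimate the joint state from images and issue a compliant motion command until the part reaches the goal configuration. The `robot` interface (perceive_handle, estimate_part_pose, measured_force, and so on) is entirely hypothetical, standing in for the perception models and robot drivers; this is a sketch of the control flow, not our system's API.

```python
def open_to_angle(robot, goal_angle, tol=0.02):
    """Schematic perception-action loop, reusing the sketches above and a
    hypothetical `robot` interface for sensing and actuation."""
    # (a) Grasp generation: decide where and how to grab, then grasp.
    points, normals = robot.perceive_handle()
    grasp_position, approach = select_handle_grasp(points, normals)
    robot.grasp(grasp_position, approach)

    T_init = robot.estimate_part_pose()            # first observation of the part
    while True:
        # (c) Tracking and estimation: re-estimate the joint axis and angle from images.
        T_cur = robot.estimate_part_pose()
        axis, point_on_axis, angle = estimate_revolute_joint(T_init, T_cur)
        if abs(angle - goal_angle) < tol:          # close enough to the goal state
            return

        # (b) Compliant control: pull toward the goal while yielding to contact
        # forces, so the motion follows the joint the estimator has identified.
        v_task = robot.velocity_toward_goal(axis, point_on_axis, goal_angle)
        v_cmd = admittance_velocity(v_task, robot.measured_force())
        robot.send_velocity(v_cmd)
```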

Our work in grasp generation, compliant control, and object tracking for articulated objects brings us closer to deploying assistive robots in any home. We are excited to continue collaborating with TRI to discover research breakthroughs that will make assistive home robots a reality.

You can check out more of our work at the following links: