Articulated Objects? I'll Let My Robot Figure That Out!
In Stanford's ongoing collaboration with Toyota Research Institute (TRI), our research focuses on deploying assistive household robots to help people with everyday tasks such as cleaning, retrieving items from cabinets, and unloading dishwashers. These tasks require robots to interact with articulated objects such as doors, drawers, cabinets, coffee makers, and refrigerators, which have at least one moving part and may also have knobs, buttons, or handles.
While articulated objects are commonplace in every household, their appearance, geometry, and kinematics vary greatly, making each home a new and challenging environment for robots. For deployed robots to interact successfully with articulated objects in these unstructured environments, we must develop methods that require as little prior knowledge as possible and are robust to the wide variation across objects.
Interacting with articulated objects is a complex task, which we break down into three components:
Grasp generation: First, upon approaching an object, a robot must decide where and how to grasp it. For example, in Figure 2a, to open the cabinet door, the robot must realize it needs to grasp the protruding handle, highlighted in green.
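To give a flavor of this step, below is a minimal sketch that scores grasp candidates on a point cloud with a hand-rolled "protrusion" heuristic, capturing the intuition that handles stick out from flat surfaces. This is an illustrative toy under our own assumptions (synthetic points, known normals, camera looking along +z), not the grasp-generation method from our work.

```python
import numpy as np

def score_grasp_candidates(points, normals):
    """Toy heuristic: prefer points on protruding geometry (e.g., a handle)
    whose surface normal faces back toward the camera.

    points:  (N, 3) point cloud in the camera frame (camera looks along +z)
    normals: (N, 3) unit surface normals
    """
    # Protrusion: how far each point sits in front of the median depth.
    protrusion = np.median(points[:, 2]) - points[:, 2]
    # Approachability: normals pointing back at the camera score higher.
    approach = np.clip(-normals[:, 2], 0.0, 1.0)
    return protrusion * approach  # higher = better candidate

# Example: a flat cabinet door with a small handle 5 cm in front of it.
rng = np.random.default_rng(0)
door = np.column_stack([rng.uniform(-0.5, 0.5, 500),
                        rng.uniform(-0.5, 0.5, 500),
                        np.full(500, 1.0)])
handle = np.column_stack([rng.uniform(0.30, 0.35, 50),
                          rng.uniform(-0.05, 0.05, 50),
                          np.full(50, 0.95)])
points = np.vstack([door, handle])
normals = np.tile([0.0, 0.0, -1.0], (points.shape[0], 1))

best = points[np.argmax(score_grasp_candidates(points, normals))]
print("best grasp point:", best)  # lands on the protruding handle
```

A real grasp generator must also reason about gripper geometry, collisions, and force closure; the heuristic above only encodes the "grasp the thing that sticks out" intuition.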
Compliant control: Once the robot has successfully grasped the object, it must execute actions to move it. In Figure 2b, when opening the door, the robot must conform to the way the door moves (compliance) to avoid breaking itself or the door.
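One standard way to achieve this kind of compliance is admittance control, which converts measured contact forces into small corrective motions so the arm yields to the object's constraint. The sketch below is a generic one-dimensional illustration with made-up gains, not the controller used in our work.

```python
def admittance_step(x, v, f_ext, x_des, dt=0.01, M=2.0, D=20.0, K=50.0):
    """One Euler step of a 1-DoF admittance law:
        M * a = f_ext - D * v - K * (x - x_des)
    When the object pushes back hard (large f_ext), the force term
    overrides the stiffness pulling toward x_des, so the commanded
    motion yields to the constraint instead of fighting it.
    """
    a = (f_ext - D * v - K * (x - x_des)) / M
    v += a * dt
    x += v * dt
    return x, v

# Example: command a 0.2 m pull while the door resists like a spring.
x, v = 0.0, 0.0
for _ in range(100):
    f_door = -30.0 * x          # simplified environment: door pushes back
    x, v = admittance_step(x, v, f_ext=f_door, x_des=0.2)
print(f"settled position: {x:.3f} m")  # between the goal and the constraint
```

The gains M, D, and K trade off responsiveness against stiffness: a stiffer K tracks the commanded motion more closely, while a softer K yields more readily to the door.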
Object pose tracking and estimation: Finally, to move the object to a desired configuration, the robot must use visual data to estimate the object's joint type and current state. In Figure 2c, as the robot opens the door, it must use images to estimate the joint type and angle of the door (a simplified estimation sketch follows the list below). We tackle this problem under two different settings:
Category-level: A single perception model generalizes to all object instances from a known category.
Category-independent: A single perception model applies to articulated objects across different categories.
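As a hedged illustration of what such an estimator must recover, the sketch below fits a revolute joint angle from 3D points tracked on a door at two time steps, using the standard Kabsch least-squares rotation fit. The setup (known point correspondences, points expressed in a hinge-centered frame) is a simplifying assumption for the example, not our actual perception model.

```python
import numpy as np

def estimate_revolute_angle(pts_t0, pts_t1):
    """Estimate the joint angle that rotates tracked 3D points on a door
    from time t0 to time t1 (Kabsch algorithm). Assumes corresponding
    points expressed in a frame centered on the hinge axis."""
    H = pts_t0.T @ pts_t1                       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # least-squares rotation
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return np.degrees(angle)

# Example: points on a door rotated 30 degrees about the vertical axis.
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
pts_t0 = np.random.default_rng(1).normal(size=(20, 3))
pts_t1 = pts_t0 @ Rz.T
print(f"estimated joint angle: {estimate_revolute_angle(pts_t0, pts_t1):.1f} deg")
```

Distinguishing a revolute joint (rotation about a hinge) from a prismatic one (translation along a rail, as in a drawer) then amounts to checking whether the best-fit transform is dominated by rotation or by translation.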
These components can be connected to close the loop between control and estimation, forming the perception-action loop shown in Figure 2, which gives the robot real-time feedback on how well the task is going. After grasping the object, the robot continuously estimates the object's state during the interaction and updates the controller based on this feedback to move the object toward a goal state. Finally, to aid the design of generalizable computer vision and manipulation methods, we develop the PartNet-Mobility Dataset, a collection of 2,000 common articulated 3D objects with motion annotations and rendering assets [1,2]. You can find more details on each of these components here.
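To make the loop concrete, here is a toy closed-loop simulation: a noisy angle estimate stands in for the perception module, and a simple proportional update stands in for the compliant controller. Every quantity here is an illustrative assumption, not the deployed system.

```python
import numpy as np

rng = np.random.default_rng(2)
true_angle, goal_angle = 0.0, 90.0   # door starts closed; goal is 90 deg

def observe(angle):
    """Stand-in for the perception module: a noisy angle estimate."""
    return angle + rng.normal(scale=1.0)

for step in range(200):
    est_angle = observe(true_angle)            # estimation (vision)
    if abs(est_angle - goal_angle) < 2.0:      # goal state reached
        break
    command = 0.2 * (goal_angle - est_angle)   # control update from feedback
    true_angle += command                      # door responds to the pull
print(f"door opened to {true_angle:.1f} deg in {step} steps")
```

Because the controller only ever acts on the latest estimate, perception errors get corrected on the next iteration rather than accumulating, which is the practical payoff of closing the loop.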
Our work on grasp generation, compliant control, and object tracking for articulated objects brings us closer to deploying assistive robots in any home. We are excited to continue collaborating with TRI to discover research breakthroughs that will make assistive home robots a reality.
You can check out more of our work at the following links:
- Interactive Perception and Robot Learning (IPRL) lab at Stanford
- Geometric Computation Group at Stanford
- TRI Medium