Data-efficient Algorithms for Machine Learning with Provable Guarantees

Principal Investigator:

TRI Liaison:

Project Summary

We aim to develop large-scale ML algorithms that (a) train faster and generalize better, (b) generalize better on the heavy tail of the data, and (c) transfer to new domains with fewer labeled examples.

Data-efficient ML algorithms can reduce the cost of collecting human labels and improve the prediction accuracy of TRI products. Vision models on autonomous cars have to generalize quickly to rare events (e.g., an animal crossing the road or a child running into the road to catch a ball) and to new environments (e.g., a small rural town) without much human supervision. Handling these rare events correctly is critical for the cars' safety features.

Research Goals

Understand the generalization of over-parameterized neural networks.
Develop algorithms that can both train faster and generalize better.
Improve generalization performance on rare examples in the heavy tail.
Develop new methodologies for domain adaptation that can handle structured domain shift.
Apply the methods above to real-world classification and segmentation tasks and demonstrate a strong performance boost.