Autonomous vehicles, such as cars and robots, need to understand the behaviors of people in their surroundings for safety and planning purposes. Understanding context is also important for providing in-vehicle drivers and passengers with effective driving assistance and in-car device interaction and experiences. We propose a comprehensive approach to detect and predict the behaviors of people at multiple semantic granularities, both inside the vehicle and among those outside who interact with it. First, people near cars may be performing a variety of actions and activities that involve certain contextual components, such as motions and object interactions, and that occur within long, continuous streams of video. We will analyze actions by incorporating knowledge of objects, scenes, people, and their relationships into a mechanism that detects actions and their spatiotemporal extent in long video sequences. Second, we will study the more subtle behaviors of in-vehicle drivers and passengers. Detailed semantic cues, such as body and head pose, gaze, and gestures, are critical to unveiling higher-level semantics, such as user engagement, attentional focus, and intention. We will analyze these high-level aspects by adopting an integrative approach that exploits mid-level cues, such as poses and gestures, and by introducing cognitive models of the person.
- Integrate contextual cues about objects, scenes and human body poses into behavior analysis.
- Localize human actions in long, untrimmed videos.
- Predict a driver’s actions and detect driver distraction.
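
To make the second objective concrete, the sketch below illustrates one common baseline for localizing actions in long, untrimmed video: scoring a sliding temporal window over per-frame action scores and pruning overlapping candidates with temporal non-maximum suppression. All names here (`localize_actions`, `temporal_nms`, the toy scores) are illustrative assumptions, not part of the proposal.

```python
def temporal_nms(detections, iou_threshold=0.3):
    """Greedy temporal NMS over (start, end, score) tuples:
    keep the highest-scoring span, drop spans that overlap it too much."""
    detections = sorted(detections, key=lambda d: d[2], reverse=True)
    kept = []
    for s, e, score in detections:
        suppressed = False
        for ks, ke, _ in kept:
            inter = max(0, min(e, ke) - max(s, ks))
            union = (e - s) + (ke - ks) - inter
            if union > 0 and inter / union > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append((s, e, score))
    return kept

def localize_actions(frame_scores, window=4, stride=2, threshold=0.5):
    """Slide a fixed-length window over per-frame action scores and
    keep windows whose mean score exceeds the threshold."""
    proposals = []
    for start in range(0, len(frame_scores) - window + 1, stride):
        end = start + window
        score = sum(frame_scores[start:end]) / window
        if score >= threshold:
            proposals.append((start, end, score))
    return temporal_nms(proposals)

# Toy per-frame scores for one action class: the action roughly
# spans frames 4-8 of a 12-frame clip.
scores = [0.1, 0.1, 0.2, 0.3, 0.9, 0.95, 0.9, 0.85, 0.9, 0.2, 0.1, 0.1]
detections = localize_actions(scores)
```

In practice the per-frame scores would come from a learned model that fuses the contextual cues named above (objects, scenes, poses), and multi-scale windows or proposal networks would replace the single fixed window; this sketch only shows the localization step.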