Abstract
Intelligent/autonomous vehicles, such as self-driving cars, intelligent robots and Unmanned Aerial Vehicles (UAVs), must seamlessly interact with humans, e.g., their drivers/operators/pilots or people in their vicinity, whether these are obstacles to be avoided (e.g., pedestrians) or targets to be followed and interacted with (e.g., when filming a performing athlete). Furthermore, intelligent vehicles and robots are increasingly employed to assist humans in real-world applications (e.g., autonomous transportation, warehouse logistics, or infrastructure inspection). To this end, autonomous vehicles should be equipped with advanced vision systems that allow them to understand and interact with humans in their surrounding environment.
This lecture overviews human-centric AI methods that can facilitate visual interaction between humans and autonomous vehicles (e.g., through gestures captured by RGB cameras), in order to ensure their safe and successful cooperation in real-world scenarios. Such methods should: a) demonstrate high visual perception accuracy to understand human visual cues, b) be robust to input data variations, in order to successfully handle the illumination/background/scale changes typically encountered in real-world scenarios, and c) produce timely predictions to ensure safety, which is a critical aspect of autonomous vehicles’ applications. Deep learning and neural networks play an important role towards this end. The lecture covers the following topics: a) human pose/posture estimation from RGB images, b) human action/activity recognition from RGB images/skeleton data, and c) gesture recognition from RGB images/skeleton data. Finally, embedded execution is extremely important, as it facilitates vehicle autonomy, e.g., in communication-denied environments. Application areas include driver/operator/pilot activity recognition, gesture-based control of autonomous vehicles, and gesture recognition for traffic management. The lecture will offer an overview of all the above topics, along with other related ones, stressing the relevant algorithmic aspects. Some issues in embedded CNN computation (e.g., fast convolution algorithms) will be overviewed as well.
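To make the last point concrete, below is a minimal sketch of one fast convolution idea (FFT-based convolution, using NumPy/SciPy). The frame and kernel here are hypothetical placeholders, and the specific fast convolution algorithms discussed in the lecture may differ (e.g., Winograd-based methods):

    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    # Hypothetical single-channel input frame and a large filter (illustrative only).
    rng = np.random.default_rng(0)
    frame = rng.standard_normal((256, 256))
    kernel = rng.standard_normal((11, 11))

    # Direct 2D convolution: arithmetic cost grows with kernel area, O(H*W*K^2).
    direct = convolve2d(frame, kernel, mode="same")

    # FFT-based convolution: roughly O(H*W*log(H*W)), much cheaper for large kernels.
    fast = fftconvolve(frame, kernel, mode="same")

    assert np.allclose(direct, fast)  # both methods compute the same result

On embedded hardware, such algorithm-level substitutions reduce the arithmetic cost of CNN inference without changing the network's output, which is why they matter for timely on-board predictions.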
Figure: Gesture language for drone control.
Figure: Pedestrian and car region segmentation.