
TIGP (AIoT) -- Challenges of Virtual-to-Real Learning for Deep Learning Based Intelligent Robotics


  • Speaker: Prof. 李濬屹 (Chun-Yi Lee)
  • Date: 2023/12/01 (Fri.) 14:00–16:00
  • Venue: Auditorium 106, New Building, Institute of Information Science
  • Host: TIGP (AIoT)
Abstract
Collecting data on a large scale is vital for developing cutting-edge artificial intelligence (AI) technologies that involve machine learning (ML) models, such as deep neural networks, which must be trained on relevant data. On the one hand, collecting real-world data with cameras or microphones would allow AIs to better understand our everyday lives and ultimately to behave naturally as we humans do, or to help us in a natural fashion. On the other hand, growing concerns about security and privacy are making it increasingly difficult to collect such real data. This presentation discusses a computational framework for real-data collection and learning that effectively leverages a collection of AI models for self-navigating mobile robots.

We will particularly focus on developing visual perception models that can see the real world through a camera, as they play a pivotal role in a variety of AI-powered products and services, such as autonomous vehicles and smart cities, and are the main area of research to which Elsa Lab has been contributing. Visual perception models based on deep neural networks have achieved unprecedented accuracy on benchmark datasets. Deploying such models would enable edge AIs to better perceive and understand their surrounding environments and act intelligently in the real world. However, these models usually suffer from accuracy drops and a shortage of effective real-world data samples, leading to unsatisfactory performance and safety concerns in practical deployments.

To address these problems, we explore and incorporate the following key technologies into a framework: virtual-to-real transfer, semantic segmentation based unsupervised domain adaptation (UDA), and mid-level representations. Specifically, virtual-to-real transfer allows ML models to be trained first in simulated environments and then migrated to real-world settings with ease. Semantic segmentation based UDA makes this model migration possible even in practical and challenging scenarios where real-world data collection would otherwise require labor-intensive manual preprocessing. Furthermore, mid-level representations deliver various types of information from the perception module to the control module, and form the basis of modular frameworks for many learning-based systems. The main scientific challenge of this research direction is to integrate these techniques into a unified solution and to improve the adaptation ability of AI models in the real world.
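
To make the virtual-to-real idea concrete, below is a minimal sketch of adversarial output-space UDA for semantic segmentation, in the spirit of published methods such as AdaptSegNet; it is not necessarily the specific UDA approach used by the speaker's group. All network sizes, image shapes, label counts, and loss weights are illustrative assumptions. The segmenter is supervised only on simulated data (where dense labels are essentially free), while a discriminator pushes its outputs on unlabeled real images to look like its outputs on simulated ones.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_classes = 19  # illustrative, e.g. a Cityscapes-style label set

    # Tiny stand-in segmentation network (the "perception" model).
    segmenter = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(64, num_classes, kernel_size=1),
    )
    # Discriminator: guesses whether a segmentation map came from
    # simulated (source) or real (target) imagery.
    discriminator = nn.Sequential(
        nn.Conv2d(num_classes, 64, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, kernel_size=4, stride=2, padding=1),
    )
    opt_seg = torch.optim.Adam(segmenter.parameters(), lr=1e-4)
    opt_dis = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    sim_rgb = torch.randn(2, 3, 64, 64)                   # labeled simulated frames
    sim_lbl = torch.randint(0, num_classes, (2, 64, 64))  # dense labels, free in simulation
    real_rgb = torch.randn(2, 3, 64, 64)                  # unlabeled real frames

    # Step 1: supervised loss on simulated data, plus an adversarial term
    # that pushes real-image outputs to be indistinguishable from simulated ones.
    sim_out = segmenter(sim_rgb)
    real_out = segmenter(real_rgb)
    d_real = discriminator(F.softmax(real_out, dim=1))
    loss_seg = F.cross_entropy(sim_out, sim_lbl) \
             + 0.01 * bce(d_real, torch.ones_like(d_real))  # "fool the discriminator"
    opt_seg.zero_grad(); loss_seg.backward(); opt_seg.step()

    # Step 2: discriminator learns to separate simulated from real outputs.
    d_sim = discriminator(F.softmax(sim_out.detach(), dim=1))
    d_real = discriminator(F.softmax(real_out.detach(), dim=1))
    loss_dis = bce(d_sim, torch.ones_like(d_sim)) + bce(d_real, torch.zeros_like(d_real))
    opt_dis.zero_grad(); loss_dis.backward(); opt_dis.step()

The appeal of this setup is exactly the scenario the abstract describes: the only real-world data required is unlabeled camera footage, avoiding labor-intensive manual annotation.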
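The modular role of mid-level representations can likewise be sketched in a few lines. The toy pipeline below, again an illustrative assumption rather than the speaker's actual system, routes a camera frame through a perception module that emits a mid-level representation (here, segmentation logits), which a separate control module consumes to produce navigation actions. All module names, tensor shapes, and the two-dimensional action space are hypothetical.

    import torch
    import torch.nn as nn

    class PerceptionModule(nn.Module):
        """Maps raw RGB frames to a mid-level representation (segmentation logits)."""
        def __init__(self, num_classes: int = 19):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.head = nn.Conv2d(64, num_classes, kernel_size=1)

        def forward(self, rgb: torch.Tensor) -> torch.Tensor:
            return self.head(self.encoder(rgb))        # (B, num_classes, H, W)

    class ControlModule(nn.Module):
        """Maps the mid-level representation to continuous navigation actions."""
        def __init__(self, num_classes: int = 19, action_dim: int = 2):
            super().__init__()
            self.policy = nn.Sequential(
                nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                nn.Linear(num_classes * 8 * 8, 128), nn.ReLU(),
                nn.Linear(128, action_dim), nn.Tanh(),  # e.g. linear and angular velocity
            )

        def forward(self, seg_logits: torch.Tensor) -> torch.Tensor:
            return self.policy(seg_logits)

    perception = PerceptionModule()
    control = ControlModule()
    frame = torch.randn(1, 3, 64, 64)   # stand-in for one camera frame
    action = control(perception(frame))
    print(action.shape)                 # torch.Size([1, 2])

Because the control module never sees raw pixels, it can be trained entirely in simulation; when the system moves to the real world, only the perception module needs to be adapted (for instance, with a UDA scheme like the one sketched earlier). This decoupling is what makes mid-level representations a natural basis for modular learning-based systems.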