Research Center for Information Technology Innovation, Academia Sinica

Abstract

The evolution of immersive media has long centered on the pursuit of high fidelity by delivering pristine pixels for human consumption via XR devices, such as VR headsets, AR goggles, and smart glasses. In this talk, I will first outline our journey in mastering visual delivery, spanning from bandwidth-efficient Three Degrees of Freedom (3-DoF) 360° video streaming to state-of-the-art 6-DoF 3D Gaussian Splatting (3DGS) streaming. I will also highlight how we leverage human visual perception through gaze-adaptive foveated rendering to optimize cloud VR experiences. Beyond visual fidelity, emerging immersive applications increasingly demand systems that are not only bandwidth-efficient but also intelligence-aware. This shift is aligned with broader trends in next-generation networks, where communication is expected to move beyond raw data delivery toward more semantic- and task-oriented paradigms. Transitioning from pixels to semantics, I will share our recent explorations in vision understanding tailored for edge and wearable platforms. Specifically, we investigate how to exploit temporal correlations in Vision Large Language Models (VLLMs) for motion understanding, and examine the potential of LLM-based assistants to provide real-time spatial awareness for visually impaired users. By bridging efficient pixel delivery and semantic understanding, this talk highlights a key step toward future immersive systems that tightly integrate communication, computation, and intelligence, and points to new opportunities at the intersection of immersive media and next-generation network design.

Bio

Cheng-Hsin Hsu is a Professor in the Department of Computer Science at National Tsing Hua University, Taiwan. He received his Ph.D. from Simon Fraser University, Canada, his M.Eng. from the University of Maryland, College Park, USA, and his B.Sc. and M.Sc. degrees from National Chung-Cheng University, Taiwan. Cheng-Hsin's research interests include multimedia networking, immersive video, and the Internet of Things. He and his colleagues have received Best Demo and Best Paper Awards at premier venues including ACM Multimedia, ACM MMSys, ACM SIGCOMM EMS, IEEE SMARTCOMP, IEEE CloudCom, and IEEE RTAS. Cheng-Hsin has extensive experience in both academia and industry. He has served as a visiting scholar at Rutgers University, the University of California, Irvine, the University of Illinois at Urbana-Champaign, and the Qatar Computing Research Institute. Before joining academia, he was a Senior Research Scientist at Deutsche Telekom and held technical positions at Motorola and Lucent.

Academic Seminar

From High-Fidelity to High-Intelligence: Spanning Pixels and Semantics in Immersive Media