In this talk, we will introduce our work on stereo depth estimation using deep learning. Recent work has shown that depth estimation from a stereo pair of images can be formulated as a supervised learning task and solved with convolutional neural networks (CNNs). However, current architectures rely on patch-based Siamese networks and lack the means to exploit context information for finding correspondences in ill-posed regions. To tackle this problem, we propose PSMNet, a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and a 3D CNN. To further improve the results, we propose context-aware filtering for disparity refinement, in which the filter weights at each location are predicted by a guidance network. Finally, for practical real-time application on mobile devices, we propose a multi-scale aggregation stereo network that can balance computation and accuracy at inference time.
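To give a flavor of the spatial pyramid pooling idea, here is a minimal NumPy sketch (the grid scales and nearest-neighbour upsampling are illustrative assumptions; PSMNet's actual module applies learned convolutions and bilinear upsampling to each pooled branch). It average-pools a feature map into progressively coarser grids, upsamples each branch back to full resolution, and concatenates the branches so that every spatial location carries multi-scale context:

```python
import numpy as np

def spatial_pyramid_pooling(feat, scales=(1, 2, 4, 8)):
    """Illustrative SPP sketch for a feature map of shape (C, H, W).

    For each scale s, average-pool the map into an s x s grid, upsample
    back to (H, W) by nearest neighbour, and concatenate all branches
    (plus the original features) along the channel axis.
    """
    C, H, W = feat.shape
    branches = [feat]
    for s in scales:
        # Average-pool into an s x s grid of roughly equal cells.
        pooled = np.zeros((C, s, s))
        hs = np.linspace(0, H, s + 1).astype(int)
        ws = np.linspace(0, W, s + 1).astype(int)
        for i in range(s):
            for j in range(s):
                cell = feat[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled[:, i, j] = cell.mean(axis=(1, 2))
        # Nearest-neighbour upsample back to (H, W).
        rows = (np.arange(H) * s) // H
        cols = (np.arange(W) * s) // W
        up = pooled[:, rows][:, :, cols]
        branches.append(up)
    # Output has C * (1 + len(scales)) channels.
    return np.concatenate(branches, axis=0)
```

The coarse branches (e.g. the 1x1 global-average branch) summarize context well beyond any single patch, which is precisely what helps disambiguate matching in textureless or occluded regions.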
Jia-Ren Chang received his PhD degree in Computer Science from National Chiao Tung University (NCTU) in 2019 and is currently a postdoctoral researcher in Computer Science at NCTU. His research interests include stereo vision, object recognition, style transfer, facial expression recognition, biomedical signal processing, and cognitive neuroscience. His work uses deep neural networks to improve stereo depth estimation, the quality of stylization, and facial attribute disentangling, and to understand human brain processes. He has made numerous contributions to stereo vision, including a novel deep network architecture for stereo depth estimation; his method ranked first on the worldwide KITTI stereo vision benchmark from November 2017 to March 2018.