Artificial Intelligence Innovation and Application Research Center

Chuan-Ju Wang*, Yu-Neng Chuang*, Chih-Ming Chen, and Ming-Feng Tsai

Skewness Ranking Optimization for Personalized Recommendation

Machine Learning

August 2020

In this paper, we propose a novel optimization criterion that leverages features of the skew normal distribution to better model the problem of personalized recommendation. Specifically, the developed criterion borrows the concept and the flexibility of the skew normal distribution, based on which three hyperparameters are attached to the optimization criterion. Furthermore, from a theoretical point of view, we not only establish the relation between the maximization of the proposed criterion and the shape parameter in the skew normal distribution, but also provide analogies between, and an asymptotic analysis of, the proposed criterion and maximization of the area under the ROC curve. Experimental results on a range of large-scale real-world datasets show that our model significantly outperforms the state of the art and consistently yields the best performance on all tested datasets.

Boyu Lu, Jun-Cheng Chen, Rama Chellappa

UID-GAN: Unsupervised Image Deblurring via Disentangled Representations

IEEE Transactions on Biometrics, Behavior, and Identity Science

January 2020

Recent advances in deep convolutional neural networks (DCNNs) and generative adversarial networks (GANs) have significantly improved the performance of single image blind deblurring algorithms. However, most of the existing algorithms require paired training data. In this paper, we present an unsupervised method for single-image deblurring without paired training images. We introduce a disentangled framework to split the content and blur features of a blurred image, which yields improved deblurring performance. To handle the unpaired training data, a blurring branch and the cycle-consistency loss are added to guarantee that the content structures of the restored results match the original images. We also add a perceptual loss to further mitigate the artifacts. For natural image deblurring, we introduce a color loss to reduce color distortions in outputs. Extensive experiments on both domain-specific and natural image deblurring show the proposed method achieves competitive results compared to recent state-of-the-art deblurring approaches.

K.-C. Liu, M. Chan, C.-Y. Hsieh, H.-Y. Huang, C.-T. Chan, Y. Tsao

Domain-adaptive Fall Detection Using Deep Adversarial Training

IEEE Transactions on Neural Systems & Rehabilitation Engineering

June 2021

Chon Hou Sio, Yu-Jen Ma, Hong-Han Shuai, Jun-Cheng Chen, Wen-Huang Cheng

S^2SiamFC: Self-supervised Fully Convolutional Siamese Network for Visual Tracking

ACM Multimedia (MM), Poster Session

October 2020

To exploit rich information from unlabeled data, in this work, we propose a novel self-supervised framework for visual tracking which can easily adapt state-of-the-art supervised Siamese-based trackers into unsupervised ones by utilizing the fact that an image and any cropped region of it can form a natural pair for self-training. Besides common geometric transformation-based data augmentation and hard negative mining, we also propose adversarial masking, which helps the tracker learn other context information by adaptively blacking out salient regions of the target. The proposed approach can be trained offline using images only, without any requirement of manual annotations or temporal information from multiple consecutive frames. Thus, it can be used with any kind of unlabeled data, including images and video frames. For evaluation, we take SiamFC as the base tracker and name the proposed self-supervised method S^2SiamFC. Extensive experiments and ablation studies on the challenging VOT2016 and VOT2018 datasets demonstrate the effectiveness of the proposed method, which achieves performance comparable to its supervised counterpart and to other unsupervised methods requiring multiple frames.

C.-L. Liu, S.-W. Fu, Y.-J. Li, J.-W. Huang, H.-M. Wang, and Y. Tsao

Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks

IEEE/ACM Transactions on Audio, Speech, and Language Processing

February 2020

In recent years, waveform-mapping-based speech enhancement (SE) methods have garnered significant attention. These methods generally use a deep learning model to directly process and reconstruct speech waveforms. Because both the input and output are in waveform format, the waveform-mapping-based SE methods can overcome the distortion caused by imperfect phase estimation, which may be encountered in spectral-mapping-based SE systems. So far, most waveform-mapping-based SE methods have focused on single-channel tasks. In this paper, we propose a novel fully convolutional network (FCN) with Sinc and dilated convolutional layers (termed SDFCN) for multichannel SE that operates in the time domain. We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN). The proposed methods are evaluated on three multichannel SE tasks, namely the dual-channel inner-ear microphones SE task, the distributed microphones SE task, and the CHiME-3 dataset. The experimental results confirm the outstanding denoising capability of the proposed SE systems on these tasks and the benefits of using the residual architecture on the overall SE performance.

Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa

A Dual Path Model With Adaptive Attention For Vehicle Re-Identification

International Conference on Computer Vision (ICCV) (oral)

October 2019

In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearance path captures macroscopic vehicle features, while the orientation-conditioned part appearance path learns to capture localized discriminative features by focusing attention on the most informative key-points. Through extensive experimentation, we show that the proposed AAVER method is able to accurately re-identify vehicles in unconstrained scenarios, yielding state-of-the-art results on the challenging VeRi-776 dataset. As a byproduct, the proposed system is also able to accurately predict vehicle key-points and shows an improvement of more than 7% over the state of the art.

Jingxiao Zheng, Ruichi Yu, Jun-Cheng Chen, Boyu Lu, Carlos Castillo, Rama Chellappa

Uncertainty Modeling of Contextual-Connection between Tracklets for Unconstrained Video-based Face Recognition

International Conference on Computer Vision (ICCV), Poster Session

October 2019

Unconstrained video-based face recognition is a challenging problem due to significant within-video variations caused by pose, occlusion, and blur. To tackle this problem, an effective idea is to propagate the identity from high-quality faces to low-quality ones through contextual connections, which are constructed based on context such as body appearance. However, previous methods have often propagated erroneous information because they do not model the uncertainty of the noisy contextual connections. In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, each represented by a node in a graph. UGG explicitly models the uncertainty of the contextual connections by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released, challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.

Boyu Lu, Jun-Cheng Chen, Rama Chellappa

Unsupervised Domain-Specific Deblurring via Disentangled Representations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Poster Session

June 2019

Image deblurring aims to restore the latent sharp images from the corresponding blurred ones. In this paper, we present an unsupervised method for domain-specific single-image deblurring based on disentangled representations. The disentanglement is achieved by splitting the content and blur features in a blurred image using content encoders and blur encoders. We enforce a KL divergence loss to regularize the distribution range of extracted blur attributes such that little content information is contained. Meanwhile, to handle the unpaired training data, a blurring branch and the cycle-consistency loss are added to guarantee that the content structures of the deblurred results match the original images. We also add an adversarial loss on deblurred results to generate visually realistic images and a perceptual loss to further mitigate the artifacts. We perform extensive experiments on the tasks of face and text deblurring using both synthetic datasets and real images, and achieve improved results compared to recent state-of-the-art deblurring methods.

Boyu Lu, Jun-Cheng Chen, Carlos D Castillo, Rama Chellappa

An Experimental Evaluation of Covariates Effects on Unconstrained Face Verification

IEEE Transactions on Biometrics, Behavior, and Identity Science

January 2019

Covariates are factors that have a debilitating influence on face verification performance. In this paper, we comprehensively study two covariate-related problems for unconstrained face verification: first, how covariates affect the performance of deep neural networks on the large-scale unconstrained face verification problem; second, how to utilize covariates to improve verification performance. To study the first problem, we implement five state-of-the-art deep convolutional networks and evaluate them on three challenging covariate datasets. In total, seven covariates are considered: pose (yaw and roll), age, facial hair, gender, indoor/outdoor, occlusion (nose and mouth visibility, and forehead visibility), and skin tone. We first report the performance of each individual network on the overall protocol and use the score-level fusion method to analyze each covariate. Some of the results confirm and extend the findings of previous studies, and others are new findings that were rarely mentioned previously or did not show consistent trends. For the second problem, we demonstrate that with the assistance of gender information, the quality of a precurated noisy large-scale face dataset for face recognition can be further improved. After retraining the face recognition model using the curated data, performance improvement is observed at low false acceptance rates.

Chun-Hsiang Wang, Kang-Chun Fan, Chuan-Ju Wang, and Ming-Feng Tsai

UGSD: User Generated Sentiment Dictionaries from Online Customer Reviews

Machine Learning

January 2019

Customer reviews on platforms such as TripAdvisor and Amazon provide rich information about the ways that people convey sentiment in certain domains. Given these kinds of user reviews, this paper proposes UGSD, a representation learning framework for constructing domain-specific sentiment dictionaries from online customer reviews, in which we leverage the relationship between user-generated reviews and the ratings of the reviews to associate the reviewer sentiment with certain entities. The proposed framework has three main advantages. First, no additional annotations of words or external dictionaries are needed for the proposed framework; the only resources needed are the review texts and entity ratings. Second, the framework is applicable across a variety of user-generated content from different domains to construct domain-specific sentiment dictionaries. Finally, each word in the constructed dictionary is associated with a low-dimensional dense representation and a degree of relatedness to a certain rating. These enable us to obtain more fine-grained dictionaries and enhance the application scalability of the constructed dictionaries, as the word representations can be adopted for various tasks or applications, such as entity ranking and dictionary expansion. The experimental results on three real-world datasets show that the framework is effective in constructing high-quality domain-specific sentiment dictionaries from customer reviews.

Research Achievements