Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, for which label collection is time-consuming and expensive. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized variational autoencoder (VQ-VAE). The training of the VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted. To further improve the correlation with real quality scores, domain knowledge of speech processing is incorporated into the model design. We found that the vector quantization mechanism can also be used for self-supervised speech enhancement (SE) model training. To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced. In summary, the proposed speech quality estimation method and enhancement model require only clean speech for training, without any labels. Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines.
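As a rough illustration of the core idea, the sketch below scores an utterance by how far its latent frames fall from a codebook learned on clean speech. Here `encoder` and `codebook` are hypothetical stand-ins for the paper's trained VQ-VAE components, and the exact distance and normalization used by VQScore may differ.

```python
import torch

def vqscore_sketch(encoder, codebook, waveform):
    """Minimal sketch: rate speech quality by VQ-VAE quantization error.

    encoder:  maps a waveform to latent frames of shape (T, D)
    codebook: (K, D) code vectors learned on clean speech
    Both are hypothetical stand-ins for the paper's trained model.
    """
    with torch.no_grad():
        z = encoder(waveform)            # (T, D) latent frames
        d = torch.cdist(z, codebook)     # (T, K) frame-to-code distances
        q_err = d.min(dim=1).values      # nearest-code (quantization) error
    # Clean speech quantizes well -> small error -> high score.
    return -q_err.mean().item()
```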
This study investigates importance sampling under the mini-batch stochastic gradient descent scheme, and its contributions are twofold. First, theoretically, we develop a neat tilting formula, which can be regarded as a general device for asymptotically optimal importance sampling. Second, practically, guided by the formula, we present an effective importance sampling algorithm that accounts for the effects of mini-batches and leverages the Markovian property of the gradients between iterations. Experiments conducted on artificial data confirm that our algorithm consistently delivers superior variance reduction. Furthermore, experiments carried out on real-world data demonstrate that our method, when paired with relatively straightforward models such as multilayer perceptrons (MLP) and convolutional neural networks (CNN), achieves better training loss and testing error.
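The snippet below is a minimal sketch of the general mechanism, assuming per-example weights (e.g., approximate gradient norms, which the paper suggests can be reused across iterations thanks to their Markovian behaviour) are already available; it is not the paper's tilting formula, only the standard reweighted estimator it would plug into.

```python
import numpy as np

def sample_minibatch(weights, batch_size, rng):
    """Importance-sampled mini-batch selection with unbiasedness correction.

    weights: per-example importance scores (assumed here to approximate
             gradient norms; the paper's tilting formula is more refined).
    """
    p = weights / weights.sum()                      # sampling distribution
    idx = rng.choice(len(p), size=batch_size, p=p)   # sample proportionally
    # Reweight each sampled gradient by 1 / (N * p_i) so the mini-batch
    # average remains an unbiased, lower-variance full-gradient estimate.
    iw = 1.0 / (len(p) * p[idx])
    return idx, iw

rng = np.random.default_rng(0)
grad_norms = np.abs(rng.normal(size=1000)) + 1e-3    # toy importance scores
idx, iw = sample_minibatch(grad_norms, 32, rng)
```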
In this paper, we address the challenge of discovering financial signals in narrative financial reports. As these documents are often lengthy and tend to blend routine information with new information, it is challenging for professionals to discern critical financial signals. To this end, we leverage the inherent year-to-year structure of these reports to define a novel signal-highlighting task; more importantly, we propose a compare-and-contrast multistage pipeline that recognizes different relationships between the reports and locates relevant rationales for these relationships. We also create and publicly release a human-annotated dataset for our task. Our experiments on the dataset validate the effectiveness of our pipeline, and we provide detailed analyses and ablation studies to support our findings.
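To make the year-to-year comparison concrete, here is a hypothetical first-pass stage: align this year's sentences with last year's by TF-IDF cosine similarity and flag poorly matched ones as candidate new signals. The paper's pipeline is multistage and learned; this only illustrates the compare-and-contrast idea.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def highlight_new(prev_sents, curr_sents, threshold=0.5):
    """Flag sentences in this year's report with no close counterpart
    in last year's report (threshold is an illustrative choice)."""
    vec = TfidfVectorizer().fit(prev_sents + curr_sents)
    sim = cosine_similarity(vec.transform(curr_sents),
                            vec.transform(prev_sents))
    # A sentence that matches nothing from last year is likely "new".
    return [s for s, row in zip(curr_sents, sim) if row.max() < threshold]
```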
Transfer learning is empirically known to perform efficiently in many applications, yet limited literature explains the mechanism behind the scenes. This study establishes both formal derivations and heuristic analysis to formulate a theory of transfer learning in deep learning. Our framework, utilizing layer variational analysis, proves that the success of transfer learning can be guaranteed under corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations of the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation, and it is particularly advantageous when new-domain data is scarce during adaptation. Numerical experiments over diverse tasks validate our theory and verify that our analytic expression achieves better performance in domain adaptation than the gradient descent method.
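For flavor only: the sketch below adapts the final linear layer of a pretrained network in closed form (ridge least squares on frozen features) instead of by gradient descent. This mirrors the spirit of an analytic adaptation expression, under assumptions of ours, and is not the paper's exact derivation.

```python
import numpy as np

def adapt_last_layer(features, targets, ridge=1e-3):
    """Closed-form readout adaptation on frozen features.

    features: (n, d) target-domain representations from the frozen backbone
    targets:  (n, k) desired outputs
    ridge:    regularizer keeping the solution stable when data is scarce
    """
    d = features.shape[1]
    # Ridge-regularized normal equations solved directly, no iterations.
    W = np.linalg.solve(features.T @ features + ridge * np.eye(d),
                        features.T @ targets)
    return W  # new (d, k) readout weights
```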
D4AM: A General Denoising Framework for Downstream Acoustic Models
Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen. ICLR 2023 (poster).
Keywords: audio processing, speech enhancement, robust automatic speech recognition, auxiliary task learning
TL;DR: We propose a general denoising framework for various downstream acoustic models (D4AM) by adopting an effective joint training scheme with the regression (denoising) objective and the classification (ASR) objective.
Abstract: The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noise-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method considers the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.
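A heavily simplified sketch of the joint objective follows. The names `se_model` and `asr_loss_fn` are hypothetical, and the real D4AM adjustment scheme estimates the weighting coefficient via gradient calibration and regression objective weighting rather than the bare gradient-norm ratio used here for illustration.

```python
import torch
import torch.nn.functional as F

def joint_step(se_model, optimizer, noisy, clean, asr_loss_fn):
    """Sketch: classification (ASR) loss backpropagated through the SE
    front-end, with the regression (denoising) loss as an auxiliary term.
    `asr_loss_fn` is assumed to score enhanced speech against its
    transcript using a frozen acoustic model."""
    params = list(se_model.parameters())
    enhanced = se_model(noisy)
    l_cls = asr_loss_fn(enhanced)          # ASR loss through the SE output
    l_reg = F.l1_loss(enhanced, clean)     # denoising (regression) loss

    g_cls = torch.autograd.grad(l_cls, params, retain_graph=True)
    g_reg = torch.autograd.grad(l_reg, params, retain_graph=True)
    norm = lambda gs: torch.sqrt(sum(g.pow(2).sum() for g in gs))
    # Crude stand-in for D4AM's adjustment scheme: match gradient scales.
    alpha = (norm(g_cls) / (norm(g_reg) + 1e-12)).detach()

    optimizer.zero_grad()
    (l_cls + alpha * l_reg).backward()     # weighted joint objective
    optimizer.step()
    return l_cls.item(), l_reg.item(), alpha.item()
```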
High-fidelity kinship face synthesis is a challenging task due to the limited amount of kinship data available for training and the low quality of the images. In addition, it is hard to trace the genetic traits between parents and children from such low-quality training images. To address these issues, we leverage a pre-trained state-of-the-art face synthesis model, StyleGAN2, for kinship face synthesis. To handle large age, gender, and other attribute variations between parents and their children, we conduct a thorough study of its rich latent spaces and of different encoder architectures to arrive at an optimized encoder design that repurposes StyleGAN2 for kinship face synthesis. The latent representation obtained from our encoder pipeline with stage-wise training strikes a better balance between editability and synthesis fidelity for identity preservation and attribute manipulation than compared approaches. With extensive subjective, quantitative, and qualitative evaluations, the proposed approach consistently achieves better facial attribute heredity and image generation fidelity than compared state-of-the-art methods, demonstrating that promising and satisfactory kinship face synthesis can be achieved with only a single, straightforward encoder architecture.
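As a hypothetical illustration of operating in StyleGAN2's latent space (not the paper's trained encoder), once both parents are inverted into W+ space, a child code can be sketched as a per-layer blend of the two parent codes:

```python
import torch

def blend_parent_codes(w_father, w_mother, weights=None):
    """Illustrative per-layer convex blend of two W+ codes.

    w_father, w_mother: (num_layers, 512) codes from a GAN-inversion
    encoder (assumed available). Coarse layers govern pose/shape,
    fine layers govern texture/color, so per-layer weights allow
    attribute-aware mixing.
    """
    if weights is None:
        weights = torch.full((w_father.shape[0], 1), 0.5)  # even blend
    return weights * w_father + (1 - weights) * w_mother
```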
This paper proposes an encoder-decoder architecture for kidney segmentation. A hyperparameter optimization process is implemented, covering model architecture development, the selection of a windowing method and a loss function, and data augmentation. The model, which consists of EfficientNet-B5 as the encoder and a feature pyramid network as the decoder, yields the best performance with a Dice score of 0.969 on the 2019 Kidney and Kidney Tumor Segmentation Challenge dataset. The proposed model is tested with different voxel spacings, anatomical planes, and kidney and tumor volumes. Moreover, case studies are conducted to analyze segmentation outliers. Finally, five-fold cross-validation and the 3D-IRCAD-01 dataset are used to evaluate the developed model in terms of the Dice score, recall, precision, and the Intersection over Union score. This paper thus demonstrates a new development and application of artificial intelligence algorithms for image analysis and interpretation. Overall, our experimental results show that the proposed kidney segmentation solution for CT images can be applied to clinical needs to assist surgeons in surgical planning. It enables calculation of the total kidney volume for kidney function estimation in ADPKD and supports radiologists and doctors in disease diagnosis and in tracking disease progression.
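The reported architecture can be assembled in a few lines with the segmentation_models_pytorch package (an assumption of ours; the paper may use its own implementation, and the channel/class counts below are illustrative):

```python
import segmentation_models_pytorch as smp

# EfficientNet-B5 encoder + feature pyramid network (FPN) decoder.
model = smp.FPN(
    encoder_name="efficientnet-b5",
    encoder_weights="imagenet",   # pretrained encoder weights
    in_channels=1,                # single-channel CT slices (assumed)
    classes=3,                    # background / kidney / tumor (assumed)
)
# Dice loss matches the Dice-score evaluation metric.
loss_fn = smp.losses.DiceLoss(mode="multiclass")
```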
Previously, doctors interpreted computed tomography (CT) images based on their experience when diagnosing kidney diseases. However, with the rapid increase in the number of CT images, such interpretation requires considerable time and effort and produces inconsistent results. To solve this problem, several novel neural network models have been proposed to automatically identify kidney or tumor areas in CT images. In most of these models, only the neural network structure was modified to improve accuracy; however, data pre-processing is also a crucial step for improving the results. This study systematically discusses the pre-processing methods required before medical images are fed into a neural network model. The experimental results show that the proposed pre-processing methods and models significantly improve the accuracy compared with the case without data pre-processing. Specifically, the Dice score improved from 0.9436 to 0.9648 for kidney segmentation and reached 0.7294 for all types of tumor detection. With the proposed medical image processing methods and deep learning models, the performance is suitable for clinical applications with modest computational resources, achieving cost-efficient and accurate automatic kidney volume calculation and tumor detection.
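CT windowing is a representative pre-processing step of the kind the study evaluates: clip Hounsfield units to a diagnostic window and rescale. The center/width values below are typical soft-tissue settings, not necessarily the ones chosen in the paper.

```python
import numpy as np

def window_ct(hu, center=50, width=400):
    """Clip a CT slice (in Hounsfield units) to a window and scale to [0, 1].

    center/width: window settings; 50/400 is a common soft-tissue window
    (an illustrative choice, not the paper's tuned values).
    """
    lo, hi = center - width / 2, center + width / 2
    hu = np.clip(hu, lo, hi)        # suppress irrelevant intensity range
    return (hu - lo) / (hi - lo)    # normalize for the network input
```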
In visual search, the gallery set could be incrementally growing and added to the database in practice. However, existing methods rely on the model trained on the entire dataset, ignoring the continual updating of the model. Besides, as the model updates, the new model must re-extract features for the entire gallery set to maintain a compatible feature space, imposing a high computational cost for a large gallery set. To address the issues of long-term visual search, we introduce a continual learning (CL) approach that can handle the incrementally growing gallery set with backward embedding consistency. We enforce the losses of inter-session data coherence, neighbor-session model coherence, and intra-session discrimination to train the continual learner. In addition to the disjoint setup, our CL solution also tackles the situation of increasingly adding new classes with blurry boundaries, without assuming that all categories are known at the beginning or during model updates. To our knowledge, this is the first CL method that both tackles the issue of backward-consistent feature embedding and allows novel classes to occur in new sessions. Extensive experiments on various benchmarks show the efficacy of our approach under a wide range of setups.
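A minimal sketch of the training signal is given below, assuming an embedding network with a linear classification head; the cosine-based consistency term and the weighting are illustrative simplifications of the paper's inter-session, neighbor-session, and intra-session losses.

```python
import torch
import torch.nn.functional as F

def session_loss(new_model, old_model, images, labels, lam=1.0):
    """Sketch: discrimination on the current session plus a backward
    consistency term that pins new embeddings to the frozen old model's
    space, so the gallery never needs re-extraction.

    new_model is assumed to expose a `head` linear classifier; names
    and the lam weighting are hypothetical.
    """
    z_new = new_model(images)
    with torch.no_grad():
        z_old = old_model(images)           # frozen previous-session model
    # Intra-session discrimination (cross-entropy over a linear head).
    l_cls = F.cross_entropy(new_model.head(z_new), labels)
    # Backward consistency: keep new features compatible with old ones.
    l_consist = 1 - F.cosine_similarity(z_new, z_old).mean()
    return l_cls + lam * l_consist
```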