人工智慧創新應用專題中心

Y.-F. Chen, F. Y.-S. Lin, S.-Y. Hsu, T.-L. Sun, Y. Huang, And C.-H. Hsiao

Adaptive Traffic Control: OpenFlow-Based Prioritization Strategies for Achieving High Quality of Service in Software-Defined Networking

IEEE Transactions on Network and Service Management

June 2025

This paper tackles key challenges in Software-Defined Networking (SDN) by proposing a novel approach for optimizing resource allocation and dynamic priority assignment using OpenFlows priority field. The proposed Lagrangian relaxation (LR)-based algorithms significantly reduces network delay, achieving performance management with dynamic priority levels while demonstrating adaptability and efficiency in a sliced network. The algorithms’ effectiveness were validated through computational experiments, highlighting the strong potential for QoS management across diverse industries. Compared to the Same Priority baseline, the proposed methods: RPA, AP–1, and AP–2, exhibited notable performance improvements, particularly under strict delay constraints. For future applications, the study recommends expanding the algorithm to handle larger networks, integrating it with artificial intelligence technologies for proactive resource optimization. Additionally, the proposed methods lay a solid foundation for addressing the unique demands of 6G networks, particularly in areas such as base station mobility (Low-Earth Orbit, LEO), ultra-low latency, and multi-path transmission strategies.

Ard Kastrati, Michael R. Winters, Nemanja Djuric, Ivan J. Tashev, Yu-Te Wang

Automated Feature Engineering for Single-Trial EEG and Eye-Tracking Classification in Predictive Text Interfaces

13th IEEE International Winter Conference on Brain-Computer Interface Conference 2025

January 2025

.

P.-Y. Huang, S.-W. Fu, And Y. Tsao

RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier

NeurIPS 2024, Poster Session

December 2024

State-of-the-art (SOTA) semi-supervised learning techniques, such as FixMatch and it's variants, have demonstrated impressive performance in classification tasks. However, these methods are not directly applicable to regression tasks. In this paper, we present RankUp, a simple yet effective approach that adapts existing semi-supervised classification techniques to enhance the performance of regression tasks. RankUp achieves this by converting the original regression task into a ranking problem and training it concurrently with the original regression objective. This auxiliary ranking classifier outputs a classification result, thus enabling integration with existing semi-supervised classification methods. Moreover, we introduce regression distribution alignment (RDA), a complementary technique that further enhances RankUp's performance by refining pseudo-labels through distribution alignment. Despite its simplicity, RankUp, with or without RDA, achieves SOTA results in across a range of regression benchmarks, including computer vision, audio, and natural language processing tasks.

C.-H. Hsiao, F. Y.-S. Lin, T.-L. Sun, Y.-Y. Liao, C.-H. Wu, Y.-C. Lai, H.-P. Wu, P.-R. Liu, B.-R. Xiao, C.-H. Chen, And Y. Huang

Precision and Robust Models on Healthcare Institution Federated Learning for Predicting HCC on Portal Venous CT Images

IEEE Journal of Biomedical and Health Informatics

August 2024

Hepatocellular carcinoma (HCC), the most common type of liver cancer, poses significant challenges in detection and diagnosis. Medical imaging, especially computed tomography (CT), is pivotal in non-invasively identifying this disease, requiring substantial expertise for interpretation. This research introduces an innovative strategy that integrates two-dimensional (2D) and three-dimensional (3D) deep learning models within a federated learning (FL) framework for precise segmentation of liver and tumor regions in medical images. The study utilized 131 CT scans from the Liver Tumor Segmentation (LiTS) challenge and demonstrated the superior efficiency and accuracy of the proposed Hybrid-ResUNet model with a Dice score of 0.9433 and an AUC of 0.9965 compared to ResNet and EfficientNet models. This FL approach is beneficial for conducting large-scale clinical trials while safeguarding patient privacy across healthcare settings. It facilitates active engagement in problem-solving, data collection, model development, and refinement. The study also addresses data imbalances in the FL context, showing resilience and highlighting local models' robust performance. Future research will concentrate on refining federated learning algorithms and their incorporation into the continuous implementation and deployment (CI/CD) processes in AI system operations, emphasizing the dynamic involvement of clients. We recommend a collaborative human-AI endeavor to enhance feature extraction and knowledge transfer. These improvements are intended to boost equitable and efficient data collaboration across various sectors in practical scenarios, offering a crucial guide for forthcoming research in medical AI.

S.-S. Wang, J.-Y. Chen, B.-R. Bai, S.-H. Fang, And Y. Tsao

Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks with Human-in-the-Loop Assessment Metrics

IEEE/ACM Transactions on Audio, Speech, and Language Processing

July 2024

The utilization of face masks is an essential healthcare measure, particularly during times of pandemics, yet it can present challenges in communication in our daily lives. To address this problem, we propose a novel approach known as the human-in-the-loop StarGAN (HL–StarGAN) face-masked speech enhancement method. HL–StarGAN comprises discriminator, classifier, metric assessment predictor, and generator that leverages an attention mechanism. The metric assessment predictor, referred to as MaskQSS, incorporates human participants in its development and serves as a “human-in-the-loop” module during the learning process of HL–StarGAN. The overall HL–StarGAN model was trained using an unsupervised learning strategy that simultaneously focuses on the reconstruction of the original clean speech and the optimization of human perception. To implement HL–StarGAN, we created a face-masked speech database named “FMVD,” which comprises recordings from 34 speakers in three distinct face-masked scenarios and a clean condition. We conducted subjective and objective tests on the proposed HL–StarGAN using this database. The outcomes of the test results are as follows: (1) MaskQSS successfully predicted the quality scores of face-masked voices, outperforming several existing speech assessment methods. (2) The integration of the MaskQSS predictor enhanced the ability of HL–StarGAN to transform face-masked voices into high-quality speech; this enhancement is evident in both objective and subjective tests, outperforming conventional StarGAN and CycleGAN-based systems.

S.-W. Fu, K.-H. Hung, Y. Tsao, And Y.-C. F. Wang

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

ICLR 2024, Poster Session

May 2024

Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variational autoencoder (VQ-VAE). The training of VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted. To further improve correlation with real quality scores, domain knowledge of speech processing is incorporated into the model design. We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training. To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced. In summary, the proposed speech quality estimation method and enhancement models require only clean speech for training without any label requirements. Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines.

Cheng-Der Fuh*, Chuan-Ju Wang*, And Chen-Hung Pai

Markov Chain Importance Sampling for Minibatches

Machine Learning

January 2024

This study investigates importance sampling under the scheme of mini-batch stochastic gradient descent, under which the contributions are twofold. First, theoretically, we develop a neat tilting formula, which can be regarded as a general device for asymptotically optimal importance sampling. Second, practically, guided by the formula, we present an effective algorithm for importance sampling which accounts for the effects of minibatches and leverages the Markovian property of the gradients between iterations. Experiments conducted on artificial data confirm that our algorithm consistently delivers superior performance in terms of variance reduction. Furthermore, experiments carried out on real-world data demonstrate that our method, when paired with relatively straightforward models like multilayer perceptron (MLP) and convolutional neural networks (CNN), outperforms in terms of training loss and testing error.

Jia-Huei Ju, Yu-Shiang Huang, Cheng-Wei Lin, Che Lin, And Chuan-Ju Wang

A Compare-and-contrast Multistage Pipeline for Uncovering Financial Signals in Financial Reports

the 61st Annual Meeting of the Association for Computational Linguistics (ACL)

July 2023

In this paper, we address the challenge of discovering financial signals in narrative financial reports. As these documents are often lengthy and tend to blend routine information with new information, it is challenging for professionals to discern critical financial signals. To this end, we leverage the inherent nature of the year-to-year structure of reports to define a novel signal-highlighting task; more importantly, we propose a compare-and-contrast multistage pipeline that recognizes different relationships between the reports and locates relevant rationales for these relationships. We also create and publicly release a human-annotated dataset for our task. Our experiments on the dataset validate the effectiveness of our pipeline, and we provide detailed analyses and ablation studies to support our findings.

C.-C. Lee, Y. Tsao, H.-M. Wang, C.-S. Chen

D4AM: A General Denoising Framework for Downstream Acoustic Models

ICLR 2023, Poster Session

May 2023

D4AM: A General Denoising Framework for Downstream Acoustic Models Chi-Chang Lee , Yu Tsao , Hsin-Min Wang , Chu-Song Chen Published: 02 Feb 2023, Last Modified: 06 Mar 2023 ICLR 2023 poster Readers: Everyone Show Bibtex Show Revisions Keywords: audio processing, speech enhancement, robust automatic speech recognition, auxiliary task learning TL;DR: We propose a general denoising framework for various downstream acoustic models (D4AM) by adopting an effective joint training scheme with the regression (denoising) objective and the classification (ASR) objective. Abstract: The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noise-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method aims to consider the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.

H.-H. Tseng, H.-Y. Lin, H.-K. Hsuan, And Y. Tsao

Interpretations of Domain Adaptations via Layer Variational Analysis

ICLR 2023, Poster Session

May 2023

Transfer learning is known to perform efficiently in many applications empirically, yet limited literature reports the mechanism behind the scene. This study establishes both formal derivations and heuristic analysis to formulate the theory of transfer learning in deep learning. Our framework utilizing layer variational analysis proves that the success of transfer learning can be guaranteed with corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations towards the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation. It is particularly advantageous when new domain data is sufficiently sparse during adaptation. Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method.

研究成果