Research Center for Information Technology Innovation | Recent Research Results

Hashan Roshantha Mendis, Chih-Kai Kang, Chun-Han Lin, Ming-Syan Chen, and Pi-Cheng Hsiu

Deep Reorganization: Retaining Residuals in TinyML

IEEE/ACM Design Automation Conference (DAC)

June 2024

Designing intelligent, tiny devices with limited memory is immensely challenging, exacerbated by the additional memory requirement of residual connections in deep neural networks. In contrast to existing approaches that eliminate residuals to reduce peak memory usage at the cost of significant accuracy degradation, this paper presents DERO, which reorganizes residual connections by leveraging insights into the types and interdependencies of operations across residual connections. Evaluations were conducted across diverse model architectures designed for common computer vision applications. DERO consistently achieves peak memory usage comparable to plain-style models without residuals, while closely matching the accuracy of the original models with residuals.
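The extra memory cost of a residual connection can be illustrated with a small back-of-the-envelope calculation. The sketch below is not DERO; it only assumes a layer-by-layer executor that keeps one input and one output buffer alive per layer and adds the skip tensor in place into the destination tensor. The function name `peak_activation_memory` and the tensor sizes are hypothetical.

```python
def peak_activation_memory(sizes, residual=None):
    """Peak activation memory of a layer chain executed one layer at a time.

    sizes[i] is the size of tensor i; layer i+1 reads tensor i and writes
    tensor i+1. residual=(src, dst) means tensor src is added in place to
    tensor dst, so tensor src must stay alive until layer dst has finished.
    """
    peak = 0
    for layer in range(1, len(sizes)):
        live = sizes[layer - 1] + sizes[layer]  # input + output buffers
        if residual is not None:
            src, dst = residual
            # The skip tensor is an extra live buffer whenever it is not
            # already the input of the layer being executed.
            if src + 1 < layer <= dst:
                live += sizes[src]
        peak = max(peak, live)
    return peak


sizes_kb = [16, 64, 64, 16]  # hypothetical tensor sizes in KB
plain_peak = peak_activation_memory(sizes_kb)
residual_peak = peak_activation_memory(sizes_kb, residual=(0, 3))
```

Under these assumptions the plain chain peaks at 128 KB, while holding the 16 KB skip tensor across the block raises the peak to 144 KB.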

Cheng-Der Fuh*, Chuan-Ju Wang*, and Chen-Hung Pai

Markov Chain Importance Sampling for Minibatches

Machine Learning

January 2024

This study investigates importance sampling for minibatch stochastic gradient descent and makes two contributions. First, on the theoretical side, we develop a neat tilting formula, which can be regarded as a general device for asymptotically optimal importance sampling. Second, on the practical side, guided by the formula, we present an effective importance sampling algorithm that accounts for the effects of minibatches and leverages the Markovian property of the gradients between iterations. Experiments conducted on artificial data confirm that our algorithm consistently delivers superior performance in terms of variance reduction. Furthermore, experiments carried out on real-world data demonstrate that our method, when paired with relatively straightforward models such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), yields improvements in training loss and testing error.
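As a point of reference, the basic idea of importance sampling inside a minibatch gradient estimate can be sketched as follows. This is a generic textbook estimator, not the tilting formula or the Markov chain algorithm of the paper: it assumes examples are drawn i.i.d. with probability proportional to their gradient norm and reweighted so that the estimate stays unbiased. The names `sampling_probs` and `is_minibatch_gradient` and the random gradients are illustrative assumptions.

```python
import numpy as np


def sampling_probs(grad_norms):
    """Sampling distribution proportional to per-example gradient norms."""
    g = np.asarray(grad_norms, dtype=float)
    return g / g.sum()


def is_minibatch_gradient(grads, probs, idx):
    """Unbiased importance-sampled estimate of the mean gradient.

    grads: (N, d) per-example gradients; probs: (N,) sampling distribution;
    idx: indices drawn i.i.d. from probs. Each sampled gradient is
    reweighted by 1 / (N * probs[i]) to cancel the sampling bias.
    """
    n = grads.shape[0]
    weights = 1.0 / (n * probs[idx])
    return (weights[:, None] * grads[idx]).mean(axis=0)


rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 3))  # hypothetical per-example gradients
probs = sampling_probs(np.linalg.norm(grads, axis=1))
idx = rng.choice(len(grads), size=4, p=probs)
estimate = is_minibatch_gradient(grads, probs, idx)
```

Averaging the single-sample estimate over the sampling distribution recovers the plain mean gradient exactly, which is the unbiasedness property that any such scheme has to preserve while reducing variance.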

B.-J. Chen, R. Y. Chang, F.-T. Chien, and H. V. Poor

Graph Neural Network-Based Joint Beamforming for Hybrid Relay and Reconfigurable Intelligent Surface Aided Multiuser Systems

IEEE Wireless Communications Letters

October 2023

This study examines a downlink multiple-input single-output (MISO) system, where a base station (BS) with multiple antennas sends data to multiple single-antenna users with the help of a reconfigurable intelligent surface (RIS) and a half-duplex decode-and-forward (DF) relay. The system's sum rate is maximized through joint optimization of active beamforming at the BS and DF relay and passive beamforming at the RIS. The conventional alternating optimization algorithm for handling this complex design problem is suboptimal and computationally intensive. To overcome these challenges, this letter proposes a two-phase graph neural network (GNN) model that learns the joint beamforming strategy by exchanging and updating relevant relational information embedded in the graph representation of the transmission system. The proposed method demonstrates superior performance compared to existing approaches, robustness against channel imperfections and variations, generalizability across varying user numbers, and notable complexity advantages.

Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, and Yuh-Jye Lee

Federated Learning for Sparse Principal Component

2023 IEEE International Conference on Big Data (Big Data)

December 2023

In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach in which model training occurs on the client side, preserving privacy by keeping data localized. Instead of sending raw data to a central server, only model updates are exchanged, enhancing data security. We apply this framework to Sparse Principal Component Analysis (SPCA) in this work. SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability. Besides the ℓ1 norm regularization term in conventional SPCA, we add a smoothing function to facilitate gradient-based optimization methods. Moreover, to improve computational efficiency, we introduce a least squares approximation to the original SPCA. This enables analytic solutions in the optimization process, leading to substantial computational improvements. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets affirm the efficacy of our federated SPCA approach.
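The consensus ADMM pattern mentioned above can be sketched on a simpler stand-in problem. The code below is not the authors' SPCA formulation: it assumes an ℓ1-regularized least squares objective split across clients, and it handles the ℓ1 term with soft-thresholding rather than with the smoothing function used in the paper. The function names, the penalty parameter `rho`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def consensus_admm_lasso(blocks, lam, rho=10.0, iters=500):
    """Consensus ADMM for sum_i 0.5*||A_i x - b_i||^2 + lam*||x||_1.

    blocks: list of (A_i, b_i), one pair per client. Only the local
    iterates x_i and duals u_i are combined; A_i and b_i stay local.
    """
    d = blocks[0][0].shape[1]
    n = len(blocks)
    z = np.zeros(d)        # global (consensus) variable
    x = np.zeros((n, d))   # local copies, one row per client
    u = np.zeros((n, d))   # scaled dual variables
    # Each client pre-computes its own regularized normal equations.
    solvers = [(np.linalg.inv(A.T @ A + rho * np.eye(d)), A.T @ b)
               for A, b in blocks]
    for _ in range(iters):
        for i, (inv_i, atb_i) in enumerate(solvers):
            # Closed-form local step, possible because the loss is quadratic.
            x[i] = inv_i @ (atb_i + rho * (z - u[i]))
        z = soft_threshold((x + u).mean(axis=0), lam / (n * rho))
        u += x - z         # dual update
    return z


rng = np.random.default_rng(0)
x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
blocks = []
for _ in range(3):
    A = rng.normal(size=(30, 5))
    blocks.append((A, A @ x_true + 0.01 * rng.normal(size=30)))
z_ls = consensus_admm_lasso(blocks, lam=0.0)
```

With `lam=0.0` the consensus variable should agree with the centralized least squares solution on the stacked data, which is a convenient sanity check for the update rules.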

Ming-Chuan Yang, Guo-Wei Wong, Meng Chang Chen

Sparse Grid Imputation Using Unpaired Imprecise Auxiliary Data: Theory and Application to PM2.5 Estimation

ACM Transactions on Knowledge Discovery from Data

November 2023

Sparse grid imputation (SGI) is a challenging problem, as its goal is to infer the values of the entire grid from a limited number of cells with values. Traditionally, the problem is solved using regression methods such as KNN and kriging, whereas in the real world, there is often extra information (usually imprecise) that can aid inference and yield better performance. In the SGI problem, in addition to the limited number of fixed grid cells with precise target domain values, there are contextual data and imprecise observations over the whole grid. To solve this problem, we propose a distribution estimation theory for the whole grid and realize the theory via the composition architecture of the Target-Embedding and the Contextual CycleGAN trained with contextual information and imprecise observations. Contextual CycleGAN is structured as two generator-discriminator pairs and uses different types of contextual loss to guide the training. We consider the real-world problem of fine-grained PM2.5 inference with realistic settings: a few (less than 1%) grid cells with precise PM2.5 data and all grid cells with contextual information concerning weather and imprecise observations from satellites and microsensors. The task is to infer reasonable values for all grid cells. As there is no ground truth for empty cells, out-of-sample MSE (mean squared error) and JSD (Jensen-Shannon divergence) measurements are used in the empirical study. The results show that Contextual CycleGAN supports the proposed theory and outperforms the methods used for comparison.
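For readers unfamiliar with the two evaluation measures, the following is a minimal sketch of how MSE and JSD can be computed. It assumes discrete distributions (for example, normalized histograms of predicted and observed values) and the natural logarithm; how the paper bins its data or which log base it uses is not stated here, so those choices are assumptions.

```python
import numpy as np


def jensen_shannon_divergence(p, q):
    """JSD (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize histograms to probabilities
    m = 0.5 * (p + q)                # mixture distribution

    def kl(a, b):
        mask = a > 0                 # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mse(pred, target):
    """Mean squared error between predictions and held-out targets."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))
```

With the natural logarithm, JSD is 0 for identical distributions and reaches its maximum of ln 2 for distributions with disjoint support, so it stays bounded even when the two histograms do not overlap.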

Guo-Wei Wong, Yi-Ting Huang, Ying-Ren Guo, Yeali Sun, Meng Chang Chen

Attention-Based API Locating for Malware Techniques

IEEE Transactions on Information Forensics & Security

November 2023

This paper presents APILI, an innovative approach to behavior-based malware analysis that utilizes deep learning to locate the API calls corresponding to discovered malware techniques in dynamic execution traces. APILI defines multiple attentions between API calls, resources, and techniques, incorporating the MITRE ATT&CK framework, adversary tactics, techniques, and procedures, through a neural network. We employ fine-tuned BERT for arguments/resources embedding, SVD for technique representation, and several design enhancements, including layer structure and noise addition, to improve the locating performance. To the best of our knowledge, this is the first attempt to locate low-level API calls that correspond to high-level malicious behaviors (that is, techniques). Our evaluation demonstrates that APILI outperforms other traditional and machine learning techniques in both technique discovery and API locating. These results indicate the promising performance of APILI, thus allowing it to reduce the analysis workload.

Cheng-Wei Ching, Jia-Ming Chang, Jian-Jhih Kuo, Chih-Yu Wang

Dual-Objective Personalized Federated Service System with Partially-labeled Data over Wireless Networks

IEEE Transactions on Services Computing

September 2023

Federated learning (FL) has emerged to mitigate the privacy concerns in machine learning-based services and applications, and personalized federated learning (PFL) has evolved to alleviate the issue of data heterogeneity. However, FL and PFL usually rest on two assumptions: that the users’ data is well labeled, and that the personalized goals align with sufficient local data. Unfortunately, the two assumptions may not hold in most cases, where data labeling is costly or most users lack sufficient local data to satisfy their personalized needs. To this end, we first formulate the problem, DoLP, that studies the issue of insufficient and partially-labeled data on FL-based services. DoLP aims to maximize two service objectives: 1) the personalized classification objective and 2) the personalized labeling objective for each user within the constraint of training time over wireless networks. Then, we propose a PFL-based service system, DoFed-SPP, to solve DoLP. The novelty of DoFed-SPP is twofold. First, we devise an inference-based first-order approximation metric, similarity ratio, to identify the similarity between users’ local data. Second, we design an approximation algorithm to determine the appropriate size and set of users for uploading in each round. Extensive experiments show DoFed-SPP outperforms the state-of-the-art in final accuracy and time-to-accuracy performance on CIFAR10/100 and DBPedia.

Chih-Hsuan Yen, Hashan Roshantha Mendis, Tei-Wei Kuo, and Pi-Cheng Hsiu

Keep in Balance: Runtime-reconfigurable Intermittent Deep Inference

ACM Transactions on Embedded Computing Systems

September 2023

Intermittent deep neural network (DNN) inference is a promising technique to enable intelligent applications on tiny devices powered by ambient energy sources. Nonetheless, intermittent execution presents inherent challenges, primarily involving accumulating progress across power cycles and having to refetch volatile data lost due to power loss in each power cycle. Existing approaches typically optimize the inference configuration to maximize data reuse. However, we observe that such a fixed configuration may be significantly inefficient due to the fluctuating balance point between data reuse and data refetch caused by the dynamic nature of ambient energy. This work proposes DynBal, an approach to dynamically reconfigure the inference engine at runtime. DynBal is realized as a middleware plugin that improves inference performance by exploring the interplay between data reuse and data refetch to maintain their balance with respect to the changing level of intermittency. An indirect metric is developed to easily evaluate an inference configuration considering the variability in intermittency, and a lightweight reconfiguration algorithm is employed to efficiently optimize the configuration at runtime. We evaluate the improvement brought by integrating DynBal into a recent intermittent inference approach that uses a fixed configuration. Evaluations were conducted on a Texas Instruments device with various network models and under varied intermittent power strengths. Our experimental results demonstrate that DynBal can speed up intermittent inference by 3.26 times, achieving a greater improvement for a large network under high intermittency and a large gap between memory and computation performance.

Y.-C. Chuang, W.-Y. Chiu, R. Y. Chang, and Y.-C. Lai

Deep Reinforcement Learning for Energy Efficiency Maximization in Cache-Enabled Cell-Free Massive MIMO Networks: Single- and Multi-Agent Approaches

IEEE Transactions on Vehicular Technology

August 2023

Cell-free massive multiple-input multiple-output (CF-mMIMO) is an emerging beyond fifth-generation (5G) technology that improves energy efficiency (EE) and removes cell-structure limitations by using multiple access points (APs). This study investigates the EE maximization problem. Forming proper cooperation clusters is crucial when optimizing EE, and it is often done by selecting AP–user pairs with good channel quality or aligning AP cache contents with user requests. However, the result can be suboptimal if we determine the clusters based solely on either aspect. This motivates our joint design of user association and content caching. Without knowing the user content preferences in advance, two deep reinforcement learning (DRL) approaches, i.e., single-agent reinforcement learning (SARL) and multi-agent reinforcement learning (MARL), are proposed for different scenarios. The SARL approach operates in a centralized manner and has lower computational requirements on edge devices. The MARL approach requires more computation resources at the edge devices but enables parallel computing to reduce the computation time and therefore scales better than the SARL approach. The numerical analysis shows that the proposed approaches outperform benchmark algorithms in terms of network EE in a small network. In a large network, MARL yields the best EE performance, and its computation time is reduced significantly by parallel computing.

T.-Y. Kan, R. Y. Chang, F.-T. Chien, B.-J. Chen, and H. V. Poor

Hybrid Relay and Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems

IEEE Transactions on Vehicular Technology

June 2023

Reconfigurable intelligent surfaces (RISs) are viewed as key enablers for next-generation wireless communications. This paper investigates a multiuser downlink multiple-input single-output (MISO) system in which a multiantenna base station (BS) transmits information to multiple single-antenna users with the aid of both a half-duplex decode-and-forward (DF) relay and a full-duplex RIS. Active beamforming at the BS and the DF relay, as well as passive beamforming at the RIS, are jointly designed for system sum-rate maximization. The design problem is challenging to solve due to coupled beamforming variables. An alternating optimization (AO) based algorithm is proposed to tackle this complex co-design problem. Numerical results demonstrate the superior performance of the proposed hybrid relay–RIS system with a judicious joint beamforming design. Convergence and complexity analysis shows that the convergence rate of the proposed algorithm is dominated by the numbers of users and RIS elements, and that the proposed scheme can converge in a few iterations even with large numbers of users and RIS elements. Interesting tradeoffs posed in the joint design are discussed. An extension of the proposed design method to a related energy efficiency (EE) optimization problem is also outlined and implemented.
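The sum-rate objective can be made concrete with a simplified sketch. The code below deliberately omits the DF relay and the two-phase transmission studied in the paper: it assumes only a direct BS-to-user link plus a single RIS reflection path, with the commonly used diagonal phase-shift model for the RIS. The function names, array shapes, and channel values are illustrative assumptions.

```python
import numpy as np


def effective_channels(H_direct, H_ris_user, G, theta):
    """Effective BS-to-user channels with an RIS in the path.

    H_direct: (K, M) direct channels; H_ris_user: (K, N) RIS-to-user
    channels; G: (N, M) BS-to-RIS channel; theta: (N,) phase shifts.
    """
    Theta = np.diag(np.exp(1j * np.asarray(theta)))  # unit-modulus reflection
    return H_direct + H_ris_user @ Theta @ G


def sum_rate(H, W, noise_power=1.0):
    """Sum rate in bit/s/Hz for channels H (K, M) and beamformers W (M, K)."""
    gains = np.abs(H @ W) ** 2  # gains[k, j]: power of beam j seen by user k
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise_power))))


# Two users, two BS antennas, three RIS elements (all values hypothetical).
H_direct = np.eye(2, dtype=complex)
H_ris_user = np.ones((2, 3), dtype=complex)
G = np.zeros((3, 2), dtype=complex)  # RIS path switched off in this example
H_eff = effective_channels(H_direct, H_ris_user, G, theta=np.zeros(3))
rate = sum_rate(H_eff, np.eye(2, dtype=complex))
```

In this toy setting the channels are orthogonal and each user has unit signal power against unit noise, so each user contributes log2(2) = 1 bit/s/Hz. The coupling the abstract refers to appears once `G` is nonzero, because `theta` and `W` then both enter every user's SINR.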