Research Center for Information Technology Innovation | Recent Research Results

Hsuan-An Hsia, Che-Hsien Lin, Bo-Han Kung, Jhao-Ting Chen, Daniel Stanley Tan, Jun-Cheng Chen, Kai-Lung Hua

CLIPCAM: A Simple Baseline for Zero-shot Text-guided Object and Action Localization

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

May 2022

Contemporary deep learning-based object and action localization algorithms depend on large-scale annotated data. In real-world scenarios, however, there is a virtually unlimited amount of unlabeled data beyond the categories of publicly available datasets, so annotating all of it is time- and labor-intensive, and training detectors on it requires substantial computational resources. To address these issues, we present a simple and reliable baseline for zero-shot text-guided object and action localization that is easy to obtain, works directly, and introduces no additional training cost. It uses Grad-CAM, the widely used class visual saliency map generator, together with the recently released Contrastive Language-Image Pre-Training (CLIP) model by OpenAI, which is trained contrastively on a dataset of 400 million image-sentence pairs rich in cross-modal information linking text semantics and image appearance. Extensive experiments on the Open Images and HICO-DET datasets demonstrate the effectiveness of the proposed approach for text-guided localization of unseen objects and actions in images.
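To make the Grad-CAM step concrete, here is a minimal NumPy sketch of the standard Grad-CAM computation. It assumes that the activations of one convolutional layer and the gradients of a target score with respect to them have already been extracted; in a CLIP-based setting that score would be an image-text similarity, but the extraction step, the array shapes, and the function name `grad_cam` are illustrative assumptions, not CLIPCAM's code.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM heatmap from one layer's activations and gradients.

    activations, gradients: arrays of shape (K, H, W) for K channels.
    Returns an (H, W) map scaled to [0, 1] (all zeros if nothing is positive).
    """
    weights = gradients.mean(axis=(1, 2))              # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)   # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU keeps regions that raise the score
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize for visualization
    return cam
```

Channels whose activations increase the target score receive positive weights, and the ReLU discards regions that only decrease it; thresholding the resulting map is one simple way to turn it into a localization.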

B.-J. Chen and R. Y. Chang

Few-Shot Transfer Learning for Device-Free Fingerprinting Indoor Localization

IEEE International Conference on Communications (ICC)

May 2022

Device-free wireless indoor localization is an essential technology for the Internet of Things (IoT), and fingerprint-based methods are widely used. A common challenge for fingerprint-based methods is data collection and labeling. This paper proposes a few-shot transfer learning system that uses only a small amount of labeled data from the current environment and reuses a large amount of existing labeled data previously collected in other environments, thereby significantly reducing the data collection and labeling cost for localization in each new environment. The core method is graph neural network (GNN)-based few-shot transfer learning and its modifications. Experiments in real-world environments show that the proposed system achieves performance comparable to a convolutional neural network (CNN) model while using 40 times less labeled data.

C.-H. Kuo, H.-Y. Chang, R. Y. Chang, and W.-H. Chung

Unsupervised Learning Based Hybrid Beamforming with Low-Resolution Phase Shifters for MU-MIMO Systems

IEEE International Conference on Communications (ICC)

May 2022

Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based scheme to jointly design the analog precoder and combiner with low-resolution PSs for multiuser multiple-input multiple-output (MU-MIMO) systems. We transform the analog precoder and combiner design problem into a phase classification problem and propose a generic neural network architecture, termed the phase classification network (PCNet), capable of producing solutions of various PS resolutions. Simulation results demonstrate the superior sum-rate and complexity performance of the proposed scheme, as compared to state-of-the-art hybrid beamforming designs for the most commonly used low-resolution PS configurations.
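The idea of recasting low-resolution analog beamforming as phase classification can be illustrated with a short sketch: a b-bit phase shifter can realize only 2^b equally spaced phases, so each precoder entry is fully described by a class index. The uniform phase codebook, the 1/sqrt(N) entry normalization, and the helper names below are assumptions for illustration; this is not PCNet, which learns the class assignments with a neural network.

```python
import numpy as np


def phase_codebook(bits):
    """The 2**bits equally spaced phases a b-bit phase shifter is assumed to realize."""
    levels = 2 ** bits
    return 2 * np.pi * np.arange(levels) / levels


def nearest_class(theta, bits):
    """Index of the realizable phase closest to each continuous phase in theta."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    # Wrap to [0, 2*pi), round to the nearest level, and map the top level back to 0
    return np.round(np.mod(theta, 2 * np.pi) / step).astype(int) % levels


def analog_precoder_from_classes(class_idx, bits):
    """Build a constant-modulus analog precoder from per-entry phase classes.

    class_idx: integer array of shape (n_antennas, n_rf_chains).
    """
    phases = phase_codebook(bits)[class_idx]
    n_antennas = class_idx.shape[0]
    return np.exp(1j * phases) / np.sqrt(n_antennas)   # every entry has modulus 1/sqrt(N)
```

In a classification formulation, a network would output scores over the 2^b classes for every entry, and taking the argmax would give `class_idx`; the hardware constraint is then satisfied by construction rather than by quantizing a continuous solution afterwards.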

Wan-Yu Chen, Hsin-Yuan Chang, Chih-Yu Wang, Wei-Ho Chung

Cooperative Neighboring Vehicle Positioning Systems Based on Graph Convolutional Network: A Multi-Scenario Transfer Learning Approach

IEEE International Conference on Communications (ICC)

May 2022

Vehicle positioning is a key component of autonomous driving. The Global Positioning System (GPS) is currently the most commonly used vehicle positioning system. However, its accuracy is affected by differences in the environment, so it fails to meet the requirement of meter-level accuracy. We consider a cooperative neighboring vehicle positioning system (CNVPS) that uses GPS, omnidirectional radar, and vehicle-to-vehicle (V2V) communication to obtain additional information from neighboring vehicles and thereby improve the GPS positioning accuracy of vehicles in various environments. We further adopt transfer learning (TL), with an adversarial mechanism designed to eliminate the discrepancies among multiple environments, so that a single model optimizes vehicle positioning accuracy across all of them. Simulation results show that, compared with existing methods, the proposed system architecture not only improves performance but also effectively reduces the amount of data required for training.
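Since the system is built on a graph convolutional network, a single graph-convolution layer can be sketched with the widely known propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I. Treating vehicles as nodes, V2V links as edges, and per-vehicle measurements as node features is an assumption made for illustration; the paper's actual architecture, features, and adversarial transfer-learning component are not reproduced here.

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: (N, N) symmetric 0/1 matrix, e.g. which vehicles share a V2V link
    features:  (N, F) node features, e.g. per-vehicle measurements (hypothetical)
    weight:    (F, F_out) learnable weight matrix
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))              # degrees of A + I are >= 1
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # symmetric normalization
    return np.maximum(norm @ features @ weight, 0.0)           # aggregate, transform, ReLU


# Toy usage: vehicles 0-1 and 1-2 are V2V neighbors; the feature values are made up.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(gcn_layer(A, H, np.eye(2)))
```

Each output row mixes a vehicle's own features with those of its neighbors, which is how information from nearby vehicles can enter each vehicle's position estimate.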

Ting-Hsiang Wang*, Hsiu-Wei Yang*, Chih-Ming Chen, Ming-Feng Tsai, and Chuan-Ju Wang

Item Concept Network: Towards Concept-based Item Representation Learning

IEEE Transactions on Knowledge and Data Engineering

March 2022

Item concept modeling is commonly achieved by leveraging textual information. However, many existing models capture word meanings without leveraging the inferential property of concepts and therefore ignore the relatedness between correlated concepts, a phenomenon we term conceptual “correlation sparsity.” In this paper, we distinguish between word modeling and concept modeling and propose an item concept modeling framework centered on the item concept network (ICN). ICN models and further enriches item concepts by leveraging the inferential property of concepts and thus addresses the correlation sparsity issue. Specifically, the proposed framework has two stages: ICN construction and embedding learning. In the first stage, we propose a generalized network construction method to build ICN, a structured network that infers expanded concepts for items via matrix operations. The second stage leverages neighborhood proximity to learn item and concept embeddings. With the proposed ICN, the resulting embedding facilitates both homogeneous and heterogeneous tasks, such as item-to-item and concept-to-item retrieval, and delivers related results that are more diverse than those of traditional keyword-matching-based approaches. As our experiments on two real-world datasets show, the framework encodes useful conceptual information and thus outperforms traditional methods in various item classification and retrieval tasks.
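As a rough illustration of inferring expanded concepts through matrix operations, the sketch below adds to each item the concepts that co-occur with its own concepts elsewhere in the data. The binary item-concept matrix, the one-hop co-occurrence rule, and the function name `expand_concepts` are hypothetical simplifications, not the generalized construction method proposed in the paper.

```python
import numpy as np


def expand_concepts(item_concept):
    """Hypothetical one-hop concept expansion via matrix products.

    item_concept: binary integer matrix of shape (n_items, n_concepts).
    Returns a binary matrix in which each item also carries every concept
    that co-occurs (in some item) with one of its original concepts.
    """
    co_occurrence = item_concept.T @ item_concept          # concept-by-concept counts
    np.fill_diagonal(co_occurrence, 0)                     # ignore self-co-occurrence
    correlated = (co_occurrence > 0).astype(int)           # concepts j, k related if they co-occur
    expanded = item_concept + item_concept @ correlated    # add concepts inferred from related ones
    return (expanded > 0).astype(int)
```

For example, if one item is tagged only with concept 0 and another item is tagged with concepts 0 and 1, the first item gains concept 1 after expansion, which is the kind of correlated-concept link that keyword matching alone would miss.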

Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, and Gael Richard

Relative Positional Encoding for Transformers with Linear Complexity

Proc. International Conference on Machine Learning (ICML)

July 2021

Recent advances in Transformer models allow for unprecedented sequence lengths, owing to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers; it exploits lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is a connection between positional encoding and the cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
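The core intuition, that randomly drawn absolute-position features can produce attention terms depending only on the lag m - n, can be checked numerically with random Fourier features: with features sqrt(2/R) cos(ω_r m + φ_r), the expected inner product between positions m and n is E[cos(ω(m - n))]. This is a simplified illustration, not the paper's Stochastic Positional Encoding algorithm; the Gaussian frequency distribution, σ = 0.3, R = 50,000, and the use of the same features on the query and key sides are arbitrary assumptions.

```python
import numpy as np


def random_positional_features(positions, freqs, phases):
    """Random Fourier features sqrt(2/R) * cos(omega_r * m + phi_r) for each position m."""
    R = len(freqs)
    return np.sqrt(2.0 / R) * np.cos(np.outer(positions, freqs) + phases)


rng = np.random.default_rng(0)
R = 50_000
sigma = 0.3
freqs = rng.normal(0.0, sigma, size=R)          # omega_r ~ N(0, sigma^2)
phases = rng.uniform(0.0, 2 * np.pi, size=R)    # phi_r ~ Uniform[0, 2*pi)

# Simplification: the same features stand in for both query-side and key-side encodings.
feats = random_positional_features(np.arange(64), freqs, phases)


def estimated_kernel(m, n):
    """Monte Carlo estimate of the positional term between positions m and n."""
    return float(feats[m] @ feats[n])


def expected_kernel(lag):
    """E[cos(omega * lag)] for omega ~ N(0, sigma^2); depends on the lag only."""
    return float(np.exp(-0.5 * (sigma * lag) ** 2))


for m, n in [(10, 7), (50, 47), (40, 30)]:
    print(m, n, round(estimated_kernel(m, n), 3), round(expected_kernel(m - n), 3))
```

The random phases make the term that depends on m + n average out, so pairs with the same lag, such as (10, 7) and (50, 47), give approximately the same value even though each feature is computed from an absolute position.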

Chia-Yuan Chang*, Cheng-Wei Lu*, Chuan-Ju Wang

A Multi-step-ahead Markov Conditional Forward Model with Cube Perturbations for Extreme Weather Forecasting

Machine Learning

February 2021

Predicting extreme weather events such as tropical and extratropical cyclones is of vital scientific and societal importance. Machine learning methods have recently found their way into weather analysis and prediction, but most of them use machine learning merely as a complement to traditional numerical weather prediction models. Although some purely machine-learning, data-driven approaches to weather prediction have been developed, they mainly formulate extreme weather forecasting as pattern recognition or follow the line of traditional time-series models: the former usually yields only single-step-ahead predictions, and the latter lacks the flexibility to account for observed weather features, because such methods consider only the patterns of extreme weather occurrences. In this paper, we depart from these pattern-recognition and time-series approaches and employ machine learning to estimate the probabilities of extreme weather occurrences in a multi-step-ahead (MSA) fashion, given information on both weather features and the realized occurrences of extreme weather. Specifically, we propose a Markov conditional forward (MCF) model that adopts the Markov property between the occurrences of extreme weather for MSA extreme weather forecasting. Moreover, for better long-term prediction, we propose three novel cube perturbation methods to address error accumulation in our model. Experimental results on a real-world extreme weather dataset show the superiority of the proposed MCF model in prediction accuracy for both short-term and long-term forecasting; moreover, the three cube perturbation methods successfully increase the fault tolerance and generalization ability of the MCF model, yielding significant improvements for long-term prediction.
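Under a first-order Markov assumption on occurrences, multi-step-ahead probabilities can be chained with the law of total probability: p_{t+1} = P(y_{t+1} = 1 | y_t = 1) p_t + P(y_{t+1} = 1 | y_t = 0)(1 - p_t). The sketch below applies this recursion. In the MCF model the conditional probabilities would come from a learned model conditioned on weather features; here they are hypothetical inputs, and the cube perturbation methods are not shown.

```python
def multi_step_probabilities(p0, p_given_event, p_given_no_event):
    """Chain Markov transition probabilities into multi-step-ahead occurrence probabilities.

    p0: probability of an occurrence at the current step (1.0 if one was observed)
    p_given_event[k]: P(occurrence at step k+1 | occurrence at step k)   (hypothetical input)
    p_given_no_event[k]: P(occurrence at step k+1 | no occurrence at step k)
    """
    probs = []
    p = p0
    for a, b in zip(p_given_event, p_given_no_event):
        p = a * p + b * (1.0 - p)   # law of total probability over the previous state
        probs.append(p)
    return probs
```

For instance, starting from an observed event (p0 = 1) with conditional probabilities 0.5 and 0.1 at every step, the recursion gives 0.5 for the first step and 0.5 × 0.5 + 0.1 × 0.5 = 0.3 for the second. Because each step reuses the previous estimate, errors can compound over long horizons, which is the error-accumulation problem the abstract mentions.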

K. M. Chen and R. Y. Chang

Semi-Supervised Learning with GANs for Device-Free Fingerprinting Indoor Localization

IEEE Global Communications Conference (GLOBECOM)

December 2020

Device-free wireless indoor localization is a key enabling technology for the Internet of Things (IoT). Fingerprint-based indoor localization techniques are a commonly used solution. This paper proposes a semi-supervised, generative adversarial network (GAN)-based device-free fingerprinting indoor localization system. The proposed system uses a small amount of labeled data and a large amount of unlabeled data (i.e., it is semi-supervised), thus considerably reducing the expensive data labeling effort. Experimental results show that, compared to the state-of-the-art supervised scheme, the proposed semi-supervised system achieves comparable performance when both use the same sufficient amount of labeled data, and significantly superior performance when both use the same highly limited amount of labeled data. Moreover, the proposed semi-supervised system retains its performance over a broad range of labeled-data amounts. The interactions between the generator, discriminator, and classifier models of the proposed GAN-based system are visually examined and discussed. A mathematical description of the proposed system is also presented.
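For readers unfamiliar with semi-supervised GANs, one widely used discriminator formulation, the K+1-class scheme of Salimans et al. (2016), "Improved Techniques for Training GANs," can be written from classifier logits alone. This is a swapped-in, generic technique: the paper describes separate generator, discriminator, and classifier models, so its losses may differ, and the function name and inputs below are assumptions.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def ssl_gan_discriminator_losses(logits_lab, labels, logits_unl, logits_fake):
    """K+1-class semi-supervised GAN losses, with the fake-class logit fixed at 0.

    logits_*: arrays of shape (batch, K) over the K real classes (e.g. locations).
    With Z(x) = sum_k exp(l_k), the real-vs-fake probability is D(x) = Z / (Z + 1).
    """
    # Labeled real samples: cross-entropy over the K real classes
    lse_lab = np.logaddexp.reduce(logits_lab, axis=1)
    loss_lab = np.mean(lse_lab - logits_lab[np.arange(len(labels)), labels])
    # Unlabeled real samples: -log D(x) = log(Z + 1) - log Z
    lse_unl = np.logaddexp.reduce(logits_unl, axis=1)
    loss_unl = np.mean(softplus(lse_unl) - lse_unl)
    # Generated samples: -log(1 - D(G(z))) = log(Z + 1)
    lse_fake = np.logaddexp.reduce(logits_fake, axis=1)
    loss_fake = np.mean(softplus(lse_fake))
    return loss_lab, loss_unl, loss_fake
```

The unlabeled term is what lets plentiful unlabeled fingerprints shape the classifier: it rewards large total logit mass on real data and small mass on generated data, while only the labeled term needs ground-truth locations.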

Chih-Kai Kang, Hashan Roshantha Mendis, Chun-Han Lin, Ming-Syan Chen, and Pi-Cheng Hsiu

Everything Leaves Footprints: Hardware Accelerated Intermittent Deep Inference

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

November 2020

Current peripheral execution approaches for intermittently powered systems either require full access to the internal hardware state for checkpointing or rely on application-level energy estimation for task partitioning to make correct forward progress. Both requirements pose significant practical challenges for energy-harvesting, intelligent edge IoT devices that perform hardware-accelerated DNN inference: sophisticated compute peripherals may have inaccessible internal state, and the complexity of DNN models makes it difficult for programmers to partition the application into suitably sized tasks that fit within an estimated energy budget. This paper presents the concept of inference footprinting for intermittent DNN inference, in which accelerator progress is accumulatively preserved across power cycles. Our middleware stack, HAWAII, tracks and restores inference footprints efficiently and transparently to make inference forward progress, without requiring access to the accelerator's internal state or application-level energy estimation. Evaluations were carried out on a Texas Instruments device under varied energy budgets and network workloads. Compared to a variety of task-based intermittent approaches, HAWAII improves inference throughput by 5.7% to 95.7%, with particularly large gains on heavily accelerated DNNs.
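The footprinting idea can be illustrated with a toy simulation: a job counter (the footprint) and each job's output live in simulated non-volatile memory, and the footprint advances only after a job's output is committed, so a power failure loses at most the job in flight. Everything here (`NonVolatileMemory`, the one-unit energy cost per job, the function names) is a hypothetical model, not HAWAII's middleware, which tracks footprints of real accelerator operations.

```python
from dataclasses import dataclass, field


@dataclass
class NonVolatileMemory:
    """Hypothetical model of state that survives power failures."""
    footprint: int = 0                           # index of the next uncommitted job
    outputs: dict = field(default_factory=dict)  # committed job outputs


def power_cycle(jobs, nvm, energy_budget):
    """Run jobs from the saved footprint until done or out of energy.

    Each job is assumed to cost one unit of energy. Returns True when all
    jobs have been committed, False when power fails first.
    """
    energy = energy_budget
    while nvm.footprint < len(jobs):
        i = nvm.footprint
        result = jobs[i]()          # result held in volatile buffers
        energy -= 1
        if energy < 0:
            return False            # power failed before commit: this result is lost
        nvm.outputs[i] = result     # commit the output first...
        nvm.footprint = i + 1       # ...then advance the footprint
    return True


def run_until_done(jobs, nvm, energy_budget):
    """Repeat power cycles until every job is committed; return the cycle count."""
    if energy_budget < 1:
        raise ValueError("energy_budget must cover at least one job per cycle")
    cycles = 0
    while True:
        cycles += 1
        if power_cycle(jobs, nvm, energy_budget):
            return cycles
```

Because each job writes only to its own output slot, re-executing an interrupted job is harmless, and no internal accelerator state has to be saved. With a budget of three jobs per power cycle, for example, ten jobs complete in four power cycles.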

Yu-Neng Chuang*, Chih-Ming Chen*, Chuan-Ju Wang, Ming-Feng Tsai, Yuan Fang, and Ee Peng Lim

TPR: Text-aware Preference Ranking for Recommender Systems

Machine Learning

October 2020

Textual data is common and informative auxiliary information for recommender systems. Most prior work uses text for rating prediction, but little work connects it to top-N recommendation. Moreover, although advanced recommendation models capable of incorporating auxiliary information have been developed, none of them is specifically designed to model textual information, which restricts their use to typical user-to-item recommendation. In this work, we present a text-aware preference ranking (TPR) framework for top-N recommendation, in which we comprehensively model the joint association of user-item interactions and the relations between items and their associated text. Using the TPR framework, we construct a joint likelihood function that explicitly describes two ranking structures: 1) item preference ranking (IPR) and 2) word relatedness ranking (WRR), where the former captures the item preference of each user and the latter captures the word relatedness of each item. As these two explicit structures are by nature mutually dependent, we propose TPR-OPT, a simple yet effective learning criterion that additionally includes implicit structures, such as the relatedness between items and the relatedness between words for each user, for model optimization. This design not only comprehensively describes the joint association among users, words, and text but also naturally yields powerful representations suitable for a range of recommendation tasks, including user-to-item, item-to-item, and user-to-word recommendation, as well as item-to-word reconstruction. Extensive experiments on eight recommendation datasets demonstrate that, by including textual information from item descriptions, the proposed TPR model consistently outperforms state-of-the-art baselines on various recommendation tasks.
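To illustrate what a joint likelihood over two ranking structures can look like, the sketch below sums BPR-style pairwise terms log σ(s_pos - s_neg) for item preference (user, preferred item, other item) and word relatedness (item, related word, other word), with dot-product scores. The dot-product scoring, the triple format, and the function names are assumptions; TPR-OPT's additional implicit structures are not modeled.

```python
import numpy as np


def pairwise_log_likelihood(anchor, pos, neg):
    """Sum of log sigma(<a, pos> - <a, neg>) over rows (BPR-style pairwise term)."""
    diff = np.sum(anchor * pos, axis=1) - np.sum(anchor * neg, axis=1)
    return float(-np.logaddexp(0.0, -diff).sum())   # log sigma(x) = -log(1 + e^-x), computed stably


def joint_log_likelihood(U, V, W, ipr_triples, wrr_triples):
    """Hypothetical joint objective: item preference ranking + word relatedness ranking.

    U, V, W: user, item, and word embedding matrices (one vector per row)
    ipr_triples: int array of rows (user, preferred item, other item)
    wrr_triples: int array of rows (item, related word, other word)
    """
    u, i, j = ipr_triples.T
    ipr = pairwise_log_likelihood(U[u], V[i], V[j])
    it, w_pos, w_neg = wrr_triples.T
    wrr = pairwise_log_likelihood(V[it], W[w_pos], W[w_neg])
    return ipr + wrr   # maximize this (or minimize its negative) during training
```

Because the item embeddings `V` appear in both terms, the two ranking structures are coupled: gradients from word relatedness reshape the same item vectors that drive the item-preference scores.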