One of the most exciting yet challenging endeavors in music research is to develop a computational model that comprehends the affective content of music signals and organizes a music collection according to emotion. In this paper, we propose a novel acoustic emotion Gaussians (AEG) model that defines a proper generative process of emotion perception in music. As a generative model, AEG permits straightforward interpretation of the model learning process. To bridge the acoustic feature space and the music emotion space, a set of latent feature classes, learned from data, is introduced to perform end-to-end semantic mapping between the two spaces. Based on the space of latent feature classes, the AEG model is applicable to both automatic music emotion annotation and emotion-based music retrieval. To gain insight into the AEG model, we also illustrate its learning process. A comprehensive performance study on two emotion-annotated music corpora, MER60 and MTurk, demonstrates that AEG outperforms state-of-the-art methods in automatic music emotion annotation. Moreover, we report, for the first time, a quantitative evaluation of emotion-based music retrieval.
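To make the idea concrete, here is a minimal sketch (not the authors' code) of how latent feature classes can bridge the two spaces: a GMM over frame-level audio features supplies the latent classes, each class is paired with a 2-D valence-arousal Gaussian estimated from annotated clips, and a new clip is annotated by the posterior-weighted combination of those Gaussians. All function names are hypothetical, and scikit-learn stands in for the paper's EM derivation.

```python
# Minimal sketch of the AEG annotation idea (not the authors' code):
# 1) learn latent feature classes as a GMM over frame-level audio features,
# 2) attach a 2-D valence-arousal Gaussian to each latent class,
# 3) annotate a clip as the posterior-weighted mixture of those Gaussians.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_aeg_sketch(features, va_labels, n_classes=8, seed=0):
    """features: list of (n_frames, d) arrays; va_labels: (n_clips, 2) valence-arousal."""
    gmm = GaussianMixture(n_components=n_classes, random_state=seed)
    gmm.fit(np.vstack(features))                      # latent feature classes
    # clip-level posteriors over latent classes
    post = np.array([gmm.predict_proba(f).mean(axis=0) for f in features])
    # per-class emotion Gaussians via posterior-weighted moments (simplified M-step)
    w = post / post.sum(axis=0, keepdims=True)        # (n_clips, n_classes)
    mu = w.T @ va_labels                              # per-class (valence, arousal) means
    cov = np.stack([np.cov(va_labels.T, aweights=w[:, k]) for k in range(n_classes)])
    return gmm, mu, cov                               # full model is a Gaussian mixture

def annotate(gmm, mu, clip_features):
    """Predict a clip's emotion as the posterior-weighted mean of class Gaussians."""
    post = gmm.predict_proba(clip_features).mean(axis=0)
    return post @ mu                                  # expected (valence, arousal)
```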
A major challenge in the design of multicore embedded systems is how to handle communications among tasks with performance requirements and precedence constraints. In this paper, we consider the problem of scheduling real-time tasks over multilayer bus systems with the objective of minimizing the communication cost. We show that the problem is NP-hard and determine the best approximation ratio achievable by any polynomial-time approximation algorithm. First, we propose a polynomial-time optimal algorithm for a restricted case with a single multilayer bus and unit execution and communication times. The result is then extended to a pseudopolynomial-time optimal algorithm that handles multiple multilayer buses with arbitrary execution and communication times, as well as different timing constraints and objective functions. We compare the performance of the proposed algorithm with that of several popular heuristics, and provide further insights into multilayer bus system design.
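As a point of reference for the heuristics mentioned above, the following sketch implements a simple greedy list scheduler for precedence-constrained tasks with unit execution times and a unit penalty for cross-core communication; it is an illustrative baseline under assumed inputs, not the paper's optimal algorithm, and the data structures (`deps`, `core_free`) are hypothetical.

```python
# Hedged sketch of a list-scheduling heuristic for precedence-constrained
# tasks with unit execution/communication times (an illustrative baseline,
# not the paper's optimal algorithm).
def list_schedule(tasks, deps, n_cores):
    """tasks: iterable of ids; deps: dict task -> set of predecessors.
    Returns dict task -> (core, start_time); a unit communication delay is
    charged whenever a predecessor ran on a different core."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]
    core_free = [0] * n_cores
    placed, succ = {}, {}
    for t, ps in deps.items():
        for p in ps:
            succ.setdefault(p, []).append(t)
    while ready:
        t = ready.pop(0)
        best = None
        for c in range(n_cores):
            # earliest start: core availability and predecessor finish (+1 if off-core)
            est = core_free[c]
            for p in deps.get(t, ()):
                pc, ps_ = placed[p]
                est = max(est, ps_ + 1 + (0 if pc == c else 1))
            if best is None or est < best[1]:
                best = (c, est)
        c, s = best
        placed[t] = (c, s)
        core_free[c] = s + 1                 # unit execution time
        for u in succ.get(t, ()):
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return placed
```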
Layer-based video coding, together with adaptive modulation and coding, is a promising technique for providing real-time video multicast services to heterogeneous mobile devices. With the rapid growth of data communications for emerging applications, reducing the energy consumption of mobile devices is a major challenge. This paper addresses the problem of resource allocation for video multicast in fourth-generation wireless systems, with the objective of minimizing the total energy consumed for data reception. First, we consider the problem when scalable video coding is applied. We prove that the problem is NP-hard and propose a 2-approximation algorithm to solve it. Then, we investigate the problem under multiple description coding, and show that it is also NP-hard and cannot be approximated in polynomial time with a ratio better than 2, unless P=NP. For this case, we develop a pseudopolynomial-time 2-approximation algorithm. Simulations comparing the proposed algorithms with a brute-force optimal algorithm and a conventional approach yield very encouraging results.
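The following toy sketch illustrates the flavor of the scalable-video-coding allocation problem under strong simplifying assumptions (one modulation-and-coding scheme per layer, reception energy proportional to receivers times airtime, and layered decoding prerequisites); it is not the paper's 2-approximation algorithm, and all names are hypothetical.

```python
# Hedged toy model: for each SVC layer, the chosen MCS must be decodable by
# every user that still needs the layer, so the fastest feasible MCS is the
# weakest such user's cap; reception energy is proxied by receivers x airtime.
def allocate_layers(layer_bits, rates, demand, cap):
    """layer_bits[l]: bits of layer l; rates[m]: bits/s of MCS m (increasing);
    demand[u]: number of layers user u needs; cap[u]: highest MCS u decodes.
    Returns (per-layer MCS choices, total reception-energy proxy)."""
    total, choice = 0.0, []
    for l, bits in enumerate(layer_bits):
        need = [u for u in range(len(demand)) if demand[u] > l]
        if not need:
            break
        m = min(cap[u] for u in need)          # most robust MCS all receivers decode
        choice.append(m)
        total += len(need) * bits / rates[m]   # energy ~ receivers x airtime
    return choice, total
```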
Most existing studies on music mood classification have focused on Western music, and little research has investigated whether the mood categories, audio features, and classification models developed for Western music are applicable to non-Western music. This paper attempts to answer this question through a comparative study of English and Chinese songs. Specifically, a set of Chinese pop songs was annotated using an existing mood taxonomy developed for English songs. Six sets of audio features commonly used on Western music (e.g., timbre, rhythm) were extracted from both the Chinese and English songs, and mood classification performance based on these feature sets was compared. In addition, experiments were conducted to test the generalizability of classification models across English and Chinese songs. The results of this study shed light on the cross-cultural applicability of research on music mood classification.
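A minimal sketch of the cross-cultural protocol might look as follows: train a mood classifier on songs from one culture and test on the other, holding the feature set fixed. librosa MFCC statistics and an RBF SVM are our stand-ins here, not necessarily the study's feature extractors or toolchain.

```python
# Hedged sketch of the cross-cultural experiment: train on English songs and
# test on Chinese songs (or vice versa) with one shared feature set.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def timbre_features(path, sr=22050):
    """One clip-level timbre vector: MFCC means and standard deviations."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 26-D

def cross_culture_accuracy(train_paths, train_moods, test_paths, test_moods):
    """Fit on one culture's songs, report accuracy on the other's."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit([timbre_features(p) for p in train_paths], train_moods)
    preds = clf.predict([timbre_features(p) for p in test_paths])
    return np.mean(preds == np.asarray(test_moods))
```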
We present a framework for counting the number of people in an environment where multiple cameras with different angles of view are available. We treat the visual cues captured by each camera as a knowledge source and carry out cross-camera knowledge transfer to alleviate the difficulties of people counting, such as partial occlusions, low-quality images, and cluttered backgrounds. Specifically, this work makes the following contributions. First, we overcome the variations among heterogeneous cameras with different perspective settings by matching the same groups of pedestrians captured by these cameras, and present an algorithm for establishing cross-camera correspondence. Second, the proposed counting model is composed of a pair of collaborative regressors: one estimates the people count from features extracted from intra-camera visual evidence, while the other recovers the resulting residual by taking into account conflicts among inter-camera predictions. The two regressors are elegantly coupled and jointly yield an accurate counting system. In addition, we provide a set of manually annotated pedestrian labels on the PETS 2010 videos for performance evaluation. Our approach is comprehensively tested in various settings and compared with competitive baselines; the significant improvement in performance demonstrates its effectiveness.
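To illustrate the collaborative-regressor idea, here is a hedged sketch in which ridge regression stands in for both regressors: the primary one maps per-camera features to counts, and the second predicts a correction from the disagreement among per-camera predictions. The feature layout and class name are hypothetical.

```python
# Hedged sketch of the two collaborative regressors: a primary regressor maps
# each camera's visual features to a count; a residual regressor corrects the
# estimate using inter-camera disagreement.
import numpy as np
from sklearn.linear_model import Ridge

class CollaborativeCounter:
    def __init__(self):
        self.primary = Ridge(alpha=1.0)    # intra-camera features -> count
        self.residual = Ridge(alpha=1.0)   # inter-camera conflicts -> correction

    def fit(self, feats, counts):
        """feats: (n_frames, n_cams, d); counts: (n_frames,) ground truth."""
        n, c, d = feats.shape
        self.primary.fit(feats.reshape(n * c, d), np.repeat(counts, c))
        per_cam = self.primary.predict(feats.reshape(n * c, d)).reshape(n, c)
        conflict = per_cam - per_cam.mean(axis=1, keepdims=True)  # disagreement
        self.residual.fit(conflict, counts - per_cam.mean(axis=1))
        return self

    def predict(self, feats):
        n, c, d = feats.shape
        per_cam = self.primary.predict(feats.reshape(n * c, d)).reshape(n, c)
        return per_cam.mean(axis=1) + self.residual.predict(
            per_cam - per_cam.mean(axis=1, keepdims=True))
```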
The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for the last decade. However, the converted spectra are excessively smoothed, producing a muffled converted sound. In this study, we improve speech quality by enhancing the dependency between the source feature vectors (natural sound) and the converted feature vectors (converted sound); it is believed that strengthening this dependency makes the converted sound closer to the natural sound. To this end, we propose an integrated maximum a posteriori and mutual information (MAPMI) criterion for parameter generation in spectral conversion. Experimental results from formal listening tests demonstrate that speech converted by the proposed MAPMI method is of higher quality than that produced by the conventional method.
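For context, the conventional GMM mapping that MAPMI improves upon can be sketched as follows: fit a GMM on joint source-target spectral vectors, then convert each source frame by the posterior-weighted conditional mean E[y|x]. This is the standard baseline, not the proposed MAPMI criterion, and the helper names are ours.

```python
# Hedged sketch of conventional joint-GMM voice conversion (the baseline that
# over-smooths): MMSE mapping via posterior-weighted conditional means.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X, Y, n_mix=32, seed=0):
    """X, Y: time-aligned (n_frames, d) source/target spectral features."""
    gmm = GaussianMixture(n_components=n_mix, covariance_type="full",
                          random_state=seed)
    gmm.fit(np.hstack([X, Y]))                     # model p(x, y) jointly
    return gmm

def _log_gauss(x, mu, S):
    diff = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (logdet + diff @ np.linalg.solve(S, diff)
                   + len(x) * np.log(2 * np.pi))

def convert(gmm, x, d):
    """MMSE conversion of one d-dim source frame x to a target frame E[y|x]."""
    # mixture posteriors given the source part only
    logp = np.array([
        _log_gauss(x, gmm.means_[k][:d], gmm.covariances_[k][:d, :d])
        for k in range(gmm.n_components)]) + np.log(gmm.weights_)
    post = np.exp(logp - logp.max())
    post /= post.sum()
    y_hat = np.zeros(d)
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k][:d], gmm.means_[k][d:]
        Sxx = gmm.covariances_[k][:d, :d]
        Syx = gmm.covariances_[k][d:, :d]
        y_hat += post[k] * (mu_y + Syx @ np.linalg.solve(Sxx, x - mu_x))
    return y_hat
```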
We aim to resolve the difficulties of action recognition arising from large intra-class variations. These unfavorable variations make it infeasible to represent one action instance by other instances of the same action. We therefore propose to extract both instance-specific and class-consistent features to facilitate action recognition. Specifically, the instance-specific features explore the self-similarities among frames of each video instance, while the class-consistent features summarize within-class similarities. We introduce a generative formulation to combine these two diverse types of features. The experimental results demonstrate the effectiveness of our approach.
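A minimal sketch of the instance-specific cue: compute a self-similarity matrix (SSM) over the frames of a single video and subsample it into a fixed-length feature. The per-frame descriptors and the Gaussian similarity are illustrative choices, not necessarily the paper's.

```python
# Hedged sketch of the instance-specific cue: a self-similarity matrix over
# the frames of one video, subsampled to a fixed-length feature vector.
import numpy as np

def self_similarity(frames):
    """frames: (n_frames, d) per-frame descriptors -> (n, n) SSM."""
    sq = (frames ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * frames @ frames.T  # pairwise squared distances
    d2 = np.maximum(d2, 0.0)
    return np.exp(-d2 / (d2.mean() + 1e-12))       # similarity in (0, 1]

def ssm_feature(frames, size=16):
    """Fixed-length instance-specific feature: subsample the SSM to size x size."""
    s = self_similarity(frames)
    idx = np.linspace(0, len(frames) - 1, size).astype(int)
    return s[np.ix_(idx, idx)].ravel()
```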
Most digital cameras capture one primary color at each pixel with a single sensor overlaid with a color filter array. To recover a full-color image from the incomplete color samples, one needs to restore the two missing color values at each pixel. This restoration process is known as color demosaicking. In this paper, we present a novel self-learning approach to this problem based on support vector regression. Unlike prior learning-based demosaicking methods, our approach extracts image-dependent information to construct the learning model and requires no additional training data. Experimental results show that the proposed method outperforms many state-of-the-art techniques in both subjective and objective image quality measures.
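A toy illustration of the self-learning idea on a Bayer pattern: positions where green is actually sampled provide training pairs for an SVR that predicts a pixel's green value from its raw 3x3 neighborhood, and the trained model is then applied at positions where green is missing. This is a simplified sketch of one channel, not the paper's full pipeline.

```python
# Hedged sketch of image-dependent (self-learned) demosaicking: train an SVR
# on the input image itself, using positions where green IS sampled, then
# predict the missing green values elsewhere. Toy RGGB illustration only.
import numpy as np
from sklearn.svm import SVR

def green_mask(h, w):
    """Green locations of an RGGB Bayer pattern."""
    m = np.zeros((h, w), dtype=bool)
    m[0::2, 1::2] = True
    m[1::2, 0::2] = True
    return m

def demosaic_green(raw):
    """raw: (h, w) CFA image. Returns the full green channel."""
    h, w = raw.shape
    mask = green_mask(h, w)
    X, y, Q, pos = [], [], [], []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = raw[i - 1:i + 2, j - 1:j + 2].ravel()
            if mask[i, j]:
                X.append(np.delete(patch, 4)); y.append(raw[i, j])  # known green
            else:
                Q.append(np.delete(patch, 4)); pos.append((i, j))   # to predict
    svr = SVR(kernel="rbf", C=10.0).fit(X, y)      # self-learned from this image
    green = np.where(mask, raw, 0.0)
    for (i, j), g in zip(pos, svr.predict(Q)):
        green[i, j] = g
    return green
```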
Rain removal from a single image is one of the most challenging image denoising problems. In this paper, we present a learning-based framework for single-image rain removal that focuses on learning context information from the input image, so that the rain patterns present in it can be automatically identified and removed. We approach single-image rain removal as the integration of image decomposition and self-learning processes. More precisely, our method first performs context-constrained image segmentation on the input image, and we learn dictionaries for the high-frequency components of the different context categories via sparse coding for reconstruction purposes. For image regions with rain streaks, the dictionaries of distinct context categories share common atoms that correspond to the rain patterns. By applying PCA and SVM classifiers to the learned dictionaries, our framework automatically identifies the common rain patterns, allowing us to remove rain streaks as particular high-frequency components of the input image. Unlike prior work on rain removal from images/videos, which requires image priors or training data from multiple frames, our self-learning approach requires only the input image itself, saving substantial pre-training effort. Experimental results demonstrate the subjective and objective visual quality improvements achieved by the proposed method.
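One stage of the framework can be sketched as follows: separate the image into low- and high-frequency layers and learn a sparse dictionary on high-frequency patches; in the full method, atoms shared across context categories (identified via PCA and SVM) are treated as rain patterns, a step elided here. The parameters and helper name are illustrative.

```python
# Hedged sketch: learn a sparse dictionary over high-frequency patches of the
# input image itself; atom classification into rain/non-rain is elided.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def hf_dictionary(img, patch=8, n_atoms=64, seed=0):
    """img: (h, w) grayscale in [0, 1]. Returns (n_atoms, patch*patch) dictionary."""
    low = gaussian_filter(img, sigma=2.0)
    high = img - low                                   # high-frequency layer
    P = extract_patches_2d(high, (patch, patch), max_patches=5000,
                           random_state=seed)
    P = P.reshape(len(P), -1)
    P -= P.mean(axis=1, keepdims=True)                 # remove per-patch DC
    dl = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                     random_state=seed)
    dl.fit(P)
    return dl.components_                              # candidate rain/texture atoms
```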
Improving the endurance of phase-change memory (PCM) is a fundamental issue when the technology is considered as a main-memory alternative. In the design of memory-based wear-leveling approaches, a major challenge is how to efficiently determine the appropriate memory pages for allocation or swapping. In this paper, we present an efficient wear-leveling design that is compatible with existing virtual memory management. Two implementations, namely bucket-based and array-based wear leveling, with nearly zero search cost, are proposed to trade off time and space complexity. The results of experiments conducted on popular benchmarks to evaluate the efficacy of the proposed design are very encouraging.
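A minimal sketch of the bucket-based idea, with hypothetical structure names: free pages are grouped into buckets keyed by coarsened write count, so a least-worn page can be found for allocation in near-constant time rather than by searching.

```python
# Hedged sketch of bucket-based wear leveling: buckets group free pages by
# (coarsened) write count, making least-worn allocation near O(1).
class BucketWearLeveler:
    def __init__(self, n_pages, granularity=64):
        self.gran = granularity                   # writes per bucket step
        self.writes = [0] * n_pages
        self.buckets = {0: set(range(n_pages))}   # bucket id -> free pages
        self.min_bucket = 0

    def allocate(self):
        """Pop a page from the least-worn non-empty bucket."""
        while self.min_bucket not in self.buckets or not self.buckets[self.min_bucket]:
            if self.min_bucket > max(self.buckets, default=-1):
                raise MemoryError("no free pages")
            self.min_bucket += 1
        return self.buckets[self.min_bucket].pop()

    def free(self, page):
        """Return a page, rebucketing it by its current wear."""
        b = self.writes[page] // self.gran
        self.buckets.setdefault(b, set()).add(page)
        self.min_bucket = min(self.min_bucket, b)

    def record_write(self, page):
        self.writes[page] += 1
```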