資訊科技創新研究中心 | 近期研究成果

W.-S. Tseng, L.-K. Hsu, L.-W. Kang, And Y.-C. F. Wang

Cross-View Action Recognition via Low-Rank based Domain Adaptation

IEEE International Conference on Image Processing (ICIP)

September 2013

Cross-view action recognition is a challenging problem, since one typically does not have sufficient training data at the target view of interest. With recent developments of domain adaptation, we propose a novel low-rank based domain adaptation model for mapping labeled data from the original source view to the target view, so that training and testing can be performed at that domain. Our model not only provides an effective way for associating image data across different domains, we further advocate the structural incoherence between transformed data of different categories. As a result, additional data discriminating ability is introduced to our domain adaptation model, and thus improved recognition can be expected. Experimental results on the IXMAS dataset verify the effectiveness of our proposed method, which is shown to outperform state-of-the-art domain adaptation approaches.

Y.-C. Chou, C.-P. Wei, Y.-C. F. Wang

A Discriminative Domain Adaptaion Model for Cross-Domain Image Classification

IEEE International Conference on Image Processing (ICIP)

September 2013

Techniques of domain adaptation have been applied to address cross-domain recognition problems. In particular, such techniques favor the scenarios in which labeled data can be obtained at the source domain, but only few labeled target domain data are available during the training stage. In this paper, we propose a domain adaptation approach which is able to transfer source domain labeled data to the target domain, so that one can collect a sufficient amount of training data at that domain for recognition purposes. By advancing low-rank matrix decomposition for obtaining representative cross-domain data, our proposed model aims at transferring source domain labeled data to the target domain while preserving class label information. This introduces additional discriminating ability into our model, and thus improved recognition can be expected. Empirical results on cross-domain image datasets confirm the use of our proposed model for solving cross-domain recognition problems.

Liwei Chan, Rong-Hao Liang, Ming-Chang Tsai, Kai-Yin Cheng, Chao-Huai Su, Mike Chen, Wen-Huang Cheng, And Bing-Yu Chen

FingerPad: Private and Subtle Interaction Under Fignertips

ACM Symposium on User Interface Software and Technology (UIST)

October 2013

FingerPad: Private and Subtle Interaction Under Fignertips

W.-T. Li, H.-S. Chang, K.-C. Lien, H.-T. Chang, And Y.-C. F. Wang

Exploring Visual and Motion Saliency for Automatic Video Object Extraction

IEEE Transactions on Image Processing

July 2013

This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the use of any training data (i.e., not limited to any particular type of object). To separate foreground and background regions within and across video frames, the proposed method utilizes visual and motion saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency induced features, which allows us to deal with unknown pose and scale variations of the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory results.

Y.-J. Lee, Y.-R. Yeh, And Y.-C. F. Wang

Anomaly Detection via Online Over-Sampling Principal Component Analysis

IEEE Transactions on Knowledge and Data Engineering

July 2013

Anomaly detection has been an important research topic in data mining and machine learning. Many real-world applications such as intrusion or credit card fraud detection require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are typically implemented in batch mode, and thus cannot be easily extended to large-scale problems without sacrificing computation and memory requirements. In this paper, we propose an online oversampling principal component analysis (osPCA) algorithm to address this problem, and we aim at detecting the presence of outliers from a large amount of data via an online updating technique. Unlike prior principal component analysis (PCA)-based approaches, we do not store the entire data matrix or covariance matrix, and thus our approach is especially of interest in online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed osPCA allows us to determine the anomaly of the target instance according to the variation of the resulting dominant eigenvector. Since our osPCA need not perform eigen analysis explicitly, the proposed framework is favored for online applications which have computation or memory limitations. Compared with the well-known power method for PCA and other popular anomaly detection algorithms, our experimental results verify the feasibility of our proposed method in terms of both accuracy and efficiency.

L.-K. Hsu, W.-S. Tseng, L.-W. Kang, And Y.-C. F. Wang

Seeing Through the Expression: Bridging the Gap Between Expression and Emotion Recognition

IEEE International Conference on Multimedia & Expo (ICME), Theme Track for Multimedia for Humanity

July 2013

In this paper, we propose a novel approach for visualizing and recognizing different emotion categories using facial expression images. Extended by the unsupervised nonlinear dimension reduction technique of locally linear embedding (LLE), we propose a supervised LLE (sLLE) algorithm utilizing emotion labels of face expression images. While existing works typically aim at training on such labeled data for emotion recognition, our approach allows one to derive subspaces for visualizing facial expression images within and across different emotion categories, and thus emotion recognition can be properly performed. In our work, we relate the resulting two-dimensional subspace to the valence-arousal emotion space, in which our method is observed to automatically identify and discriminate emotions in different degrees. Experimental results on two facial emotion datasets verify the effectiveness of our algorithm. With reduced numbers of feature dimensions (2D or beyond), our approach is shown to achieve promising emotion recognition performance.

C.-P. Wei And Y.-C. F. Wang

Learning Auxiliary Dictionaries for Undersampled Face Recognition

IEEE International Conference on Multimedia & Expo (ICME)

July 2013

In this paper, we address the problem of robust face recognition using undersampled data. Given only one or few face images per class, our proposed method not only handles test images with large intra-class variations such as illumination and expression, it is also able to recognize the corrupted ones due to occlusion or disguise. In our work, we advocate the learning of auxiliary dictionaries from the subjects \emph{not} of interest. With the proposed optimization algorithm which jointly solves the tasks of auxiliary dictionary learning and sparse-representation based face recognition, our approach is able to model the above intra-class variations and corruptions for improved recognition. Our experiments on two face image datasets confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art sparse representation based methods.

I-Wei Lai, Chia-Han Lee, And Kwang-Cheng Chen

A virtual MIMO path-time code for cognitive ad hoc networks

IEEE Communications Letters

January 2013

Cognitive radio networks (CRNs) consisting of opportunistic links greatly elevate the networking throughput per bandwidth in future wireless communications. In this letter, we show that by encoding a data packet into several coded packets and transmitting in various time slots, the end-to-end transmission in the CRN can be equivalently formulated as a physical-layer multiple-input multiple-output ( MIMO ) communication problem. Two coding schemes are proposed in such a virtual MIMO scenario to enhance the end-to-end communication reliability by exploiting the path diversity. The closed-form expression of the theoretical error rate analysis validates that the proposed path - time codes can significantly improve the error rate performance.

Hsin-Yi Chen, Yen-Yu Lin, And Bing-Yu Chen

Robust Feature Matching with Alternate Hough and Inverted Hough Transforms

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Poster Session

June 2013

We present an algorithm that carries out alternate Hough transform and inverted Hough transform to establish feature correspondences, and enhances the quality of matching in both precision and recall. Inspired by the fact that nearby features on the same object share coherent homographies in matching, we cast the task of feature matching as a density estimation problem in the Hough space spanned by the hypotheses of homographies. Specifically, we project all the correspondences into the Hough space, and determine the correctness of the correspondences by their respective densities. In this way, mutual verification of relevant correspondences is activated, and the precision of matching is boosted. On the other hand, we infer the concerted homographies propagated from the locally grouped features, and enrich the correspondence candidates for each feature. The recall is hence increased. The two processes are tightly coupled. Through iterative optimization, plausible enrichments are gradually revealed while more correct correspondences are detected. Promising experimental results on three benchmark datasets manifest the effectiveness of the proposed approach.

K.-H. Lo, K.-L. Hua, And Y.-C. F. Wang

Depth Map Super-Resolution via Markov Random Fields Without Texture-Copying Artifacts

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

May 2013

The use of time-of-flight sensors enables the record of fullframe depth maps at video frame rate, which benefits a variety of 3D image or video processing applications. However, such depth maps are typically corrupted by noise and with limited resolution. In this paper, we present a learning-based depth map super-resolution framework by solving a MRF labeling optimization problem. With the captured depth map and the associated high-resolution color image, our proposed method exhibits the capability of preserving the edges of range data while suppressing the artifacts of texture copying due to color discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach over prior depth map upsampling works.