Heterogeneous face recognition (HFR) is of increasing importance nowadays. However, even though common subspace learning serves as an effective technique for HFR, a common subspace constructed solely from external subjects cannot simply be trusted, owing to large intra-person differences. In this paper, we propose a person-specific domain adaptation model for each test image. Our model is combined with a common subspace constructed by eliminating the heterogeneous components of images in different domains. In our experiments, we evaluate on the CUFS database and the NIR-VIS 2.0 database, and the results demonstrate the effectiveness of our proposed model.
Hearing-impaired patients have a limited hearing dynamic range for speech perception, which partially accounts for their poor speech understanding, particularly in noise. Wide dynamic range compression aims to compress the speech signal into the usable hearing dynamic range of hearing-impaired listeners; however, it normally relies on a static compression strategy. This work proposes a strategy to continuously adjust the envelope compression ratio for speech processing in cochlear implants. This adaptive envelope compression (AEC) strategy aims to keep the compression processing as close to linear as possible while still confining the compressed amplitude envelope within the pre-set dynamic range. Vocoder simulation experiments showed that, when the dynamic range was narrowed to a small value, the intelligibility of AEC-processed sentences was significantly better than that of sentences processed by static envelope compression. This makes the proposed AEC strategy a promising way to improve speech recognition performance for implanted patients in the future.
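The adaptive principle can be illustrated with a minimal frame-based sketch (a hypothetical formulation for illustration only; the actual cochlear implant processing chain, frame structure, and dB mapping differ): per frame, pick the compression ratio closest to 1 (i.e., closest to linear) that still confines the envelope within the target dynamic range.

```python
def compress_frame(env_db, lo_db, hi_db):
    """Map one frame of envelope levels (in dB) into [lo_db, hi_db].

    Toy sketch of adaptive envelope compression: use no compression
    (ratio 1.0) when the frame already fits the range, and otherwise
    apply just enough compression about the range midpoint to pull
    the frame's peak deviation inside the range.
    """
    mid = 0.5 * (lo_db + hi_db)
    half_range = 0.5 * (hi_db - lo_db)
    # Largest deviation of the frame from the range midpoint.
    peak = max(abs(x - mid) for x in env_db)
    # Stay linear if possible; otherwise compress minimally.
    ratio = 1.0 if peak <= half_range else peak / half_range
    return [mid + (x - mid) / ratio for x in env_db]
```

Frames that already fit the range pass through unchanged, which is exactly the "as close to linear as possible" behavior the strategy aims for.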
For the task of robust face recognition, we particularly focus on the scenario in which both training and test image data are corrupted due to occlusion or disguise. Prior standard face recognition methods such as Eigenfaces, as well as state-of-the-art approaches such as sparse representation-based classification, do not consider possible contamination of data during training, and thus their recognition performance on corrupted test data is degraded. In this paper, we propose a novel face recognition algorithm based on low-rank matrix decomposition to address this problem. Besides decomposing the raw training data into a set of representative bases for better modeling of the face images, we introduce a structural incoherence constraint into the proposed algorithm, which enforces the bases learned for different classes to be as independent as possible. As a result, additional discriminating ability is added to the derived basis matrices for improved recognition performance. Experimental results on different face databases with a variety of variations verify the effectiveness and robustness of our proposed method.
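As a toy illustration of the low-rank decomposition step, a rank-1 approximation obtained by power iteration splits a matrix into a low-rank part and a residual error term (the paper's method recovers higher-rank structure and adds the structural incoherence constraint, neither of which is reproduced here):

```python
def rank1_approx(M, iters=100):
    """Leading rank-1 approximation of matrix M (list of lists) via
    power iteration; returns (low_rank_part, residual_error)."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(iters):
        # u <- M v, normalized; v <- M^T u, normalized.
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = max(sum(x * x for x in u) ** 0.5, 1e-12)
        u = [x / nu for x in u]
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        nv = max(sum(x * x for x in v) ** 0.5, 1e-12)
        v = [x / nv for x in v]
    # Leading singular value sigma = u^T M v.
    sigma = sum(u[i] * M[i][j] * v[j] for i in range(rows) for j in range(cols))
    low_rank = [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
    error = [[M[i][j] - low_rank[i][j] for j in range(cols)] for i in range(rows)]
    return low_rank, error
```

For data that is genuinely low-rank plus a few corrupted entries, the residual concentrates on the corruptions, which is the intuition behind separating representative bases from occlusion or disguise artifacts.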
Recently, numerous imaging methods have been proposed for fast MR imaging, including the Single Carrier Wideband MRI proposed by Huang et al. [1] and other techniques performed in k-space. However, these methods suffer from blurring and ringing artifacts due to insufficient k-space sampling. To take advantage of fast imaging techniques while preserving high-resolution MR images, we propose an image-filtering-based algorithm, adaptive guided filtering, which is able to suppress the above artifacts while enhancing/preserving the contrast of image details.
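As background, the plain (non-adaptive) guided filter that such an approach builds on can be sketched in one dimension; this is an illustrative toy following He et al.'s formulation, not the paper's adaptive variant, and the radius/epsilon values are arbitrary:

```python
def box_mean(x, r):
    """Sliding-window mean with radius r (window clipped at borders)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def guided_filter_1d(guide, src, r=2, eps=1e-4):
    """1-D guided filter: edge-preserving smoothing of `src` steered
    by `guide`, via the local linear model q = a * guide + b."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    mean_ip = box_mean([g * s for g, s in zip(guide, src)], r)
    mean_ii = box_mean([g * g for g in guide], r)
    # Per-window linear coefficients a = cov(I, p) / (var(I) + eps).
    a = [(ip - mi * mp) / (ii - mi * mi + eps)
         for ip, mi, mp, ii in zip(mean_ip, mean_i, mean_p, mean_ii)]
    b = [mp - ai * mi for mp, ai, mi in zip(mean_p, a, mean_i)]
    # Average the coefficients over overlapping windows, then apply.
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * g + mb for ma, g, mb in zip(mean_a, guide, mean_b)]
```

The epsilon term controls how strongly edges in the guide are preserved versus smoothed, which is the knob an adaptive variant would tune per region.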
Recognizing image data across different domains has been a challenging task. For biometrics, heterogeneous face recognition (HFR) deals with recognition problems in which training/gallery images are collected in one modality (e.g., photos), while test/probe images are observed in another (e.g., sketches). In this paper, we present a domain adaptation approach for solving HFR problems. By utilizing external face images (i.e., those collected from subjects not of interest) from both source and target domains, we propose a novel Domain-independent Component Analysis (DiCA) algorithm for deriving a common subspace for relating and representing cross-domain image data. To further improve representation ability, we advance the self-taught learning strategy for learning a domain-independent dictionary in our DiCA subspace, which can be applied to both gallery and probe images of interest to improve representation and recognition. Unlike some prior domain adaptation approaches, we require neither data correspondences (i.e., data pairs) when collecting external cross-domain image data, nor label information for learning the common feature space that associates the different domains. Thus, our method is practical for real-world cross-domain classification problems. In our experiments, we consider sketch-to-photo and near-infrared (NIR) to visible-spectrum (VIS) face recognition problems to evaluate the performance of our proposed approach.
Due to the ambiguity in describing and discriminating between clothing images of different styles, clothing image characterization has been a challenging task. Based on the use of multiple types of visual features, we propose a novel multi-view nonnegative matrix factorization (NMF) algorithm for solving the above task. Our multi-view NMF not only derives image representations that describe clothing images in terms of their visual appearance, but also learns an optimal combination of such features for each clothing style, while preserving the separation between different styles. To verify the effectiveness of our method, we conduct experiments on two image datasets and confirm that our method produces satisfactory performance in terms of both clustering and categorization.
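The factorization underlying the multi-view model is standard NMF; a minimal single-view sketch using Lee-Seung multiplicative updates is shown below (illustrative only; the paper's multi-view objective additionally learns per-view combination weights and style-separation terms, which are not reproduced here):

```python
import random

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def nmf(V, k, iters=200, seed=0):
    """Factor nonnegative V (n x m) as W (n x k) times H (k x m)
    via multiplicative updates, which keep all entries nonnegative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = [list(r) for r in zip(*W)]
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = [list(r) for r in zip(*H)]
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H
```

In the multi-view setting, one such factorization per feature type would share structure across views while the model weighs how much each view contributes to each style.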
We present a trajectory-based approach to detecting salient regions in videos by removing the dominant camera motion. Our approach is designed in a general way so that it can be applied to videos taken by either stationary or moving cameras, without any prior information. Moreover, multiple salient regions of different temporal lengths can also be detected. To this end, we extract a set of spatially and temporally coherent keypoint trajectories in a video. Then, velocity and acceleration entropies are proposed to represent the trajectories. In this way, long-term object motions are exploited to filter out short-term noise, and object motions of various temporal lengths can be represented in the same way. On the other hand, we are inspired by the observation that the trajectories in the background, i.e., the non-salient trajectories, are usually consistent with the dominant camera motion, regardless of whether the camera is stationary. We make use of this property to develop a unified approach to saliency generation for both stationary and moving cameras. Specifically, a one-class SVM is employed to remove the trajectories consistent with the dominant motion. It follows that the salient regions can be highlighted by applying a diffusion process to the remaining trajectories. In addition, we create a set of manually annotated ground-truth labels for the collected videos, which are then used for performance evaluation and comparison. The promising results on various types of videos demonstrate the effectiveness and broad applicability of our approach.
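The velocity-entropy descriptor can be sketched as follows (a toy formulation: frame-to-frame speeds are quantized into hypothetical equal-width bins before computing Shannon entropy; acceleration entropy would be computed analogously on successive velocity differences):

```python
import math
from collections import Counter

def motion_entropy(points, nbins=8):
    """Entropy of quantized velocity magnitudes along a trajectory,
    given as a list of (x, y) keypoint positions over frames.

    Steady motion (e.g., background moving with the camera) yields
    low entropy; erratic or changing motion yields high entropy.
    """
    vel = [math.hypot(x2 - x1, y2 - y1)
           for (x1, y1), (x2, y2) in zip(points, points[1:])]
    vmax = max(vel) or 1.0  # guard against an all-zero trajectory
    bins = Counter(min(int(v / vmax * nbins), nbins - 1) for v in vel)
    n = len(vel)
    return -sum((c / n) * math.log(c / n, 2) for c in bins.values())
```

A uniformly moving trajectory falls into a single velocity bin and scores zero entropy, while a trajectory with varying speed spreads over several bins and scores higher.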
Native Mandarin normal-hearing (NH) listeners can easily perceive lexical tones even under large voice pitch variations across speakers, by using the pitch contrast between context and target stimuli. It is, however, unclear whether cochlear implant (CI) users with limited access to pitch cues can make similar use of context pitch cues for tone normalization. In this study, native Mandarin NH listeners and pre-lingually deafened, unilaterally implanted CI users were asked to recognize a series of Mandarin tones varying from Tone 1 (high-flat) to Tone 2 (mid-rising), with or without a preceding sentence context. Most of the CI subjects used a hearing aid (HA) in the non-implanted ear (i.e., bimodal users) and were tested both with CI alone and with CI + HA. In the test without context, typical S-shaped tone recognition functions were observed for most CI subjects, and the function slopes and perceptual boundaries were similar with either CI alone or CI + HA. Compared to NH subjects, CI subjects were less sensitive to the pitch changes in target tones. In the test with context, NH subjects gave more (resp. fewer) Tone-2 responses in a context with high (resp. low) fundamental frequencies, known as the contrastive context effect. For CI subjects, a similar contrastive context effect was found to be statistically significant for tone recognition with CI + HA but not with CI alone. The results suggest that the pitch cues from CIs may not be sufficient to consistently support the pitch-contrast processing required for tone normalization. The additional pitch cues from aided residual acoustic hearing can, however, provide CI users with a tone normalization capability similar to that of NH listeners.
We present a novel domain adaptation approach for solving cross-domain pattern recognition problems, i.e., problems in which the data or features to be processed and recognized are collected from different domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived correlation subspace as a joint representation for associating data across different domains, and we advance reduced kernel techniques for kernel CCA (KCCA) when nonlinear correlation subspaces are desirable. Such techniques not only make KCCA computationally more efficient but also alleviate potential over-fitting problems. Instead of directly performing recognition in the derived CCA subspace (as prior CCA-based domain adaptation methods did), we advocate exploiting the domain transfer ability of this subspace, in which each dimension has a unique capability for associating cross-domain data. In particular, we propose a novel support vector machine (SVM) with a correlation regularizer, named the correlation-transfer SVM, which incorporates domain adaptation ability into the classifier design for cross-domain recognition. We show that our proposed domain adaptation and classification approach can be successfully applied to a variety of cross-domain recognition tasks such as cross-view action recognition, handwritten digit recognition with different features, and image-to-text or text-to-image classification. Our empirical results verify that the proposed method outperforms state-of-the-art domain adaptation approaches in terms of recognition performance.
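For intuition, CCA seeks one projection direction per view such that the projected data from the two views are maximally correlated; a toy two-dimensional sketch via grid search over projection angles is shown below (real CCA solves a generalized eigenproblem, and the paper's kernelized, regularized version differs further):

```python
import math

def cca_2d(X, Y, steps=180):
    """Toy CCA for two lists of 2-D points: grid-search the pair of
    projection angles maximizing |correlation| of the projected,
    centered views. Returns (best_correlation, angle_x, angle_y)."""
    def center(Z):
        mx = sum(z[0] for z in Z) / len(Z)
        my = sum(z[1] for z in Z) / len(Z)
        return [(z[0] - mx, z[1] - my) for z in Z]
    def corr(u, v):
        nu = sum(x * x for x in u) ** 0.5
        nv = sum(x * x for x in v) ** 0.5
        return sum(x * y for x, y in zip(u, v)) / (nu * nv + 1e-12)
    Xc, Yc = center(X), center(Y)
    best = (-1.0, 0.0, 0.0)
    for i in range(steps):
        for j in range(steps):
            a, b = math.pi * i / steps, math.pi * j / steps
            u = [x[0] * math.cos(a) + x[1] * math.sin(a) for x in Xc]
            v = [y[0] * math.cos(b) + y[1] * math.sin(b) for y in Yc]
            c = abs(corr(u, v))  # sign of a direction is arbitrary
            if c > best[0]:
                best = (c, a, b)
    return best
```

When one view is a rigid rotation of the other, some pair of directions aligns the projections perfectly and the maximal correlation approaches 1, which is the sense in which the correlation subspace "associates" the two domains.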
While bag-of-features (BOF) models have been widely applied to image retrieval problems, the resulting performance is typically limited because they disregard the spatial information of local image descriptors (and of the associated visual words). In this paper, we present a novel spatial pooling scheme, called extended bag-of-features (EBOF), for solving the above task. Besides improving image representation capability, incorporating our EBOF model with a proposed circular-correlation-based similarity measure allows us to perform translation-, rotation-, and scale-invariant image retrieval. We conduct experiments on two benchmark image datasets, and the performance confirms the effectiveness and robustness of our proposed approach.
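A circular-correlation similarity can be sketched directly as the maximum correlation over all cyclic shifts of one feature sequence against another (a direct O(n^2) toy version; in practice an FFT-based implementation would give O(n log n), and the actual EBOF descriptors differ from these plain vectors):

```python
def circular_corr_sim(a, b):
    """Similarity between two equal-length feature sequences, taken
    as the maximum inner product over all cyclic shifts of b.

    Because an image rotation cyclically shifts an angularly pooled
    descriptor, maximizing over shifts makes the match rotation-
    invariant.
    """
    n = len(a)
    return max(sum(a[i] * b[(i + s) % n] for i in range(n))
               for s in range(n))
```

A cyclically shifted copy of a descriptor thus scores exactly as high as the descriptor matched against itself, which is what makes the retrieval rotation-invariant.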