Recently, numerous imaging methods have been proposed for fast MR imaging, including Single Carrier Wideband MRI proposed by Huang et al. [1] and other techniques performed in k-space. However, these methods suffer from blurring and ringing artifacts due to insufficient k-space sampling. To take advantage of fast imaging techniques while preserving high-resolution MR images, we propose an image-filtering-based algorithm, adaptive guided filtering, which is able to suppress the above artifacts while enhancing or preserving the contrast of image details.
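The adaptive guided filter builds on the classical guided filter of He et al.; a minimal NumPy sketch of that base operation is given below. The adaptive, artifact-aware regularization that the paper proposes is not reproduced here, and the `radius` and `eps` values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=4, eps=1e-3):
    """Plain (non-adaptive) guided filter: I is the guidance image,
    p the input to be filtered; both are 2-D float arrays in [0, 1].
    An adaptive variant would tune eps per pixel instead of globally."""
    mean = lambda x: uniform_filter(x, size=2 * radius + 1)
    mean_I, mean_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - mean_I * mean_p
    var_I = mean(I * I) - mean_I ** 2
    a = cov_Ip / (var_I + eps)      # per-window linear coefficient
    b = mean_p - a * mean_I
    return mean(a) * I + mean(b)    # averaged coefficients give the output
```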
Recognizing image data across different domains has been a challenging task. For biometrics, heterogeneous face recognition (HFR) deals with recognition problems in which training/gallery images are collected in one modality (e.g., photos), while test/probe images are observed in another (e.g., sketches). In this paper, we present a domain adaptation approach for solving HFR problems. By utilizing external face images (i.e., those collected from subjects not of interest) from both source and target domains, we propose a novel Domain-independent Component Analysis (DiCA) algorithm for deriving a common subspace for relating and representing cross-domain image data. To further improve representation ability, we advance the self-taught learning strategy for learning a domain-independent dictionary in our DiCA subspace, which can be applied to both gallery and probe images of interest to improve representation and recognition. Unlike some prior domain adaptation approaches, we require neither data correspondences (i.e., data pairs) when collecting external cross-domain image data, nor label information when learning the common feature space that associates the different domains. Thus, our method is practical for real-world cross-domain classification problems. In our experiments, we consider sketch-to-photo and near-infrared (NIR) to visible-spectrum (VIS) face recognition problems for evaluating the performance of our proposed approach.
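The abstract does not spell out the DiCA update rules, so the following is only a rough stand-in under stated assumptions: the common subspace is approximated by PCA on pooled external source/target images, and the domain-independent dictionary is learned with scikit-learn's `DictionaryLearning`; all names, sizes, and data below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA, DictionaryLearning

# External (unlabeled, unpaired) face images from both domains, one row per image.
X_src = np.random.rand(200, 1024)   # e.g., photos   (hypothetical data)
X_tgt = np.random.rand(200, 1024)   # e.g., sketches (hypothetical data)

# Stand-in for the DiCA subspace: PCA on the pooled cross-domain data.
pca = PCA(n_components=64).fit(np.vstack([X_src, X_tgt]))
Z = pca.transform(np.vstack([X_src, X_tgt]))

# Self-taught learning step: a domain-independent dictionary in the subspace.
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=20).fit(Z)

# Gallery/probe images of interest are encoded with the same dictionary.
codes_gallery = dico.transform(pca.transform(np.random.rand(10, 1024)))
```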
Due to the ambiguity in describing and discriminating between clothing images of different styles, clothing image characterization has been a challenging task. Based on the use of multiple types of visual features, we propose a novel multi-view nonnegative matrix factorization (NMF) algorithm for solving this task. Our multi-view NMF not only derives image representations that describe clothing images in terms of their visual appearance, but also learns an optimal combination of such features for each clothing style, while preserving the separation between different styles. To verify the effectiveness of our method, we conduct experiments on two image datasets and confirm that our method produces satisfactory performance in terms of both clustering and categorization.
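A minimal multi-view NMF with a coefficient matrix H shared across views illustrates the basic idea; the per-style view weighting and style-separation terms are the paper's contributions and are omitted, so this is a sketch of the plain shared-factorization core, not the authors' full objective.

```python
import numpy as np

def multiview_nmf(views, k=10, n_iter=200, eps=1e-9):
    """Multi-view NMF with a shared coefficient matrix H.
    views: list of nonnegative (d_v x n) feature matrices for the same n images.
    Minimizes sum_v ||X_v - W_v H||_F^2 via multiplicative updates."""
    n = views[0].shape[1]
    H = np.random.rand(k, n)
    Ws = [np.random.rand(X.shape[0], k) for X in views]
    for _ in range(n_iter):
        for v, X in enumerate(views):
            W = Ws[v]
            Ws[v] = W * (X @ H.T) / (W @ H @ H.T + eps)
        num = sum(W.T @ X for W, X in zip(Ws, views))
        den = sum(W.T @ W @ H for W in Ws) + eps
        H = H * num / den
    return Ws, H   # columns of H can be clustered into clothing styles
```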
For the task of robust face recognition, we particularly focus on the scenario in which both training and test image data are corrupted due to occlusion or disguise. Standard face recognition methods such as Eigenfaces, as well as state-of-the-art approaches such as sparse representation-based classification, do not consider possible contamination of data during training, and thus their recognition performance on corrupted test data degrades. In this paper, we propose a novel face recognition algorithm based on low-rank matrix decomposition to address this problem. Besides decomposing raw training data into a set of representative bases for better modeling of the face images, we introduce a constraint of structural incoherence into the proposed algorithm, which enforces the bases learned for different classes to be as independent as possible. As a result, additional discriminating ability is added to the derived basis matrices for improved recognition performance. Experimental results on different face databases with a variety of variations verify the effectiveness and robustness of our proposed method.
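The low-rank decomposition step can be illustrated with standard robust PCA via inexact augmented Lagrange multipliers, applied per class to a matrix whose columns are that class's (possibly corrupted) training faces. The structural-incoherence regularizer that couples the per-class decompositions is the paper's contribution and is omitted from this sketch.

```python
import numpy as np

def rpca(D, lam=None, mu=None, n_iter=100):
    """Robust PCA (principal component pursuit): decompose D into a
    low-rank part A (class bases) plus a sparse error part E."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 / np.abs(D).mean()
    Y = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(n_iter):
        # Singular value thresholding -> low-rank component A.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = U @ np.diag(np.maximum(s - 1.0 / mu, 0)) @ Vt
        # Soft thresholding -> sparse error component E.
        T = D - A + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Y = Y + mu * (D - A - E)
    return A, E

# Usage: one decomposition per class, with that class's training faces
# stacked as the columns of D.
```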
We present a trajectory-based approach for detecting salient regions in videos by removing the dominant camera motion. Our approach is designed in a general way so that it can be applied to videos taken by either stationary or moving cameras without any prior information. Moreover, multiple salient regions of different temporal lengths can be detected. To this end, we extract a set of spatially and temporally coherent trajectories of keypoints in a video. Velocity and acceleration entropies are then proposed to represent the trajectories. In this way, long-term object motions are exploited to filter out short-term noise, and object motions of various temporal lengths can be represented uniformly. We are further inspired by the observation that trajectories in the background, i.e., the non-salient trajectories, are usually consistent with the dominant camera motion, no matter whether the camera is stationary or not. We make use of this property to develop a unified approach to saliency generation for both stationary and moving cameras. Specifically, a one-class SVM is employed to remove the trajectories consistent with the dominant motion. The salient regions can then be highlighted by applying a diffusion process to the remaining trajectories. In addition, we create manually annotated ground truth for the collected videos, which is used for performance evaluation and comparison. The promising results on various types of videos demonstrate the effectiveness and broad applicability of our approach.
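A small sketch of the trajectory-filtering step, assuming simple histogram entropies of per-frame speed and acceleration as trajectory descriptors (the exact descriptor design in the paper may differ): the one-class SVM absorbs the dominant, camera-consistent trajectories, and its outliers are kept as salient candidates.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def entropy(x, bins=16):
    """Shannon entropy of a 1-D signal's histogram."""
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

# trajs: list of (T_i x 2) keypoint positions over time (hypothetical data).
trajs = [np.cumsum(np.random.randn(30, 2), axis=0) for _ in range(100)]

# Velocity/acceleration entropy descriptor per trajectory.
feats = []
for t in trajs:
    v = np.diff(t, axis=0)
    a = np.diff(v, axis=0)
    feats.append([entropy(np.linalg.norm(v, axis=1)),
                  entropy(np.linalg.norm(a, axis=1))])
feats = np.asarray(feats)

# Trajectories consistent with the dominant camera motion form the dense
# inlier mass; outliers (-1) are candidate salient trajectories.
ocsvm = OneClassSVM(nu=0.2, kernel="rbf", gamma="scale").fit(feats)
salient_mask = ocsvm.predict(feats) == -1
```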
Native Mandarin normal-hearing (NH) listeners can easily perceive lexical tones, even under large voice-pitch variations across speakers, by using the pitch contrast between context and target stimuli. It is, however, unclear whether cochlear implant (CI) users, who have limited access to pitch cues, can make similar use of context pitch cues for tone normalization. In this study, native Mandarin NH listeners and pre-lingually deafened, unilaterally implanted CI users were asked to recognize a series of Mandarin tones varying from Tone 1 (high-flat) to Tone 2 (mid-rising), with or without a preceding sentence context. Most of the CI subjects used a hearing aid (HA) in the non-implanted ear (i.e., bimodal users) and were tested both with CI alone and with CI + HA. In the test without context, typical S-shaped tone recognition functions were observed for most CI subjects, and the function slopes and perceptual boundaries were similar with either CI alone or CI + HA. Compared to NH subjects, CI subjects were less sensitive to the pitch changes in target tones. In the test with context, NH subjects gave more (resp. fewer) Tone-2 responses in a context with high (resp. low) fundamental frequencies, known as the contrastive context effect. For CI subjects, a similar contrastive context effect was found to be statistically significant for tone recognition with CI + HA but not with CI alone. The results suggest that the pitch cues from CIs may not be sufficient to consistently support the pitch-contrast processing required for tone normalization. The additional pitch cues from aided residual acoustic hearing can, however, provide CI users with a tone normalization capability similar to that of NH listeners.
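The slopes and perceptual boundaries of such S-shaped recognition functions are commonly summarized by fitting a logistic psychometric function; a sketch with made-up response proportions is given below (the analysis actually used in the study may differ).

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of Tone-2 responses along the Tone 1 -> Tone 2 continuum."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

# Hypothetical data: 7 stimulus steps, proportion of Tone-2 responses.
steps = np.arange(1, 8)
p_tone2 = np.array([0.02, 0.05, 0.15, 0.55, 0.85, 0.95, 0.98])

(boundary, slope), _ = curve_fit(logistic, steps, p_tone2, p0=[4.0, 1.0])
print(f"perceptual boundary = {boundary:.2f}, slope = {slope:.2f}")
```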
While bag-of-features (BOF) models have been widely applied to image retrieval problems, the resulting performance is typically limited because they disregard the spatial information of local image descriptors (and of the associated visual words). In this paper, we present a novel spatial pooling scheme, called extended bag-of-features (EBOF), for solving this task. Besides improving image representation capability, the incorporation of our EBOF model with a proposed circular-correlation-based similarity measure allows us to perform translation-, rotation-, and scale-invariant image retrieval. We conduct experiments on two benchmark image datasets, and the performance confirms the effectiveness and robustness of our proposed approach.
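Circular-correlation similarity can be computed efficiently with the FFT. The sketch below treats an EBOF descriptor as a 1-D circular histogram, which is an assumption made for illustration, and shows why maximizing over cyclic shifts yields rotation invariance.

```python
import numpy as np

def circular_corr_similarity(h1, h2):
    """Maximum circular cross-correlation over all cyclic shifts,
    computed in O(n log n) via the FFT and normalized to [0, 1]-ish.
    Taking the max over shifts makes the matching rotation-invariant."""
    corr = np.fft.ifft(np.fft.fft(h1) * np.conj(np.fft.fft(h2))).real
    return corr.max() / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12)

# A cyclically shifted copy of a histogram scores (near) 1 under this measure.
h = np.random.rand(64)
print(circular_corr_similarity(h, np.roll(h, 13)))  # ~1.0
```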
We present a novel domain adaptation approach for solving cross-domain pattern recognition problems, i.e., problems in which the data or features to be processed and recognized are collected from different domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived correlation subspace as a joint representation for associating data across different domains, and we advance reduced kernel techniques for kernel CCA (KCCA) when a nonlinear correlation subspace is desirable. Such techniques not only make KCCA computationally more efficient but also alleviate potential over-fitting problems. Instead of directly performing recognition in the derived CCA subspace (as prior CCA-based domain adaptation methods did), we advocate exploiting the domain transfer ability of this subspace, in which each dimension has a unique capability for associating cross-domain data. In particular, we propose a novel support vector machine (SVM) with a correlation regularizer, named correlation-transfer SVM, which incorporates the domain adaptation ability into classifier design for cross-domain recognition. We show that our proposed domain adaptation and classification approach can be successfully applied to a variety of cross-domain recognition tasks such as cross-view action recognition, handwritten digit recognition with different features, and image-to-text or text-to-image classification. Our empirical results verify that our proposed method outperforms state-of-the-art domain adaptation approaches in terms of recognition performance.
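As a rough stand-in, a linear-CCA version of the pipeline can be sketched with scikit-learn: paired source/target samples define the correlation subspace, and a plain SVM on the projected data replaces the paper's correlation-transfer SVM, which additionally exploits the per-dimension canonical correlations. Data and sizes here are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

# Paired cross-domain training data (hypothetical): e.g., two camera views
# of the same actions, one feature vector per sample in each domain.
X_src = np.random.rand(100, 50)
X_tgt = np.random.rand(100, 40)
y = np.random.randint(0, 5, 100)

# Correlation subspace relating the two domains (linear CCA; the paper
# uses reduced-kernel KCCA for the nonlinear case).
cca = CCA(n_components=10).fit(X_src, X_tgt)
Z_src, Z_tgt = cca.transform(X_src, X_tgt)

# Stand-in classifier: train on source projections, test on target ones.
clf = SVC(kernel="linear").fit(Z_src, y)
pred = clf.predict(Z_tgt)
```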
The recent advances in RGB-D cameras have allowed us to better solve increasingly complex computer vision tasks. However, modern RGB-D cameras are still restricted by their short effective range. This limitation may make RGB-D data unavailable at recognition time in practice and degrade their applicability. We propose an alternative scenario to address this problem and illustrate it with an application to action recognition. We use Kinect to collect, offline, an auxiliary multi-modal database in which not only the RGB videos but also the depth maps and skeleton structures of the actions of interest are available. Our approach aims to enhance action recognition in RGB videos by leveraging this extra database. Specifically, it optimizes a feature transformation by which the actions to be recognized can be concisely reconstructed from entries in the auxiliary database. In this way, the inter-database variations are adapted. More importantly, each action can be augmented with additional depth and skeleton data retrieved from the auxiliary database. The proposed approach has been evaluated on three action recognition benchmarks. The promising results show that the augmented depth and skeleton features lead to a remarkable boost in recognition accuracy.
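The reconstruction-and-retrieval idea can be sketched as follows, with hypothetical feature sizes: an RGB-only test action is sparsely coded over the auxiliary RGB features, and the same code retrieves the aligned depth and skeleton features. The learned feature transformation of the paper is replaced by an identity mapping here for brevity.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Auxiliary Kinect database (hypothetical sizes): RGB features plus the
# aligned depth/skeleton features of the same auxiliary actions.
A_rgb  = np.random.rand(300, 128)   # one row per auxiliary action
A_dep  = np.random.rand(300, 64)
A_skel = np.random.rand(300, 32)

def augment(x_rgb, alpha=0.01):
    """Sparsely reconstruct an RGB-only action from auxiliary RGB entries
    and reuse the code to retrieve its depth/skeleton counterparts."""
    code = Lasso(alpha=alpha, positive=True).fit(A_rgb.T, x_rgb).coef_
    return np.concatenate([x_rgb, code @ A_dep, code @ A_skel])

x_aug = augment(np.random.rand(128))   # augmented multi-modal feature
```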
In this paper, we address the high annotation cost of acquiring training data for semantic segmentation. Most modern approaches to semantic segmentation are based on graphical models, such as conditional random fields, and rely on sufficient training data in the form of object contours. To reduce the manual effort of pixel-wise contour annotation, we consider the setting in which the training set for semantic segmentation is a mixture of a few object contours and an abundant set of object bounding boxes. Our idea is to borrow the knowledge derived from the object contours to infer the unknown object contours enclosed by the bounding boxes. The inferred contours can then serve as training data for semantic segmentation. To this end, we generate multiple contour hypotheses for each bounding box, under the assumption that at least one hypothesis is close to the ground truth. This paper proposes an approach, called augmented multiple instance regression (AMIR), that formulates hypothesis selection as a multiple instance regression (MIR) problem and augments information derived from the object contours to guide and regularize the training of MIR. In this way, a bounding box is treated as a bag with its contour hypotheses as instances, and the positive instances are the hypotheses close to the ground truth. The proposed approach has been evaluated on the Pascal VOC segmentation task. The promising results demonstrate that AMIR can precisely infer the object contours in the bounding boxes and hence provide an effective alternative to manually labeled contours for semantic segmentation.
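A plain multiple instance regression baseline conveys the bag/instance formulation: each bounding box is a bag of contour-hypothesis feature vectors, the bag label is a score of closeness to the ground truth, and training alternates between picking the best-fitting instance per bag and refitting the regressor. The contour-derived guidance that AMIR adds on top is not reproduced; all data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

def mir_select(bags, y, n_iter=10):
    """Prime-instance MIR baseline: bags is a list of (m_i x d) instance
    feature matrices, y the bag-level regression targets. Alternates
    between instance selection and regressor refitting."""
    picks = [bag[0] for bag in bags]           # start from the first hypothesis
    reg = Ridge(alpha=1.0)
    for _ in range(n_iter):
        reg.fit(np.vstack(picks), y)
        picks = [bag[np.argmin((reg.predict(bag) - t) ** 2)]
                 for bag, t in zip(bags, y)]   # instance best matching its bag target
    return reg, picks

# Hypothetical data: 20 boxes, each with 5 hypotheses of 16-D features.
bags = [np.random.rand(5, 16) for _ in range(20)]
y = np.random.rand(20)
reg, chosen = mir_select(bags, y)
```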