Layer-based video coding, together with adaptive modulation and coding, is a promising technique for providing real-time video multicast services to heterogeneous mobile devices. With the rapid growth of data communications for emerging applications, reducing the energy consumption of mobile devices is a major challenge. This paper addresses the problem of resource allocation for video multicast in fourth-generation wireless systems, with the objective of minimizing the total energy consumed for data reception. First, we consider the problem when scalable video coding is applied. We prove that the problem is NP-hard and propose a 2-approximation algorithm to solve it. We then investigate the problem under multiple description coding and show that it is likewise NP-hard and cannot be approximated in polynomial time with a ratio better than 2, unless P=NP. For this case, we develop a pseudopolynomial-time 2-approximation algorithm. The results of simulations comparing the proposed algorithms with a brute-force optimal algorithm and a conventional approach are very encouraging.
The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for the last decade. However, the converted spectra are excessively smoothed, which produces a muffled converted sound. In this study, we improve speech quality by enhancing the dependency between the source feature vectors (natural sound) and the converted feature vectors (converted sound). It is believed that enhancing this dependency brings the converted sound closer to the natural sound. To this end, we propose an integrated maximum a posteriori and mutual information (MAPMI) criterion for parameter generation in spectral conversion. Experimental results demonstrate that the proposed MAPMI method outperforms the conventional method in converted speech quality, as measured by formal listening tests.
Most digital cameras capture one primary color at each pixel by a single sensor overlaid with a color filter array. To recover a full color image from incomplete color samples, one needs to restore the two missing color values for each pixel. This restoration process is known as color demosaicking. In this paper, we present a novel self-learning approach to this problem via support vector regression. Unlike prior learning-based demosaicking methods, our approach aims at extracting image-dependent information in constructing the learning model, and we do not require any additional training data. Experimental results show that our proposed method outperforms many state-of-the-art techniques in both subjective and objective image quality measures.
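The pipeline above can be sketched in miniature. The snippet below simulates an RGGB Bayer mosaic and fills each missing channel value by normalized averaging of the known samples of that channel in a 3x3 window; this averaging is only a stand-in for the paper's per-pixel support vector regression, and the function names, the RGGB layout, and the window size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bayer_mosaic(img):
    """Simulate an RGGB Bayer color filter array: keep one channel per pixel."""
    h, w, _ = img.shape
    mask = np.zeros((h, w), dtype=int)   # which channel survives: 0=R, 1=G, 2=B
    mask[0::2, 1::2] = 1                 # G on even rows
    mask[1::2, 0::2] = 1                 # G on odd rows
    mask[1::2, 1::2] = 2                 # B
    mosaic = np.choose(mask, img.transpose(2, 0, 1))
    return mosaic, mask

def demosaic(mosaic, mask):
    """Fill each channel by normalized averaging of its known samples in a
    3x3 window (a crude stand-in for the per-pixel SVR prediction)."""
    h, w = mosaic.shape
    out = np.zeros((h, w, 3))
    for c in range(3):
        known = (mask == c).astype(float)
        vals = mosaic * known
        pad_v, pad_k = np.pad(vals, 1), np.pad(known, 1)
        num = sum(pad_v[i:i+h, j:j+w] for i in range(3) for j in range(3))
        den = sum(pad_k[i:i+h, j:j+w] for i in range(3) for j in range(3))
        out[..., c] = np.where(known > 0, mosaic, num / np.maximum(den, 1e-9))
    return out
```

Because the RGGB pattern has period 2, every 3x3 window contains at least one sample of each channel, so the normalized average is always defined.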
We aim to resolve the difficulties of action recognition arising from large intra-class variations. These unfavorable variations make it infeasible to represent one action instance using other instances of the same action. We therefore propose to extract both instance-specific and class-consistent features to facilitate action recognition. Specifically, the instance-specific features explore the self-similarities among frames of each video instance, while the class-consistent features summarize within-class similarities. We introduce a generative formulation to combine these two diverse types of features. The experimental results demonstrate the effectiveness of our approach.
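As a toy illustration of the instance-specific ingredient only (not the paper's full generative formulation), the sketch below computes a frame-by-frame cosine self-similarity matrix from one video's per-frame feature vectors; the feature dimension and the random data are made up.

```python
import numpy as np

def self_similarity(frames):
    """Instance-specific descriptor: pairwise cosine similarities among
    the frames of ONE video, so the pattern depends on how the action
    unfolds over time rather than on who performs it."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    return f @ f.T                       # (T, T) self-similarity matrix

# toy "video": T = 5 frames, each a 16-dim per-frame feature vector
rng = np.random.default_rng(0)
video = rng.normal(size=(5, 16))
S = self_similarity(video)
```

The matrix S is symmetric with a unit diagonal; its off-diagonal structure is what serves as the instance-specific feature.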
Rain removal from a single image is one of the more challenging image denoising problems. In this paper, we present a learning-based framework for single-image rain removal that focuses on learning context information from the input image, so that the rain patterns present in it can be automatically identified and removed. We approach single-image rain removal as the integration of image decomposition and self-learning processes. More precisely, our method first performs context-constrained image segmentation on the input image, and we learn dictionaries for the high-frequency components of the different context categories via sparse coding for reconstruction purposes. For image regions with rain streaks, dictionaries of distinct context categories will share common atoms, which correspond to the rain patterns. By applying PCA and SVM classifiers to the learned dictionaries, our framework automatically identifies the common rain patterns present in them, so that rain streaks can be removed from the input image as particular high-frequency components. Unlike prior work on rain removal from images/videos, which requires image priors or training data from multiple frames, our proposed self-learning approach requires only the input image itself, saving considerable pre-training effort. Experimental results demonstrate the subjective and objective visual quality improvement achieved with our proposed method.
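Two of the building blocks can be sketched with plain NumPy: the low/high-frequency decomposition, and a crude way to flag atoms shared by two context dictionaries, standing in for the PCA/SVM identification step. The box-blur window, the similarity threshold, and all names are illustrative assumptions, not the paper's actual components.

```python
import numpy as np

def split_frequencies(img, k=5):
    """Low-frequency part via a k-by-k box blur; rain streaks live in
    the high-frequency residual, which the dictionaries model."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    h, w = img.shape
    low = sum(padded[i:i+h, j:j+w] for i in range(k) for j in range(k)) / k**2
    return low, img - low

def common_atoms(dict_a, dict_b, thresh=0.9):
    """Atoms (columns) of dict_a that some atom of dict_b nearly
    duplicates, judged by absolute cosine similarity -- a proxy for the
    rain patterns that recur across context categories."""
    a = dict_a / np.linalg.norm(dict_a, axis=0)
    b = dict_b / np.linalg.norm(dict_b, axis=0)
    sim = np.abs(a.T @ b)
    return np.where(sim.max(axis=1) > thresh)[0]
```

In the framework described above, such shared atoms would be attributed to rain and their contribution subtracted from the high-frequency layer.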
This paper concerns the development of a music codebook for summarizing local feature descriptors computed over time. Compared to a holistic representation, this text-like representation better captures the rich and time-varying information of music. We systematically compare a number of existing codebook generation techniques and also propose a new one that incorporates labeled data into the dictionary learning process. Several aspects of the encoding system, such as local feature extraction and codeword encoding, are also analyzed. Our results demonstrate the superiority of sparsity-enforced dictionary learning over conventional VQ-based or exemplar-based methods. With the new supervised dictionary learning algorithm and the optimal settings inferred from the performance study, we achieve state-of-the-art accuracy in music genre classification using just the log-power spectrogram as the local feature descriptor. The classification accuracies on two benchmark datasets, GTZAN and ISMIR2004Genre, are 84.7% and 90.8%, respectively.
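For contrast with the sparsity-enforced dictionary learning the abstract favors, here is the conventional VQ baseline in miniature: a k-means codebook over local descriptors and a bag-of-codewords histogram encoding. The mini k-means (fixed iteration count, random initialization) and the toy data are illustrative, not the paper's experimental setup.

```python
import numpy as np

def kmeans_codebook(feats, k, iters=20, seed=0):
    """Plain VQ codebook: k-means centroids over local descriptors."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(0)
    return centers

def encode(feats, centers):
    """Text-like encoding: normalized histogram of nearest-codeword counts,
    summarizing a whole track's local descriptors in one vector."""
    d = ((feats[:, None] - centers[None]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(centers))
    return hist / hist.sum()
```

A supervised or sparsity-enforced learner would replace `kmeans_codebook` while keeping the same encode-then-classify pipeline.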
We propose a novel multiple kernel learning (MKL) algorithm with a group lasso regularizer, called group lasso regularized MKL (GL-MKL), for heterogeneous feature fusion and variable selection. For feature fusion problems, assigning a group of base kernels to each feature type in an MKL framework provides a robust way to fit data extracted from different feature domains. Adding a mixed $\ell_{1,2}$ norm constraint (i.e., group lasso) as the regularizer, we can enforce sparsity at the group/feature level and automatically learn a compact feature set for recognition purposes. More precisely, our GL-MKL determines the optimal base kernels, including the associated weights and kernel parameters, and results in improved recognition performance. Moreover, our GL-MKL can be extended to address heterogeneous variable selection problems, in which we aim to select a compact set of variables (i.e., feature attributes) for comparable or improved performance. Our proposed method does not need to exhaustively search the entire variable space, as prior sequential variable selection methods do, nor does it require any prior knowledge of the optimal size of the variable subset. To verify the effectiveness and robustness of our GL-MKL, we conduct experiments on video and image datasets for heterogeneous feature fusion, and perform variable selection on various UCI datasets.
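The group-lasso ingredient can be made concrete. The proximal (block soft-thresholding) step below shrinks each group's kernel weights as a unit, which is what lets whole feature types drop out together; the group layout, the threshold value, and the function name are assumptions for illustration, not the paper's optimizer.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal step for the mixed l_{1,2} (group lasso) penalty: scale
    each group's weight subvector toward zero as a block, and zero out
    any group whose norm falls below lam -- group-level sparsity."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1 - lam / norm) * w[g]
    return out
```

With kernel weights grouped by feature type, groups driven exactly to zero correspond to feature types excluded from the fused kernel.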
Improving the endurance of PCM is a fundamental issue when the technology is considered as an alternative for main memory. In the design of memory-based wear-leveling approaches, a major challenge is how to efficiently determine the appropriate memory pages for allocation or swapping. In this paper, we present an efficient wear-leveling design that is compatible with existing virtual memory management. Two implementations, namely bucket-based and array-based wear leveling, with nearly zero search cost are proposed to trade off time and space complexity. The results of experiments conducted on popular benchmarks to evaluate the efficacy of the proposed design are very encouraging.
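A minimal sketch of the bucket-based idea, assuming a simplified interface (the paper's actual design integrates with virtual memory management and also covers swapping): pages are binned by write count, so allocation returns a least-worn page without scanning all pages.

```python
from collections import defaultdict

class BucketWearLeveler:
    """Bucket-based wear-leveling sketch: bucket i holds the pages
    written exactly i times, so the least-worn page is found with
    nearly zero search cost."""
    def __init__(self, num_pages):
        self.count = {p: 0 for p in range(num_pages)}
        self.buckets = defaultdict(set)
        self.buckets[0] = set(range(num_pages))
        self.min_bucket = 0                 # lowest possibly non-empty bucket

    def allocate(self):
        """Return a least-written page in amortized O(1)."""
        while not self.buckets[self.min_bucket]:
            self.min_bucket += 1            # counts only grow, never rewind
        return next(iter(self.buckets[self.min_bucket]))

    def write(self, page):
        """Record one write: move the page up to the next bucket."""
        c = self.count[page]
        self.buckets[c].discard(page)
        self.count[page] = c + 1
        self.buckets[c + 1].add(page)
```

Always allocating from the lowest non-empty bucket keeps the write counts of all pages within one of each other.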
We address the problem of robust face recognition, in which both training and test image data might be corrupted by occlusion and disguise. From standard face recognition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior work does not consider possible contamination of the data during training, and the associated performance might therefore be degraded. Building on the recent success of low-rank matrix recovery, we propose a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition. Our method not only decomposes the raw training data into a set of representative bases with corresponding sparse errors, for better modeling of the face images, but also advocates structural incoherence between the bases learned from different classes. The regularization on structural incoherence encourages these bases to be as independent as possible. We show that this provides additional discriminating ability over the original low-rank models, improving performance. Experimental results on public face databases verify the effectiveness and robustness of our method, which is also shown to outperform state-of-the-art SRC-based approaches.
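The low-rank machinery underneath can be sketched as follows: singular value thresholding (the proximal operator of the nuclear norm) plus a naive alternating-shrinkage loop that splits the data matrix into a low-rank part (clean face subspace) and a sparse part (occlusion/disguise errors). This omits the paper's structural-incoherence regularizer entirely; the parameters and the simple solver are stand-ins, not the authors' algorithm.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def robust_pca(D, lam=None, iters=50):
    """Split D into low-rank L + sparse E by alternating proximal steps
    on 0.5*||D - L - E||^2 + ||L||_* + lam*||E||_1 (a simplification)."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))
    L, E = np.zeros_like(D), np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - E, 1.0)                       # low-rank update
        R = D - L
        E = np.sign(R) * np.maximum(np.abs(R) - lam, 0)  # sparse update
    return L, E
```

In the recognition setting described above, each class would contribute its own bases, with the incoherence term pushing the per-class bases apart.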
In this paper, a MIMO detection scheme is proposed based on a combination of a Monte Carlo technique and list detection. Specifically, a list of Gaussian samples is first generated to determine the search range of constellation points in which the transmitted symbol is most likely to lie. Linear equalization is then applied to undo the effect of channel mixing, and a list detector searches within the determined range. By varying the parameters of the Monte Carlo method, different symbol error rate (SER) versus complexity tradeoffs can be obtained to accommodate different system design requirements. Simulation results also show that the proposed scheme achieves near-ML SER performance with considerably less computational complexity than the exhaustive search.
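A real-valued toy version of the scheme (PAM in place of complex QAM, zero-forcing as the linear equalizer; the sample count, sampling spread, and names are illustrative assumptions) can be sketched as:

```python
import numpy as np
from itertools import product

def mc_list_detect(y, H, constellation, n_samples=50, seed=0):
    """Draw Gaussian samples around the zero-forcing estimate to build a
    short per-antenna candidate list, then search only that list for the
    minimum-distance symbol vector instead of the full lattice."""
    rng = np.random.default_rng(seed)
    x_zf = np.linalg.lstsq(H, y, rcond=None)[0]   # linear equalization
    cands = []
    for k in range(len(x_zf)):
        samples = x_zf[k] + 0.5 * rng.standard_normal(n_samples)
        near = {constellation[np.abs(constellation - s).argmin()]
                for s in samples}                 # points the samples land on
        cands.append(sorted(near))
    best, best_err = None, np.inf
    for x in product(*cands):                     # reduced search range
        err = np.linalg.norm(y - H @ np.array(x))
        if err < best_err:
            best, best_err = np.array(x), err
    return best
```

Shrinking `n_samples` or the sampling spread trims the candidate lists, trading SER for lower search complexity, which is the tradeoff knob the abstract describes.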