This paper concerns the development of a music codebook for summarizing local feature descriptors computed over time. Compared to a holistic representation, this text-like representation better captures the rich, time-varying information in music. We systematically compare a number of existing codebook generation techniques and propose a new one that incorporates labeled data into the dictionary learning process. Several aspects of the encoding system, such as local feature extraction and codeword encoding, are also analyzed. Our results demonstrate the superiority of sparsity-enforced dictionary learning over conventional VQ-based or exemplar-based methods. With the new supervised dictionary learning algorithm and the optimal settings inferred from the performance study, we achieve state-of-the-art accuracy in music genre classification using only the log-power spectrogram as the local feature descriptor. The classification accuracies on the two benchmark datasets, GTZAN and ISMIR2004Genre, are 84.7% and 90.8%, respectively.
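A minimal sketch of the unsupervised variant of this pipeline (not the paper's supervised dictionary learner): learn a sparse codebook over log-power spectrogram frames, sparse-code each frame, and max-pool the codes over time into a song-level vector for an SVM. All parameter values and helper names here are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

def frame_features(spectrogram):
    """Treat each log-power spectrogram column as a local descriptor."""
    return np.log1p(spectrogram).T                     # (n_frames, n_bins)

def song_vector(frames, dictionary, alpha=1.0):
    """Sparse-code every frame against the codebook and max-pool over time."""
    codes = sparse_encode(frames, dictionary, alpha=alpha)   # (n_frames, n_atoms)
    return np.abs(codes).max(axis=0)                         # (n_atoms,)

def train_and_classify(train_specs, train_labels, test_specs, n_atoms=512):
    """train_specs / test_specs: lists of magnitude spectrograms (n_bins x n_frames)."""
    all_frames = np.vstack([frame_features(s) for s in train_specs])
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                       batch_size=256).fit(all_frames)
    D = dico.components_
    X_tr = np.array([song_vector(frame_features(s), D) for s in train_specs])
    X_te = np.array([song_vector(frame_features(s), D) for s in test_specs])
    clf = LinearSVC().fit(X_tr, train_labels)
    return clf.predict(X_te)
```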
We address the problem of robust face recognition, in which both training and test image data might be corrupted by occlusion and disguise. From standard face recognition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior work does not consider possible contamination of the training data, and the associated performance can degrade as a result. Building on the recent success of low-rank matrix recovery, we propose a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition. Our method not only decomposes the raw training data into a set of representative bases with corresponding sparse errors for better modeling of the face images, but also promotes structural incoherence between the bases learned from different classes. This incoherence regularization encourages the class-specific bases to be as independent as possible, which we show provides additional discriminating ability over the original low-rank models and improves performance. Experimental results on public face databases verify the effectiveness and robustness of our method, which is also shown to outperform state-of-the-art SRC-based approaches.
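As a reference point, the per-class low-rank plus sparse-error decomposition that this approach builds on can be computed with standard robust PCA; the sketch below uses the inexact augmented Lagrange multiplier iteration. The cross-class structural-incoherence term that is the paper's contribution is not reproduced, and the hyperparameters are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Elementwise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Decompose a class data matrix D (pixels x images) into a low-rank part A
    plus a sparse error E via the inexact ALM iteration for robust PCA."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)   # dual variable initialization
    mu, rho = 1.25 / norm2, 1.5
    A, E = np.zeros_like(D), np.zeros_like(D)
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)       # low-rank update
        E = soft(D - A + Y / mu, lam / mu)      # sparse-error update
        R = D - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```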
We propose a novel multiple kernel learning (MKL) algorithm with a group lasso regularizer, called group lasso regularized MKL (GL-MKL), for heterogeneous feature fusion and variable selection. For feature fusion problems, assigning a group of base kernels to each feature type in an MKL framework provides a robust way to fit data drawn from different feature domains. By adding a mixed $\ell_{1,2}$-norm constraint (i.e., group lasso) as the regularizer, we enforce sparsity at the group/feature level and automatically learn a compact feature set for recognition. More precisely, our GL-MKL determines the optimal base kernels, including the associated weights and kernel parameters, and results in improved recognition performance. Moreover, GL-MKL can be extended to heterogeneous variable selection problems, in which we aim to select a compact set of variables (i.e., feature attributes) while maintaining or improving performance. Unlike prior sequential variable selection methods, our approach neither exhaustively searches the variable space nor requires prior knowledge of the optimal size of the variable subset. To verify the effectiveness and robustness of GL-MKL, we conduct experiments on video and image datasets for heterogeneous feature fusion, and perform variable selection on various UCI datasets.
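The sketch below only illustrates the kernel-combination machinery GL-MKL operates on: each feature type contributes a group of RBF base kernels (one per bandwidth), and a precomputed weighted-sum kernel is fed to an SVM. The group-lasso-regularized learning of the weights, which is the paper's actual contribution, is not reproduced; the weights shown are placeholders, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def base_kernels(feature_blocks, gammas=(0.01, 0.1, 1.0)):
    """One group of base kernels per feature type (e.g., color, texture)."""
    return [[rbf_kernel(X, gamma=g) for g in gammas] for X in feature_blocks]

def combined_kernel(kernel_groups, beta):
    """K = sum_{g,m} beta[g][m] * K_{g,m}.  A group-lasso regularizer would
    drive entire rows of beta (whole feature types) to zero."""
    K = np.zeros_like(kernel_groups[0][0])
    for group, weights in zip(kernel_groups, beta):
        for K_m, w in zip(group, weights):
            K += w * K_m
    return K

# Usage with two hypothetical feature blocks and uniform placeholder weights:
# groups = base_kernels([X_color, X_texture])
# beta = [[1 / 6.0] * 3, [1 / 6.0] * 3]          # not learned here
# clf = SVC(kernel="precomputed").fit(combined_kernel(groups, beta), y)
```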
In this paper, a MIMO detection scheme is proposed based on a combination of the Monte Carlo technique and list detection. Specifically, a list of Gaussian samples is first generated to determine the search range of constellation points within which the transmitted symbol is most likely to lie. Linear equalization is then applied to counteract the channel mixing, and a list detector searches within the determined range. By varying the parameters of the Monte Carlo method, different symbol error rate (SER) versus complexity tradeoffs can be obtained to meet different system design requirements. Simulation results show that the proposed scheme achieves near-ML SER performance with considerably less computational complexity than exhaustive search.
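A rough sketch of the sampling-then-list idea for a QPSK system follows; the sample count, list size, and MMSE equalizer used here are illustrative choices, not the paper's exact procedure.

```python
from itertools import product
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def mc_list_detect(y, H, noise_var, n_samples=64, list_size=2):
    """MMSE-equalize, draw Gaussian samples around the estimate to pick a
    per-antenna candidate list, then exhaustively search the reduced list."""
    nt = H.shape[1]
    # Linear (MMSE) equalization
    W = np.linalg.solve(H.conj().T @ H + noise_var * np.eye(nt), H.conj().T)
    z = W @ y
    # Monte Carlo samples around the equalized point define the search range
    samples = z[:, None] + np.sqrt(noise_var / 2) * (
        np.random.randn(nt, n_samples) + 1j * np.random.randn(nt, n_samples))
    cand = []
    for i in range(nt):
        hits = np.argmin(np.abs(samples[i][:, None] - QPSK[None, :]), axis=1)
        counts = np.bincount(hits, minlength=len(QPSK))
        cand.append(QPSK[np.argsort(counts)[::-1][:list_size]])
    # List detection: search only the retained constellation points
    best, best_metric = None, np.inf
    for x in product(*cand):
        x = np.array(x)
        m = np.linalg.norm(y - H @ x) ** 2
        if m < best_metric:
            best, best_metric = x, m
    return best
```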
ZigBee, a communication standard designed for low-rate wireless personal area networks, offers extremely low complexity, cost, and power consumption for wireless connectivity in inexpensive, portable, and mobile devices. Among the well-known ZigBee topologies, the ZigBee cluster tree is especially suitable for low-power and low-cost wireless sensor networks because it supports power-saving operations and lightweight routing. In a deployed wireless sensor network, an area of interest may require further investigation, generating additional traffic. However, the restricted routing of a ZigBee cluster-tree network may not provide sufficient bandwidth for the increased traffic load, so the additional information may not be delivered successfully. In this paper, we present an adoptive-parent-based framework for ZigBee cluster-tree networks that increases bandwidth utilization without introducing any extra message exchange. To optimize throughput within this framework, we model the process as a vertex-constrained maximum flow problem and develop a distributed algorithm that is fully compatible with the ZigBee standard. The optimality and convergence of the algorithm are proved theoretically. Finally, simulation results demonstrate the significant performance improvement achieved by the proposed framework and algorithm over existing approaches.
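For intuition, the vertex-constrained max-flow modeling step can be handled with the standard node-splitting trick (each node v becomes v_in and v_out joined by an internal edge carrying the node's capacity), after which any ordinary max-flow solver applies. The tiny topology and capacities below are illustrative; the paper solves the problem distributedly inside the ZigBee cluster tree rather than with a centralized solver.

```python
import networkx as nx

def vertex_constrained_max_flow(edges, node_cap, src, dst):
    """edges: iterable of (u, v, edge_capacity); node_cap: node -> capacity."""
    G = nx.DiGraph()
    for v, cap in node_cap.items():
        G.add_edge((v, "in"), (v, "out"), capacity=cap)   # vertex constraint
    for u, v, cap in edges:
        G.add_edge((u, "out"), (v, "in"), capacity=cap)
    value, _ = nx.maximum_flow(G, (src, "out"), (dst, "in"))
    return value

# Example: sensor 's' pushes extra traffic to coordinator 'c' through two
# candidate (tree or adoptive) parents 'p1' and 'p2'.
print(vertex_constrained_max_flow(
    edges=[("s", "p1", 5), ("s", "p2", 5), ("p1", "c", 3), ("p2", "c", 4)],
    node_cap={"s": 10, "p1": 3, "p2": 4, "c": 10},
    src="s", dst="c"))   # -> 7
```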
The (t, n) visual cryptography (VC) scheme is a secret sharing scheme in which a secret image is encoded into n transparencies, and stacking any t of the n transparencies reveals the secret image, while stacking t - 1 or fewer transparencies reveals no information about the secret. We consider the addition and deletion of users in a dynamic user group. To reduce the overhead of regenerating and redistributing transparencies when the group changes, this paper proposes a (t, n) VC scheme with unlimited n based on a probabilistic model. The proposed scheme allows n to change dynamically so that new transparencies can be issued without regenerating or redistributing the original ones. Specifically, an extended VC scheme based on basis matrices and a probabilistic model is proposed. An equation is derived from the fundamental definitions of the (t, n) VC scheme, and the (t, ∞) VC scheme achieving maximal contrast can then be designed using this equation. The maximal contrasts for t = 2 to 6 are solved explicitly in this paper.
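To convey the flavor of a probabilistic scheme with unlimited n, here is a toy construction for t = 2 only (it is not the paper's basis-matrix derivation): for a white secret pixel every share copies one common random bit, and for a black pixel every share draws an independent random bit. Any single share is uniformly random, so nothing leaks, while stacking two shares is black with probability 1/2 for white pixels and 3/4 for black pixels, and new shares can be issued at any time without touching existing ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_share(secret, common_bits):
    """secret: 0/1 array (1 = black).  common_bits: one shared random bit
    plane, generated once and reused for every share ever issued."""
    independent = rng.integers(0, 2, size=secret.shape)
    # white pixel -> copy the common bit; black pixel -> independent bit
    return np.where(secret == 1, independent, common_bits)

secret = np.array([[1, 0], [0, 1]])                        # tiny 2x2 secret
common = rng.integers(0, 2, size=secret.shape)
shares = [make_share(secret, common) for _ in range(3)]    # n can keep growing
stacked = np.maximum(shares[0], shares[1])                 # stacking = OR
```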
The tree representation of the multiple-input multiple-output (MIMO) detection problem is illuminating for the development, interpretation, and classification of various detection methods. Best-first detection based on Dijkstra's algorithm explores the search tree according to a sorted list of tree nodes. In the first part of the paper, a new probabilistic sorting scheme is developed and incorporated into a modified Dijkstra's algorithm for MIMO detection. The proposed sorting exploits the statistics of the problem and yields effective tree exploration and truncation. The second part of the paper generalizes these results and removes some of their limitations: a generalized Dijkstra's algorithm is developed as a unified tree-search detection framework. The framework incorporates a parameter triplet that configures the memory usage, detection complexity, and sorting dynamics of the tree-search algorithm. By tuning these parameters, desired performance-complexity tradeoffs are attained, and a fixed-complexity version can be produced. Simulation results and analytical discussion demonstrate that the proposed generalized Dijkstra's algorithm achieves highly favorable performance-complexity tradeoffs.
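A compact best-first (Dijkstra-style) tree-search sketch for a real-valued model y = Hx + n with a small PAM alphabet is shown below. The priority queue orders nodes by partial Euclidean distance; the optional `max_nodes` cap is only meant to mimic the kind of memory knob such a framework exposes, and the paper's full parameter triplet is not reproduced.

```python
import heapq
import numpy as np

def best_first_detect(y, H, alphabet=(-3, -1, 1, 3), max_nodes=None):
    nt = H.shape[1]
    Q, R = np.linalg.qr(H)
    z = Q.T @ y
    # Each heap entry: (partial Euclidean distance, depth, decided symbols)
    heap = [(0.0, 0, ())]
    while heap:
        ped, depth, partial = heapq.heappop(heap)
        if depth == nt:                        # leaf with the smallest metric
            return np.array(partial[::-1])     # symbols were decided bottom-up
        layer = nt - 1 - depth                 # next row of R to account for
        for s in alphabet:
            x_part = partial + (s,)
            decided = np.array(x_part[::-1])   # symbols for rows layer..nt-1
            r = z[layer] - R[layer, layer:] @ decided
            heapq.heappush(heap, (ped + r * r, depth + 1, x_part))
        if max_nodes is not None and len(heap) > max_nodes:
            heap = heapq.nsmallest(max_nodes, heap)   # truncate the sorted list
            heapq.heapify(heap)
    return None
```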
We propose using a linear projection (LP) function to transform multiple sets of acoustic models into a single set that characterizes the testing environment for robust automatic speech recognition. The LP function extends the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space to enhance the environment modeling capacity. To estimate the nuisance parameters of the LP function, we develop maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP), and derive a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources: the training set, previously seen speech data, the current utterance, and a prepared tree structure. We evaluate the proposed MLLP and MAPLP on the Aurora-2 database in an unsupervised model adaptation setting. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimation across test conditions. Moreover, because the MAP-based estimate mitigates over-fitting, MAPLP clearly improves over MLLP. Compared to the baseline, MAPLP provides a significant 10.99% word error rate reduction.
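As a toy illustration of the linear-regression transform form that the LP function generalizes, the sketch below fits a single global affine map mu_hat = A mu + b on Gaussian means by plain least squares between the original means and adaptation-data estimates. Real MLLR/MAPLR/MAPLP estimation uses EM occupancy statistics and prior densities, which this sketch omits entirely.

```python
import numpy as np

def fit_affine_transform(mu_src, mu_tgt):
    """mu_src, mu_tgt: (n_gaussians, dim) original and adapted mean estimates."""
    n, d = mu_src.shape
    X = np.hstack([mu_src, np.ones((n, 1))])        # extended means [mu; 1]
    W, *_ = np.linalg.lstsq(X, mu_tgt, rcond=None)  # (d+1, d): stacks A^T and b
    A, b = W[:d].T, W[d]
    return A, b

def adapt_means(means, A, b):
    """Apply the LR-style transform mu_hat = A mu + b to every Gaussian mean."""
    return means @ A.T + b
```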
As mobile devices have become ubiquitous, mobile users increasingly expect to exploit proximity-based connectivity, e.g., WiFi and Bluetooth, to opportunistically share multimedia content according to their personal preferences. However, most previous studies investigate content dissemination protocols that distribute a single object to as many users in an opportunistic mobile social network as possible, without considering user preference. In this paper, we propose PrefCast, a preference-aware content dissemination protocol that aims to maximally satisfy user preferences for content objects. Because connectivity between users in a mobile social network is non-persistent, a user that meets neighboring users for a limited contact duration needs to efficiently disseminate the set of objects that will bring the highest utility (our quantitative metric of preference satisfaction) to possible future contacts. We formulate this problem as a maximum-utility forwarding model and propose an algorithm that enables each user to predict how much utility it can contribute to future contacts and to solve for its optimal forwarding schedule in a distributed manner. Our trace-based evaluation shows that PrefCast achieves 18.5% and 25.2% higher average utility than protocols that consider only contact frequency or only the preferences of local contacts, respectively.
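A simplified sketch of the per-contact forwarding decision follows: given an estimate of which users a neighbor will meet and their preference weights, pick the objects that maximize predicted utility within the limited contact window. The additive utility model, the integer budget, and the greedy ranking are illustrative stand-ins for the paper's maximum-utility forwarding formulation.

```python
from typing import Dict, List, Set

def plan_forwarding(my_objects: Set[str],
                    neighbor_objects: Set[str],
                    predicted_contacts: List[Dict[str, float]],
                    budget: int) -> List[str]:
    """predicted_contacts: one dict per predicted future contact of the
    neighbor, mapping object id -> that user's preference weight."""
    candidates = my_objects - neighbor_objects          # only send what it lacks

    def utility(obj: str) -> float:
        # Predicted contribution of this object across future contacts
        return sum(prefs.get(obj, 0.0) for prefs in predicted_contacts)

    ranked = sorted(candidates, key=utility, reverse=True)
    return ranked[:budget]                              # fill the contact window

# Example: two predicted future contacts, room to push two objects now.
plan = plan_forwarding({"a", "b", "c"}, {"c"},
                       [{"a": 0.9, "b": 0.2}, {"b": 0.3}], budget=2)
# -> ['a', 'b'] ('a' has the higher predicted utility)
```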
While solid-state drives are excellent alternatives to hard disks in mobile devices, a number of performance and reliability issues remain to be addressed. In this work, we design an efficient flash management scheme to improve the performance of low-cost MLC flash memory devices. Specifically, we target multi-chipped flash memory devices with cache support and develop a two-level address translation mechanism with an adaptive caching policy. We evaluate the approach on real workloads, and the results demonstrate that it improves the performance of multi-chipped solid-state drives through efficient logical-to-physical mapping and concurrent accesses to flash chips.
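A minimal sketch of a two-level, demand-paged address translation table with an LRU-cached mapping layer is given below, in the spirit of the scheme described above. Chip interleaving and the adaptive part of the caching policy are omitted, and the class name, cache size, and data structures are illustrative assumptions rather than the paper's design.

```python
from collections import OrderedDict

class TwoLevelFTL:
    def __init__(self, cache_slots=1024):
        self.cached_map = OrderedDict()   # level 1: hot logical-to-physical entries in RAM
        self.directory = {}               # level 2: full map kept on flash (simulated)
        self.cache_slots = cache_slots

    def _load_into_cache(self, lpn):
        ppn = self.directory.get(lpn)     # would cost a flash read of a map page
        if len(self.cached_map) >= self.cache_slots:
            victim, v_ppn = self.cached_map.popitem(last=False)   # evict LRU entry
            self.directory[victim] = v_ppn                        # write it back
        self.cached_map[lpn] = ppn
        return ppn

    def translate(self, lpn):
        """Logical page number -> physical page number."""
        if lpn in self.cached_map:
            self.cached_map.move_to_end(lpn)          # refresh LRU position
            return self.cached_map[lpn]
        return self._load_into_cache(lpn)

    def update(self, lpn, ppn):
        """Record a new physical location after a page write."""
        self.translate(lpn)                           # ensure the entry is cached
        self.cached_map[lpn] = ppn
```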