:::
We propose using a linear projection (LP) function to transform multiple sets of acoustic models into a single set, in order to characterize testing environments for robust automatic speech recognition. The LP function extends the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space, which enhances its environment modeling capacity. To estimate the nuisance parameters of the LP function, we developed maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP) estimation and derived a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources: the training set, previously seen speech data, the current utterance, and a prepared tree structure. We evaluated MLLP and MAPLP on the Aurora-2 database in unsupervised model adaptation mode. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimates under different test conditions. Moreover, because the MAP-based estimate handles over-fitting well, MAPLP yields clear improvements over MLLP. Compared with the baseline, MAPLP achieves a significant 10.99% word error rate reduction.
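For reference, the LR function that LP extends is the MLLR-style affine mean transform, in which each Gaussian mean μ is adapted as μ̂ = Aμ + b = Wξ, with ξ = [1, μᵀ]ᵀ. The minimal sketch below only applies a given W to a set of means. It does not estimate W (by ML or MAP), and it does not implement the LP extension itself, whose exact form is not spelled out here. The function names and the toy transform are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_lr_transform(mean: Vector, w: Matrix) -> Vector:
    """Return the adapted mean W @ xi, where xi = [1, *mean] (i.e. A @ mean + b)."""
    xi = [1.0] + list(mean)  # bias-extended mean vector
    if any(len(row) != len(xi) for row in w):
        raise ValueError("each row of W must have len(mean) + 1 entries")
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, xi)) for row in w]


def adapt_means(means: List[Vector], w: Matrix) -> List[Vector]:
    """Apply one shared transform to every Gaussian mean in a regression class."""
    return [apply_lr_transform(mu, w) for mu in means]


if __name__ == "__main__":
    # Hypothetical 2-D transform: scale each dimension by 0.9, then shift by +0.5.
    W = [[0.5, 0.9, 0.0],
         [0.5, 0.0, 0.9]]
    print(adapt_means([[1.0, 2.0], [0.0, -1.0]], W))
```

In MLLR, one such W is typically shared by all Gaussians in a regression class, which is why a single transform can be estimated from a small amount of adaptation data.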
As mobile devices have become more ubiquitous, mobile users increasingly expect to use proximity-based connectivity, e.g., WiFi and Bluetooth, to opportunistically share multimedia content based on their personal preferences. However, many previous studies investigate content dissemination protocols that distribute a single object to as many users in an opportunistic mobile social network as possible, without considering user preference. In this paper, we propose PrefCast, a preference-aware content dissemination protocol that aims to maximally satisfy user preferences for content objects. Because connectivity between users in a mobile social network is not persistent, a user that meets neighboring users for a limited contact duration must efficiently disseminate a set of objects that can bring high utility (the quantitative metric of preference satisfaction) to its likely future contacts. We formulate this problem as a maximum-utility forwarding model and propose an algorithm that enables each user to predict how much utility it can contribute to future contacts and to compute its optimal forwarding schedule in a distributed manner. Our trace-based evaluation shows that PrefCast achieves 18.5% and 25.2% higher average utility than protocols that consider only contact frequency or only the preferences of local contacts, respectively.
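The per-contact decision of which objects to send within a limited contact duration can be illustrated as a 0/1 knapsack problem. This is a minimal sketch under loudly stated assumptions, not PrefCast's forwarding model: each object has an integer transfer time in seconds, each object's expected utility for future contacts has already been predicted, and utilities add up independently. The names `select_objects`, `budget`, and the toy catalog are hypothetical.

```python
from typing import List, Tuple

# (object id, transfer time in whole seconds, expected utility): hypothetical inputs
Obj = Tuple[str, int, float]


def select_objects(objects: List[Obj], budget: int) -> Tuple[float, List[str]]:
    """0/1 knapsack: maximize total expected utility within `budget` seconds of contact."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # best[t] = (utility, chosen ids) achievable using at most t seconds
    best: List[Tuple[float, List[str]]] = [(0.0, []) for _ in range(budget + 1)]
    for obj_id, cost, utility in objects:
        if cost < 0:
            raise ValueError(f"negative transfer time for {obj_id}")
        # Sweep budgets downward so each object is forwarded at most once.
        for t in range(budget, cost - 1, -1):
            candidate = best[t - cost][0] + utility
            if candidate > best[t][0]:
                best[t] = (candidate, best[t - cost][1] + [obj_id])
    return best[budget]


if __name__ == "__main__":
    catalog = [("a", 3, 5.0), ("b", 4, 6.0), ("c", 2, 3.0)]
    print(select_objects(catalog, budget=5))  # (8.0, ['a', 'c'])
```

Here sending "a" and "c" (5 s, utility 8.0) beats sending "b" alone (4 s, utility 6.0), even though "b" has the highest individual utility.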
While solid-state drives are excellent alternatives to hard disks in mobile devices, a number of performance and reliability issues need to be addressed. In this work, we design an efficient flash management scheme to improve the performance of low-cost MLC flash memory devices. Specifically, targeting multi-chipped flash memory devices with cache support, we develop a two-level address translation mechanism with an adaptive caching policy. We evaluated the approach on real workloads. The results demonstrate that it can improve the performance of multi-chipped solid-state drives through its handling of logical-to-physical mappings and concurrent accesses to flash chips.
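The abstract does not detail the two-level structure or the adaptive policy, so the following is only a minimal sketch of the general idea under stated assumptions: a page-level logical-to-physical (LPN to PPN) table is split into translation pages (level 1 selects the page, level 2 holds its entries), and only a few translation pages are cached in RAM. The eviction rule here is plain LRU, a fixed stand-in for the adaptive policy, and multi-chip striping is not modeled. The class and its parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoLevelMapper:
    """Hypothetical two-level logical-to-physical (LPN -> PPN) page mapping.

    Level 1: the translation-page number tpn = lpn // entries_per_page.
    Level 2: the translation page itself, a dict of LPN -> PPN entries.
    All translation pages live in `flash_pages` (standing in for flash);
    at most `cache_pages` of them are cached in RAM under plain LRU.
    """

    def __init__(self, entries_per_page: int, cache_pages: int) -> None:
        if entries_per_page < 1 or cache_pages < 1:
            raise ValueError("entries_per_page and cache_pages must be >= 1")
        self.entries_per_page = entries_per_page
        self.cache_pages = cache_pages
        self.flash_pages: Dict[int, Dict[int, int]] = {}
        self.cache: "OrderedDict[int, Dict[int, int]]" = OrderedDict()
        self.misses = 0  # number of translation-page loads from "flash"

    def _load(self, tpn: int) -> Dict[int, int]:
        if tpn in self.cache:
            self.cache.move_to_end(tpn)  # mark as most recently used
            return self.cache[tpn]
        self.misses += 1
        # Cached and "flash" copies share one dict, so no write-back step is modeled.
        page = self.flash_pages.setdefault(tpn, {})
        self.cache[tpn] = page
        if len(self.cache) > self.cache_pages:
            self.cache.popitem(last=False)  # evict the least recently used page
        return page

    def lookup(self, lpn: int) -> Optional[int]:
        return self._load(lpn // self.entries_per_page).get(lpn)

    def update(self, lpn: int, ppn: int) -> None:
        self._load(lpn // self.entries_per_page)[lpn] = ppn


if __name__ == "__main__":
    m = TwoLevelMapper(entries_per_page=4, cache_pages=2)
    m.update(0, 100)
    m.update(5, 200)
    m.update(9, 300)
    print(m.lookup(0), m.lookup(5), m.misses)  # 100 200 5
```

Caching whole translation pages rather than individual entries keeps RAM usage small, at the cost of extra flash reads (the `misses` counter) when accesses jump between distant logical pages.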
The use of bag-of-features (BOF) models has been a popular technique for image classification and retrieval. To better represent and discriminate images from different classes, we extend BOF and explore the self-similarities of visual words for improved performance. The proposed self-similarity hypercubes (SSH) model, which observes the concurrent occurrences of visual words in an image, describes the structural information of the BOF representation of an image. Our experiments confirm that SSH provides additional and complementary information to BOF and thus improves classification performance. Unlike most prior methods, which require extracting or integrating multiple types of features for similar improvements, SSH works in the same domain as BOF. Moreover, SSH is not limited to any particular type of image descriptor, and its generalization is also verified.
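To make "concurrent occurrences of visual words" concrete, the sketch below counts how often each pair of visual words appears together in the same image region. This is a plain co-occurrence matrix, not the SSH model itself. It assumes, hypothetically, that the image has already been split into regions (e.g., grid cells) and that each local descriptor has been quantized to a visual-word index in `[0, vocab_size)`.

```python
from itertools import combinations
from typing import List, Sequence


def cooccurrence(regions: Sequence[Sequence[int]], vocab_size: int) -> List[List[int]]:
    """Count, for each pair of visual words, the regions in which both occur.

    `regions` holds the visual-word indices of the quantized local descriptors
    falling in each region (e.g. a grid cell) of one image.
    """
    counts = [[0] * vocab_size for _ in range(vocab_size)]
    for words in regions:
        present = sorted(set(words))  # presence only, not multiplicity
        if present and not (0 <= present[0] and present[-1] < vocab_size):
            raise ValueError("visual-word index out of range")
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1  # keep the matrix symmetric
    return counts


if __name__ == "__main__":
    print(cooccurrence([[0, 1, 1], [1, 2], [0, 2, 1]], vocab_size=3))
    # [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```

Because such counts are computed from the same visual-word assignments as the BOF histogram, they stay in the BOF domain. This matches the abstract's point that no additional feature types are needed.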
With the emergence of many-core systems, managing blocking costs effectively will soon become a critical issue in the design of real-time systems. In contrast to previous work on multi-core real-time task scheduling algorithms and synchronization protocols, this paper proposes a dedicated-core framework that separates the execution of application tasks and (system) services across cores, so that blocking among tasks can be better explored and managed. The rationale behind the framework is that the characteristics of many-core systems can be exploited to resolve the challenges those systems themselves raise. We define three core minimization problems with respect to the constraints on core configurations, and present corresponding task allocation algorithms with optimal, approximate, and heuristic solutions. Simulation results for the proposed framework provide further insight into task scheduling in many-core real-time systems.
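As a generic illustration of a core minimization heuristic, and not one of the paper's three algorithms, the sketch below packs tasks onto as few cores as first-fit decreasing finds. It assumes that each task is characterized only by its utilization C_i/T_i and that a core is schedulable while its total utilization stays at or below `capacity` (e.g., 1.0 for implicit-deadline tasks under EDF). It ignores blocking, which is exactly what the dedicated-core framework is designed to manage. The function name and the task set are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(utilizations: Dict[str, float], capacity: float = 1.0) -> List[List[str]]:
    """Assign tasks to cores by first-fit decreasing on utilization (blocking ignored)."""
    cores: List[List[str]] = []
    loads: List[float] = []
    # Place the heaviest tasks first; each goes to the first core with enough room.
    for task, u in sorted(utilizations.items(), key=lambda kv: kv[1], reverse=True):
        if u > capacity:
            raise ValueError(f"task {task} cannot fit on any single core")
        for k, load in enumerate(loads):
            if load + u <= capacity:
                cores[k].append(task)
                loads[k] += u
                break
        else:  # no existing core has room: open a new one
            cores.append([task])
            loads.append(u)
    return cores


if __name__ == "__main__":
    tasks = {"t1": 0.5, "t2": 0.75, "t3": 0.25, "t4": 0.5}
    print(first_fit_decreasing(tasks))  # [['t2', 't3'], ['t1', 't4']]
```

First-fit decreasing is a standard bin-packing heuristic. It does not guarantee the minimum number of cores, which is why the abstract distinguishes optimal, approximate, and heuristic solutions.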
The online repository of music tags provides a rich source of semantic descriptions useful for training emotion-based music classifiers. However, the imbalance of the online tags degrades the performance of emotion classification. In this paper, we present a novel data-sampling method that eliminates the imbalance while still taking the prior probability of each emotion class into account. In addition, we propose a two-layer emotion classification structure to harness the genre information available in the online repository of music tags. We show that genre-based grouping as a precursor greatly improves the performance of emotion classification. On average, incorporating online genre tags improves emotion classification performance by 55% over the conventional single-layer system. For classifying 183 emotion classes, our algorithm reaches an example-based F-score of 0.36.
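The 0.36 figure is an example-based F-score, a multi-label metric. A commonly used definition averages, over the N test examples, 2|Y_i ∩ Z_i| / (|Y_i| + |Z_i|), where Y_i is the true label set and Z_i the predicted one. The sketch below implements that definition. Whether the paper uses exactly this convention, including how it scores examples with no true and no predicted labels, is an assumption. The function name and the toy tags are hypothetical.

```python
from typing import Sequence, Set


def example_based_f1(truth: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Mean over examples of 2|Y_i & Z_i| / (|Y_i| + |Z_i|)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("need equally long, non-empty label-set sequences")
    total = 0.0
    for y, z in zip(truth, pred):
        denom = len(y) + len(z)
        # Convention assumed here: an example with no true and no predicted labels scores 1.
        total += 1.0 if denom == 0 else 2 * len(y & z) / denom
    return total / len(truth)


if __name__ == "__main__":
    truth = [{"happy", "calm"}, {"sad"}]
    pred = [{"happy"}, {"sad", "angry"}]
    print(example_based_f1(truth, pred))  # about 0.667
```

Each song is scored on its own label set before averaging, so one missed or spurious emotion tag costs more on songs with few tags than on songs with many.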
Motivated by image reconstruction, sparse representation-based classification (SRC) has been shown to be an effective method for applications such as face recognition. In this paper, we propose a locality-sensitive dictionary learning algorithm for SRC whose learned dictionary preserves local data structure, resulting in improved image classification. For both the dictionary update and sparse coding stages of the proposed algorithm, we provide closed-form solutions, and we enforce the data locality constraint throughout the learning process. Previous dictionary learning approaches that use sparse representation techniques did not take data locality into consideration, or did so only partially. In contrast, our algorithm produces a more representative dictionary and thus achieves better performance. We conduct experiments on databases designed for face and handwritten digit recognition. For these reconstruction-based classification problems, we confirm that our method achieves performance better than or comparable to that of state-of-the-art SRC methods while requiring less training time for dictionary learning.
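The "reconstruction-based" decision that SRC methods share can be sketched as follows. Given a test sample y, a dictionary whose atoms carry class labels, and a sparse code x of y over that dictionary, y is assigned to the class whose atoms alone (weighted by their coefficients in x) reconstruct it with the smallest residual. This minimal sketch assumes x has already been computed by some sparse or locality-constrained coder. The locality-sensitive dictionary learning itself is not shown, and the function name and toy data are hypothetical.

```python
import math
from typing import Dict, Sequence


def src_classify(y: Sequence[float], atoms: Sequence[Sequence[float]],
                 labels: Sequence[str], x: Sequence[float]) -> str:
    """Return the class whose atoms (with their coefficients in x) best reconstruct y."""
    if not (len(atoms) == len(labels) == len(x)):
        raise ValueError("atoms, labels and x must have the same length")
    residuals: Dict[str, float] = {}
    for c in sorted(set(labels)):
        recon = [0.0] * len(y)
        for atom, label, coef in zip(atoms, labels, x):
            if label == c:  # keep only the coefficients belonging to class c
                for d, a in enumerate(atom):
                    recon[d] += coef * a
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.__getitem__)


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.0, 1.0]]  # one toy atom per class
    print(src_classify([1.0, 0.0], atoms, ["A", "B"], [0.9, 0.1]))  # A
```

A dictionary that better preserves local data structure tends to concentrate a sample's coefficients on atoms of its own class, which lowers that class's residual under this rule.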