:::
We present a comprehensive performance study of a new time-domain approach for estimating the components of an observed monaural audio mixture. Unlike existing time-frequency approaches that use the product of a set of spectral templates and their corresponding activation patterns to approximate the spectrogram of the mixture, the proposed approach uses the sum of a set of convolutions of estimated activations with prelearned dictionary filters to approximate the audio mixture directly in the time domain. The approximation problem can be solved by an efficient convolutional sparse coding algorithm. The effectiveness of this approach for source separation of musical audio has been demonstrated in our prior work, but only under rather restricted and controlled conditions: the musical score of the mixture had to be known a priori, and there had to be little mismatch between the dictionary filters and the source signals. In this paper, we report an evaluation that considers wider and more practical experimental settings. These include the use of an audio-based multi-pitch estimation algorithm to replace the musical score, and an external dataset of single-note audio recordings to construct the dictionary filters. Our results show that the proposed approach remains effective with a larger dictionary, and compares favorably with the state-of-the-art non-negative matrix factorization approach. However, in the absence of the score and in the case of a small dictionary, our approach may not be better.
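The time-domain model described in this abstract, a mixture approximated by a sum of convolutions of activations with dictionary filters, can be illustrated with a small sketch. It shows only the synthesis side (reconstructing the mixture and one source from given activations), not the convolutional sparse coding solver. The function names, the toy filters, and the `source_of_filter` mapping are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def synthesize(filters, activations):
    """Approximate a signal as sum_k (d_k * a_k), with * denoting convolution.

    filters:     array of shape (K, L), one dictionary filter d_k per row
    activations: array of shape (K, N), one activation signal a_k per row
    """
    n_out = activations.shape[1] + filters.shape[1] - 1
    x_hat = np.zeros(n_out)
    for d_k, a_k in zip(filters, activations):
        x_hat += np.convolve(a_k, d_k)  # full linear convolution, length N + L - 1
    return x_hat

def source_estimate(filters, activations, source_of_filter, source_id):
    """Reconstruct one source from the filters assigned to it (hypothetical mapping)."""
    mask = source_of_filter == source_id
    return synthesize(filters[mask], activations[mask])

# Toy data (illustrative only): three short filters and sparse activations.
filters = np.array([[1.0, 0.5], [0.0, -1.0], [2.0, 2.0]])
activations = np.zeros((3, 4))
activations[0, 0] = 1.0
activations[1, 2] = 2.0
activations[2, 1] = 0.5
source_of_filter = np.array([0, 0, 1])  # filters 0 and 1 -> source 0, filter 2 -> source 1

mixture_hat = synthesize(filters, activations)
parts = [source_estimate(filters, activations, source_of_filter, s) for s in (0, 1)]
```

Because the model is linear, the per-source reconstructions add up to the reconstructed mixture, which is what makes the separation step straightforward once the activations have been estimated.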
Heterogeneous domain adaptation (HDA) addresses the task of associating data not only across dissimilar domains but also described by different types of features. Inspired by the recent advances of neural networks and deep learning, we propose Transfer Neural Trees (TNT), which jointly solves cross-domain feature mapping, adaptation, and classification in a NN-based architecture. As the prediction layer in TNT, we further propose Transfer Neural Decision Forest (Transfer-NDF), which effectively adapts the neurons in TNT by stochastic pruning. Moreover, to address semi-supervised HDA, a unique embedding loss term for preserving prediction and structural consistency between target-domain data is introduced into TNT. Experiments on classification tasks across features, datasets, and modalities successfully verify the effectiveness of our TNT.
N/A
N/A
N/A
In music auto-tagging, people develop models to automatically label a music clip with attributes such as instruments, styles or acoustic properties. Many of these tags are actually descriptors of local events in a music clip, rather than a holistic description of the whole clip. Localizing such tags in time can potentially innovate the way people retrieve and interact with music, but little work has been done to date due to the scarcity of labeled data with frame-level granularity. Most labeled data for training a learning-based model for music auto-tagging are at the clip level, providing no cues as to when and for how long these attributes appear in a music clip. To bridge this gap, we propose in this paper a convolutional neural network (CNN) architecture that is able to make accurate frame-level predictions of tags in unseen music clips by using only clip-level annotations in the training phase. Our approach is motivated by recent advances in computer vision for localizing visual objects, but we propose new designs of the CNN architecture to account for the temporal information of music and the variable duration of such local tags in time. We report extensive experiments to gain insights into the problem of event localization in music, and validate through experiments the effectiveness of the proposed approach. In addition to quantitative evaluations, we also present qualitative analyses showing the model can indeed learn certain characteristics of music tags.
Improving the endurance of phase-change memory (PCM) is a fundamental issue when PCM is considered as a replacement for DRAM as main memory. Memory-based wear leveling (WL) is an effective way to improve PCM endurance, but its major challenge is how to efficiently determine the appropriate memory pages for allocation or swapping. In this article, we present a constant-cost WL design that is compatible with existing memory management. Two implementations, namely bucket-based and array-based WL, with constant-time (or nearly zero) search cost are proposed to be integrated into the OS layer and the hardware layer, respectively, and to trade off time against space complexity. Experiments on an Android implementation, together with simulations using popular benchmarks, show very encouraging results for the proposed design.
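To make the idea of bucket-based page selection concrete, here is a minimal sketch of one way free pages could be grouped by wear so that picking a lightly worn page does not require scanning all pages. It is an illustration under stated assumptions, not the design from the article: the class name, the `bucket_width` and `num_buckets` parameters, and the policy of returning pages via `free` are all hypothetical.

```python
from collections import deque

class BucketWearLeveler:
    """Free pages grouped into buckets by write count (illustrative assumption)."""

    def __init__(self, num_pages, bucket_width=4, num_buckets=8):
        self.bucket_width = bucket_width
        self.writes = [0] * num_pages                 # per-page write counters
        self.buckets = [deque() for _ in range(num_buckets)]
        self.buckets[0].extend(range(num_pages))      # all pages start unworn
        self.lowest = 0                               # hint: lowest possibly non-empty bucket

    def _bucket_of(self, page):
        # Pages beyond the last bucket's range are clamped into the last bucket.
        return min(self.writes[page] // self.bucket_width, len(self.buckets) - 1)

    def allocate(self):
        """Return a free page from the least-worn non-empty bucket."""
        # The scan is bounded by the fixed number of buckets, not by the page count.
        while self.lowest < len(self.buckets) and not self.buckets[self.lowest]:
            self.lowest += 1
        if self.lowest == len(self.buckets):
            raise MemoryError("no free pages")
        return self.buckets[self.lowest].popleft()

    def free(self, page, writes_since_alloc):
        """Return a page to the pool, recording the writes it absorbed."""
        self.writes[page] += writes_since_alloc
        b = self._bucket_of(page)
        self.buckets[b].append(page)
        self.lowest = min(self.lowest, b)

wl = BucketWearLeveler(num_pages=3, bucket_width=2, num_buckets=4)
page = wl.allocate()
wl.free(page, writes_since_alloc=5)      # the page moves to a higher-wear bucket
next_pages = [wl.allocate() for _ in range(3)]
```

In this toy run the heavily written page is handed out last, after the two unworn pages. The cost of `allocate` depends on the number of buckets rather than the number of pages, which is the property the sketch is meant to show.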
N/A
N/A
In simultaneous wireless information and power transfer (SWIPT), practical receiver architectures consisting of an information receiver and an energy harvester have been proposed in place of an ideal receiver capable of performing two tasks simultaneously using the same circuits. In this paper, we present a novel receiver architecture design incorporating an interplay between the information receiver and the energy harvester to enhance the performance of the practical SWIPT receiver. In particular, the energy level of the received signal monitored at the energy harvester is fed back to the information receiver to assist information decoding at the information receiver. The symbol-error-rate (SER) and diversity analyses show that the proposed receiver architecture could yield a higher diversity order for unconventional constellations where any two distinct symbols have distinct energy levels, and the same diversity order for conventional modulations. Simulation of PAM and QAM modulations verifies the analysis, shows the improved SER performance of the proposed receiver architecture, and illustrates the energy-dimension-augmented decision regions. Some insights into designing enhanced practical SWIPT receivers are provided as a result.
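The idea of decision regions augmented with an energy dimension can be sketched as a minimum-distance detector that scores each candidate symbol on both the received sample and the energy level reported by the harvester. The decision rule, the `energy_weight` parameter, and the toy constellation below are assumptions chosen for illustration; the abstract does not specify the actual detector.

```python
import numpy as np

def detect(y, energy, h, constellation, energy_weight=1.0):
    """Pick the symbol that best explains both observations (hypothetical rule).

    y:       sample seen by the information receiver
    energy:  energy level measured at the energy harvester
    h:       channel gain, assumed known
    """
    candidates = h * constellation
    metric = (np.abs(y - candidates) ** 2
              + energy_weight * (energy - np.abs(candidates) ** 2) ** 2)
    return constellation[np.argmin(metric)]

# Toy constellation in which every symbol has a distinct energy (9, 1, 4, 16).
constellation = np.array([-3.0, -1.0, 2.0, 4.0])

# A noisy sample lying between -1 and 2, with a measured energy close to 4.
with_energy = detect(y=0.4, energy=3.8, h=1.0, constellation=constellation)
without_energy = detect(y=0.4, energy=3.8, h=1.0, constellation=constellation,
                        energy_weight=0.0)
```

With `energy_weight=0.0` the detector reduces to a conventional nearest-symbol decision and picks -1, while the energy term moves the decision to 2. For a symmetric constellation such as standard PAM, the symbols s and -s have equal energy, so the energy term cannot tell them apart.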