Emotion is one of the main reasons why people engage and interact with music [1]. Songs can express our inner feelings, produce goosebumps, bring us to tears, share an emotional state with a composer or performer, or trigger specific memories. Interest in a deeper understanding of the relationship between music and emotion has motivated researchers from various areas of knowledge for decades [2], including computational researchers. Imagine an algorithm capable of predicting the emotions that a listener perceives in a musical piece, or one that dynamically generates music that adapts to the mood of a conversation in a film, a particularly fascinating and provocative idea. These algorithms typify music emotion recognition (MER), a computational task that attempts to automatically recognize either the emotional content in music or the emotions induced by music in the listener [3]. To do so, emotionally relevant features are extracted from the music, processed, evaluated, and then associated with certain emotions. MER is one of the most challenging high-level music description problems in music information retrieval (MIR), an interdisciplinary research field that focuses on the development of computational systems to help humans better understand music collections. MIR integrates concepts and methodologies from several disciplines, including music theory, music psychology, neuroscience, signal processing, and machine learning.
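As a concrete illustration of this extract-then-associate pipeline, the sketch below computes a few emotionally relevant descriptors (timbre, harmony, and tempo) with librosa and maps them to emotion labels with a standard classifier. The feature choice, the labels, and the synthetic clips are illustrative assumptions, not a reference MER system.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def extract_features(y, sr=22050):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # timbre
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # harmony
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)        # rhythm
    # Summarize frame-level features by their means over time.
    return np.hstack([mfcc.mean(axis=1), chroma.mean(axis=1), tempo])

# Stand-in clips (pure tones); a real system would load annotated audio.
sr = 22050
clips = [np.sin(2 * np.pi * f * np.arange(sr * 2) / sr) for f in (220, 440)]
labels = ["calm", "tense"]                 # hypothetical emotion labels
X = np.vstack([extract_features(c, sr) for c in clips])
clf = RandomForestClassifier(n_estimators=50).fit(X, labels)
print(clf.predict(extract_features(clips[0], sr).reshape(1, -1)))
```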
The increasing paradigm shift towards intermittent computing has made it possible to intermittently execute deep neural network (DNN) inference on edge devices powered by ambient energy. Recently, neural architecture search (NAS) techniques have achieved great success in automatically finding DNNs with high accuracy and low inference latency on the deployed hardware. We make a key observation: NAS improves inference latency primarily by maximizing data reuse, but the derived solutions may be inefficient when deployed on intermittently powered systems, such that the inference may not satisfy an end-to-end latency requirement and, more seriously, may be unsafe given an insufficient energy budget. This work proposes iNAS, which introduces intermittent execution behavior into NAS to find accurate network architectures, together with corresponding execution designs, that can safely and efficiently execute under intermittent power. An intermittent-aware execution design explorer is presented, which finds the right balance between data reuse and the costs related to intermittent inference, and which incorporates a preservation design search space into NAS while ensuring the power-cycle energy budget is not exceeded. To assess an intermittent execution design, an intermittent-aware abstract performance model is presented, which formulates the key costs related to progress preservation and recovery during intermittent inference. We implement iNAS on top of an existing NAS framework and evaluate the solutions each finds for various datasets, energy budgets, and latency requirements on a Texas Instruments device. Compared to those NAS solutions that can safely complete the inference, the iNAS solutions reduce the intermittent inference latency by 60% on average while achieving comparable accuracy, at the cost of an average 7% increase in search overhead.
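To make the role of the abstract performance model concrete, here is a hedged sketch of how such a model might tally the costs of intermittent inference. The cost terms, parameter names, and the simple ceiling-based cycle count are assumptions for illustration, not the paper's exact formulation.

```python
import math

def intermittent_latency(compute_time_s, cycle_energy_j, compute_power_w,
                         preserve_cost_s, recover_cost_s):
    """End-to-end latency of one inference under intermittent power."""
    # Reserve enough energy in each cycle to preserve progress safely.
    preserve_energy_j = preserve_cost_s * compute_power_w
    usable_energy_j = cycle_energy_j - preserve_energy_j
    if usable_energy_j <= 0:
        raise ValueError("unsafe design: cycle budget < preservation cost")
    active_s = usable_energy_j / compute_power_w   # compute time per cycle
    cycles = math.ceil(compute_time_s / active_s)
    # Every cycle pays one preservation; every restart after the first
    # pays one recovery. Charging time between cycles is ignored here.
    return (compute_time_s + cycles * preserve_cost_s
            + (cycles - 1) * recover_cost_s)

print(intermittent_latency(2.0, 0.05, 0.02, 0.01, 0.005))
```

A design that reuses more data finishes its compute sooner but typically has a larger footprint to preserve; a model of this shape is what lets an explorer weigh that trade-off against the energy budget.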
Intermittent systems enable batteryless devices to operate through energy harvesting by leveraging the complementary characteristics of volatile memory (VM) and non-volatile memory (NVM). Unfortunately, alternating and frequent accesses to these heterogeneous memories for accumulative execution across power cycles can significantly hinder computation progress. The impediment stems mainly from the CPU time wasted on slow NVM accesses, which far exceeds that spent on fast VM accesses. This paper explores how to leverage heterogeneous cores to mitigate the progress impediment caused by heterogeneous memories. In particular, a delegable and adaptive synchronization protocol is proposed that allows memory accesses to be delegated between cores and dynamically adapts to diverse memory access latencies. Moreover, our design guarantees task serializability across multiple cores and maintains data consistency despite frequent power failures. We integrated our design into FreeRTOS running on a Cypress device featuring heterogeneous dual cores and hybrid memories. Experimental results show that, compared to recent approaches that assume single-core intermittent systems, our design improves computation progress by at least 1.8x and up to 33.9x by leveraging core heterogeneity.
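The following sketch models the delegation idea with plain Python threads: a compute core keeps working in fast VM while a helper core drains delegated NVM writes from a queue. The queue-based protocol and all names are illustrative assumptions; the actual design additionally adapts delegation to observed latencies and guarantees serializability across power failures.

```python
import queue
import threading

nvm_log = {}                   # stands in for non-volatile memory
delegation_q = queue.Queue()   # write requests handed to the helper core

def helper_core():
    while True:
        req = delegation_q.get()
        if req is None:            # shutdown sentinel
            break
        key, value = req
        nvm_log[key] = value       # slow NVM commit happens off the
        delegation_q.task_done()   # critical path of the compute core

def compute_core(task_id):
    result = sum(range(10_000))          # work on fast VM
    delegation_q.put((task_id, result))  # delegate the NVM commit

t = threading.Thread(target=helper_core)
t.start()
for tid in range(4):
    compute_core(tid)
delegation_q.join()            # all delegated writes are now durable
delegation_q.put(None)
t.join()
print(nvm_log)
```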
Mobile Edge Computing (MEC) is a promising paradigm to ease the computation burden of Internet-of-Things (IoT) devices by leveraging computing capabilities at the network edge. With the growing demand for resource provisioning from IoT devices, the queueing delay at the edge nodes poses a colossal impediment not only to achieving satisfactory quality of experience (QoE) for the IoT devices but also to the benefits of the edge nodes, owing to escalating energy expenditure. Moreover, since service providers may differ, the computing services of computationally competent entities should entail economic compensation for the incurred energy expenditure and capital investment. Therefore, a workload allocation mechanism that considers flat-rate and dynamic pricing schemes in a multi-layer edge computing structure is much needed. We use a Stackelberg game to capture the inherent hierarchy and interdependence between the second-layer edge node (SLEN) and the first-layer edge nodes (FLENs). A truthful admission control mechanism, grounded on the optimal workload allocation, is designed for FLENs without violating end-to-end (E2E) latency requirements. We prove that a Stackelberg equilibrium with the E2E latency guarantee and truthfulness exists and can be reached through the proposed algorithm. Simulation results confirm the effectiveness of our scheme and illustrate several insights.
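As a toy illustration of the Stackelberg structure, the sketch below lets a leader (SLEN) search over prices while each follower (FLEN) best-responds with the workload it offloads; the leader then keeps the profit-maximizing price. The quadratic delay cost and all parameters are illustrative assumptions rather than the paper's model.

```python
import numpy as np

def follower_offload(price, demand, delay_coeff):
    # Follower minimizes cost(x) = price*x + delay_coeff*(demand - x)^2
    # over 0 <= x <= demand; the unconstrained optimum is clipped below.
    x = demand - price / (2 * delay_coeff)
    return min(max(x, 0.0), demand)

demands, delay_coeffs = [4.0, 6.0], [0.5, 0.8]   # two hypothetical FLENs
unit_cost = 0.3                                   # leader's energy cost

# Backward induction: embed followers' best responses in the leader's
# profit, then grid-search the leader's price.
best_profit, best_price = max(
    (sum(follower_offload(p, d, c) for d, c in zip(demands, delay_coeffs))
     * (p - unit_cost), p)
    for p in np.linspace(0.31, 3.0, 100)
)
print(f"leader profit {best_profit:.2f} at price {best_price:.2f}")
```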
This paper introduces a new key space for CSIDH and a new algorithm for constant-time evaluation of the CSIDH group action. The key space is not useful with previous algorithms, and the algorithm is not useful with previous key spaces, but combining the new key space with the new algorithm produces speed records for constant-time CSIDH. For example, for CSIDH-512 with a 256-bit key space, the best previous constant-time results used 789000 multiplications and more than 200 million Skylake cycles; this paper uses 438006 multiplications and 125.53 million cycles.
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states require a long search time to identify the best action the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for two minutes. This implies that a significant amount of resources can be saved if we can stop the search earlier when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether we should stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by a factor of 2.5 while maintaining a similar winning rate, which is critical for training and conducting experiments. Moreover, under the same average simulation count, our method achieves a 61% winning rate against the original program.
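The sketch below captures the early-stopping idea: search proceeds in chunks, and after each chunk a cheap confidence check decides whether the current best action can still change. The visit-gap heuristic is an illustrative stand-in for the learned uncertainty predictor, and the biased random rollouts are placeholders for real MCTS backups.

```python
import random

def run_simulations(visits, n):
    # Placeholder rollouts: biased bumps stand in for real MCTS backups.
    for _ in range(n):
        a = 0 if random.random() < 0.8 else random.randrange(len(visits))
        visits[a] += 1

def ds_mcts(num_actions=5, chunk=100, budget=2000):
    visits = [0] * num_actions
    spent = 0
    while spent < budget:
        run_simulations(visits, chunk)
        spent += chunk
        top, second = sorted(visits, reverse=True)[:2]
        # Stop once the runner-up could not overtake the leader even if
        # it received every remaining simulation: a conservative proxy
        # for being confident in the current search result.
        if second + (budget - spent) < top:
            break
    return visits.index(max(visits)), spent

action, used = ds_mcts()
print(f"picked action {action} after {used} of 2000 simulations")
```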
To exploit rich information from unlabeled data, in this work we propose a novel self-supervised framework for visual tracking which can easily adapt state-of-the-art supervised Siamese-based trackers into unsupervised ones by utilizing the fact that an image and any cropped region of it form a natural pair for self-training. Besides common geometric transformation-based data augmentation and hard negative mining, we also propose adversarial masking, which helps the tracker learn other context information by adaptively blacking out salient regions of the target. The proposed approach can be trained offline using images only, without any manual annotations or temporal information from multiple consecutive frames. Thus, it can be used with any kind of unlabeled data, including images and video frames. For evaluation, we take SiamFC as the base tracker and name the proposed self-supervised method S²SiamFC. Extensive experiments and ablation studies on the challenging VOT2016 and VOT2018 datasets demonstrate the effectiveness of the proposed method, which achieves performance comparable to its supervised counterpart and to other unsupervised methods that require multiple frames.
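A minimal sketch of the two ingredients, assuming NumPy arrays as images: an image and a crop of it form a template/search pair, and the highest-energy patch of the template is blacked out as a crude stand-in for the tracker-derived saliency that drives adversarial masking.

```python
import numpy as np

def make_pair(image, crop_size=64):
    # An image and any cropped region of it form a natural training pair.
    h, w = image.shape[:2]
    y = np.random.randint(0, h - crop_size)
    x = np.random.randint(0, w - crop_size)
    template = image[y:y + crop_size, x:x + crop_size].copy()
    return template, image            # (target exemplar, search region)

def adversarial_mask(template, patch=16):
    # Black out the patch with the highest energy, a crude proxy for the
    # saliency the tracker itself would assign to the target.
    best, by, bx = -1.0, 0, 0
    for yy in range(0, template.shape[0] - patch + 1, patch):
        for xx in range(0, template.shape[1] - patch + 1, patch):
            e = float(np.abs(template[yy:yy + patch, xx:xx + patch]).sum())
            if e > best:
                best, by, bx = e, yy, xx
    template[by:by + patch, bx:bx + patch] = 0
    return template

img = np.random.rand(256, 256)
tmpl, search = make_pair(img)
tmpl = adversarial_mask(tmpl)         # tracker must use context instead
```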
In recent years, waveform-mapping-based speech enhancement (SE) methods have garnered significant attention. These methods generally use a deep learning model to directly process and reconstruct speech waveforms. Because both the input and output are in waveform format, waveform-mapping-based SE methods can overcome the distortion caused by imperfect phase estimation, which may be encountered in spectral-mapping-based SE systems. So far, most waveform-mapping-based SE methods have focused on single-channel tasks. In this paper, we propose a novel fully convolutional network (FCN) with Sinc and dilated convolutional layers (termed SDFCN) for multichannel SE that operates in the time domain. We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN). The proposed methods are evaluated on three multichannel SE tasks, namely the dual-channel inner-ear microphone SE task, the distributed microphone SE task, and the CHiME-3 dataset. The experimental results confirm the outstanding denoising capability of the proposed SE systems on all three tasks and the benefits of the residual architecture for overall SE performance.
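A hedged PyTorch sketch of the waveform-in/waveform-out shape of such a model is given below; the learnable Sinc filterbank is approximated by an ordinary first Conv1d, the model is reduced to a single input channel, and all widths and depths are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyRSDFCN(nn.Module):
    def __init__(self, channels=32, layers=4):
        super().__init__()
        # Stand-in for the Sinc filterbank front end.
        self.front = nn.Conv1d(1, channels, 31, padding=15)
        # Dilated stack: receptive field doubles with each layer.
        self.dilated = nn.ModuleList(
            nn.Conv1d(channels, channels, 3, padding=2**i, dilation=2**i)
            for i in range(layers))
        self.out = nn.Conv1d(channels, 1, 1)

    def forward(self, wav):                 # wav: (batch, 1, samples)
        x = torch.tanh(self.front(wav))
        for conv in self.dilated:
            x = x + torch.tanh(conv(x))     # residual over each block
        return self.out(x) + wav            # residual to the input wave

noisy = torch.randn(2, 1, 16000)            # two 1-second clips at 16 kHz
enhanced = TinyRSDFCN()(noisy)
print(enhanced.shape)                       # torch.Size([2, 1, 16000])
```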
In this paper, we propose a convolutional neural network (CNN) model for device-free fingerprinting indoor localization based on Wi-Fi channel state information (CSI). In addition, we develop an interpretation framework to understand the representations learned by the model. By quantifying and visualizing the CNN in comparison with a fully connected feedforward deep neural network (DNN), or multilayer perceptron, we observe that each model can automatically identify location-specific patterns, although these patterns differ across models and are linked to each model's performance. Furthermore, we quantify how features deemed relevant (or irrelevant) by the adopted metrics, i.e., relevance scores calculated with relevance propagation techniques, determine or affect the performance results. Interpreting learning models for wireless applications is challenging due to the lack of human sensory intuition and reference. The results presented in this paper provide visually perceivable evidence and plausible explanations for the performance advantages of the CNN in this important application.
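To make the relevance scores concrete, here is a minimal sketch of relevance propagation through a single dense layer: output relevance is redistributed to inputs in proportion to their contributions, so the total relevance is conserved. The epsilon stabilizer and the one-layer setup are illustrative simplifications of the propagation techniques referenced above.

```python
import numpy as np

def lrp_dense(a, w, relevance_out, eps=1e-6):
    # a: (in,) activations, w: (in, out) weights, relevance_out: (out,)
    z = a[:, None] * w                    # contributions z_ij = a_i * w_ij
    s = z.sum(axis=0)                     # total contribution per output
    denom = s + eps * np.sign(s)          # epsilon rule stabilizer
    return (z / denom) @ relevance_out    # relevance_in: (in,)

# Toy CSI-like input: 8 "subcarrier" features through one dense layer.
rng = np.random.default_rng(0)
a = rng.random(8)
w = rng.normal(size=(8, 3))
out = a @ w
scores = lrp_dense(a, w, out)             # propagate output relevance back
print(scores.round(3), "conserved:", np.isclose(scores.sum(), out.sum()))
```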