:::
In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearance path captures macroscopic vehicle features, while the orientation-conditioned part appearance path learns to capture localized discriminative features by focusing attention on the most informative key-points. Through extensive experimentation, we show that the proposed AAVER method is able to accurately re-identify vehicles in unconstrained scenarios, yielding state-of-the-art results on the challenging VeRi-776 dataset. As a byproduct, the proposed system is also able to accurately predict vehicle key-points and shows an improvement of more than 7% over the state of the art.
Unconstrained video-based face recognition is a challenging problem due to significant within-video variations caused by pose, occlusion and blur. To tackle this problem, an effective idea is to propagate the identity from high-quality faces to low-quality ones through contextual connections, which are constructed based on context such as body appearance. However, previous methods have often propagated erroneous information because they do not model the uncertainty of the noisy contextual connections. In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets represented as nodes in a graph. UGG explicitly models the uncertainty of the contextual connections by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released and challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.
In-memory techniques keep data in faster and more expensive storage media to improve the performance of big data processing. However, existing mechanisms do not consider how to expedite data processing applications that access their input datasets only once. Another problem is how to reclaim memory without affecting other running applications. In this paper, we provide scheduling-aware data prefetching and eviction mechanisms based on Spark, Alluxio, and Hadoop. The mechanisms prefetch data and release memory resources based on the scheduling information. A mathematical method is proposed for maximizing the reduction of data access time. To make the mechanisms applicable in large-scale environments, we propose a heuristic algorithm to reduce the computational time. Furthermore, an enhanced version of the heuristic algorithm is also proposed to increase the amount of prefetched data. Finally, we perform real-testbed and simulation experiments to show the effectiveness of the proposed mechanisms.
Device-free indoor localization is a key enabling technology for many Internet of Things (IoT) applications. Deep neural network (DNN)-based location estimators achieve high-precision localization performance by automatically learning discriminative features from noisy wireless signals without much human intervention. However, the inner workings of DNNs are not transparent and not adequately understood, especially in wireless localization applications. In this paper, we conduct visual analyses of DNN-based location estimators trained with Wi-Fi channel state information (CSI) fingerprints in a real-world experiment. Using visualization techniques, we address questions such as 1) how well the DNN has learned and been trained, and 2) what critical features the DNN has learned to distinguish different classes. The results provide plausible explanations and allow for a better understanding of the mechanism of DNN-based wireless indoor localization.
Stacked dilated convolutions used in WaveNet have been shown to be effective for generating high-quality audio. By replacing pooling/striding with dilation in convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. Therefore, this paper investigates using stacked dilated convolutions as the backbone for music source separation. However, while stacked dilated convolutions can reach wider context than standard convolutions, their effective receptive fields are still fixed and may not be wide enough for complex music audio signals. To reach information at remote locations, we propose to combine dilated convolution with a modified version of gated recurrent units (GRU) called the "Dilated GRU" to form a block. A Dilated GRU unit receives information from k steps before instead of the previous step for a fixed k. This modification allows a GRU unit to reach a location with fewer recurrent steps and run faster because it can execute partially in parallel. We show that the proposed model with a stack of such blocks performs as well as or better than the state-of-the-art models for separating vocals and accompaniments.
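The recurrence described above, in which a unit reads its state from k steps before rather than from the previous step, can be illustrated with a minimal sketch. This is not the paper's implementation: the scalar hidden state, the zero initial state, the gate convention, and the parameter names in `p` are all assumptions made for illustration. The second function shows why partial parallelism is possible: with dilation k, the sequence splits into k interleaved chains that do not depend on each other.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, p):
    """One scalar GRU step. `p` is a dict of hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    h_new = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # One common convention for mixing old and candidate state.
    return (1.0 - z) * h_prev + z * h_new

def dilated_gru(xs, k, p):
    """State at step t is computed from the state at step t - k.

    Steps with t < k have no predecessor and start from 0.0 (assumed).
    With k = 1 this reduces to an ordinary GRU.
    """
    hs = []
    for t, x in enumerate(xs):
        h_prev = hs[t - k] if t >= k else 0.0
        hs.append(gru_cell(x, h_prev, p))
    return hs

def dilated_gru_interleaved(xs, k, p):
    """Same outputs, computed as k independent ordinary GRU chains.

    Chain `offset` visits steps offset, offset + k, offset + 2k, ...
    The chains share no state, so they could run in parallel.
    """
    hs = [0.0] * len(xs)
    for offset in range(k):
        h = 0.0
        for t in range(offset, len(xs), k):
            h = gru_cell(xs[t], h, p)
            hs[t] = h
    return hs
```

In this sketch, a sequence of length T needs only about T / k sequential steps per chain, which is one way to read the claim that a location is reached with fewer recurrent steps.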
Music creation involves not only composing the different parts (e.g., melody, chords) of a musical work but also arranging/selecting the instruments to play the different parts. While the former has received increasing attention, the latter has not been much investigated. This paper presents, to the best of our knowledge, the first deep learning models for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a piece of polyphonic musical audio as input and predict as output its musical score. We investigate disentanglement techniques such as adversarial training to separate latent factors that are related to the musical content (pitch) of different parts of the piece, and that are related to the instrumentation (timbre) of the parts per short-time segment. By disentangling pitch and timbre, our models have an idea of how each piece was composed and arranged. Moreover, the models can realize "composition style transfer" by rearranging a musical piece without much affecting its pitch content. We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer.
Concurrency control allows multiple tasks that share data objects to be concurrently executed in a serializable order, thus significantly improving computation progress. However, to accumulate forward progress on energy-harvesting intermittent systems while achieving data consistency across power cycles, existing approaches based on the checkpointing paradigm typically require system suspension at runtime. The runtime overheads incurred by suspension become more pronounced as more tasks are suspended and resumed during checkpointing, offsetting the computation progress gained from concurrent task execution. This paper presents a multiversion concurrency control design, which enables concurrent task execution without system suspension during checkpointing, while maintaining the serializability of task execution and ensuring data consistency after system recovery. We integrated our design into FreeRTOS running on a Texas Instruments device. Experimental results show that, in the best case, our design can double computation progress by reducing the runtime overheads incurred by system checkpointing, especially when tasks are executed with high concurrency.
Electrophoretic displays are ideal for self-powered systems, but currently require an uninterrupted power supply to carry out the full display update cycle. Although sensible for battery-powered devices, when directly applied to intermittently-powered systems, guaranteeing display update atomicity usually results in repeated execution until completion or can incur high hardware/software overheads, heavy programmer intervention and large energy buffering requirements to provide sufficient display update energy. This paper introduces the concept, design and implementation of accumulative display updating, which relaxes the atomicity constraints of display updating, such that the display update process can be accumulatively completed across power cycles, without the need for sufficient energy for the entire display update. To allow for process logical continuity, we track the update progress during execution and facilitate a safe display shutdown procedure to overcome physical and operability issues related to abrupt power failure. Additionally, a context-aware updating policy is proposed to handle data freshness issues, where the delay in addressing new update requests can cause the display contents to be in conflict with new data available. Experimental results on a Texas Instruments device with an integrated electrophoretic display show that, compared to atomic display updating, our design can significantly increase accurate forward progress, decrease the average response time of display updating and reduce time and energy wastage when displaying fresh data.
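The contrast between atomic and accumulative updating can be illustrated with a toy simulation. It is only a sketch under stated assumptions, not the design evaluated above: progress is measured in hypothetical "rows", each row write costs one unit of energy, the dict `nv` stands in for nonvolatile memory, and power is assumed to fail only between row writes.

```python
def power_cycle(frame, nv, energy, accumulative):
    """Simulate one power cycle; return True once the frame is fully shown.

    `nv` models nonvolatile state that survives power loss (assumed).
    """
    if not accumulative:
        nv["next_row"] = 0  # atomic policy: every attempt restarts
    while nv["next_row"] < len(frame):
        if energy == 0:
            return False  # power failed before the update completed
        row = nv["next_row"]
        nv["shown"][row] = frame[row]
        nv["next_row"] = row + 1  # record progress after the write
        energy -= 1
    return True

def cycles_to_finish(frame, budgets, accumulative):
    """Return the 1-based power cycle in which the update completes.

    `budgets` lists the energy available in each successive cycle.
    Returns None if the update never completes.
    """
    nv = {"next_row": 0, "shown": [None] * len(frame)}
    for n, energy in enumerate(budgets, start=1):
        if power_cycle(frame, nv, energy, accumulative):
            return n
    return None
```

For example, with a five-row frame and two units of energy per cycle, the accumulative policy in this sketch finishes in the third cycle, while the atomic policy repeats the same two rows and never finishes.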
Device-free Wi-Fi indoor localization has received significant attention as a key enabling technology for many Internet of Things (IoT) applications. Machine learning-based location estimators, such as the deep neural network (DNN), carry proven potential in achieving high-precision localization performance by automatically learning discriminative features from noisy wireless signal measurements. However, the inner workings of DNNs are not transparent and not adequately understood, especially in indoor localization applications. In this paper, we provide quantitative and visual explanations for the DNN learning process as well as the critical features that the DNN has learned during the process. Toward this end, we propose to use several visualization techniques, including: 1) dimensionality reduction visualization, to project the high-dimensional feature space to the 2D space to facilitate visualization and interpretation, and 2) visual analytics and information visualization, to quantify the relative contribution of each feature with the proposed feature manipulation procedures. The results provide insightful views and plausible explanations of the DNN in device-free Wi-Fi indoor localization using channel state information (CSI) fingerprints.
Self-powered intermittent systems waste considerable I/O energy because volatile I/O modules repeatedly issue identical operations under power failure conditions, and also because they use the inefficient I/O stack originally developed for battery-powered systems. This paper presents the concept, design, and implementation of autonomous I/O, which can accumulatively and transparently complete I/O operations regardless of power stability. We define its two essential functionalities, separate the general I/O stack to make accumulatively-completed I/O operations transparent to application tasks, and propose an access protocol that allows for energy efficiency and compatibility with the general I/O stack. To evaluate its efficacy, we implement our design and conduct extensive experiments on a Texas Instruments device with commodity sensor and Wi-Fi modules. Experimental results show that autonomous I/O can achieve 1.8 times the throughput achieved with nonvolatile I/O when the power is relatively steady, while reducing the completion time of individual I/O operations by at least 34% with relatively unstable power.