:::
The paradigm shift towards intermittent computing has made it possible to execute deep neural network (DNN) inference intermittently on edge devices powered by ambient energy. Recently, neural architecture search (NAS) techniques have achieved great success in automatically finding DNNs with high accuracy and low inference latency on the deployed hardware. We make a key observation: NAS attempts to improve inference latency primarily by maximizing data reuse, but the derived solutions may be inefficient when deployed on intermittently-powered systems, such that the inference may not satisfy an end-to-end latency requirement and, more seriously, may be unsafe given an insufficient energy budget. This work proposes iNAS, which introduces intermittent execution behavior into NAS to find accurate network architectures with corresponding execution designs that can execute safely and efficiently under intermittent power. An intermittent-aware execution design explorer is presented, which finds the right balance between data reuse and the costs related to intermittent inference, and incorporates a preservation design search space into NAS, while ensuring the power-cycle energy budget is not exceeded. To assess an intermittent execution design, an intermittent-aware abstract performance model is presented, which formulates the key costs related to progress preservation and recovery during intermittent inference. We implement iNAS on top of an existing NAS framework and evaluate the solutions found by each for various datasets, energy budgets and latency requirements, on a Texas Instruments device. Compared to those NAS solutions that can safely complete the inference, the iNAS solutions reduce the intermittent inference latency by 60% on average while achieving comparable accuracy, with an average 7% increase in search overhead.
In this paper, we propose a convolutional neural network (CNN) model for device-free fingerprinting indoor localization based on Wi-Fi channel state information (CSI). In addition, we develop an interpretation framework to understand the representations learned by the model. By quantifying and visualizing the CNN in comparison with the fully-connected feedforward deep neural network (DNN) (or multilayer perceptron), we observe that each model can automatically identify location-specific patterns, which, however, differ across models and are linked to the respective performance of each model. Furthermore, we quantify how features, relevant or otherwise, as deemed by the adopted quantifying metrics (i.e., relevance scores, calculated by relevance propagation techniques), determine or affect the performance results. Interpretation of learning models for wireless applications is challenging due to the lack of human sensory intuition and reference. The results presented in this paper provide visually perceivable evidence and plausible explanations for the performance advantages of the CNN in this important application.
Device-free indoor localization is a key enabling technology for many Internet of Things (IoT) applications. Deep neural network (DNN)-based location estimators achieve high-precision localization performance by automatically learning discriminative features from noisy wireless signals without much human intervention. However, the inner workings of DNN are not transparent and not adequately understood especially in wireless localization applications. In this paper, we conduct visual analyses of DNN-based location estimators trained with Wi-Fi channel state information (CSI) fingerprints in a real-world experiment. We address such questions as 1) how well has the DNN learned and been trained, and 2) what critical features has the DNN learned to distinguish different classes, via visualization techniques. The results provide plausible explanations and allow for a better understanding of the mechanism of DNN-based wireless indoor localization.
Concurrency control allows multiple tasks that share data objects to be concurrently executed in a serializable order, thus significantly improving computation progress. However, to accumulate forward progress on energy-harvesting intermittent systems while achieving data consistency across power cycles, existing approaches based on the checkpointing paradigm typically require system suspension at runtime. The runtime overheads incurred by suspension will be more manifest when more tasks are suspended and resumed during checkpointing, offsetting the computation progress improved by concurrent task execution. This paper presents a multiversion concurrency control design, which enables concurrent task execution without system suspension during checkpointing, while maintaining the serializability of task execution and ensuring data consistency after system recovery. We integrated our design into FreeRTOS running on a Texas Instruments device. Experimental results show that, at the very best, our design can double computation progress by reducing the runtime overheads incurred by system checkpointing, especially when tasks are executed with high concurrency.
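As a rough illustration of the general multiversion idea mentioned above (not the authors' FreeRTOS design), the following Python sketch keeps every committed version of a data object, so a reader holding an older snapshot timestamp is never blocked by a later writer. The class `MultiVersionStore` and its methods are hypothetical names chosen for this sketch, and it omits the serializability validation and nonvolatile storage that a real intermittent system would need.

```python
class MultiVersionStore:
    """Toy multiversion store: writers append versions, readers use snapshots.

    Hypothetical illustration only; it does not model power failures,
    checkpointing, or conflict detection.
    """

    def __init__(self):
        self._versions = {}  # key -> list of (commit_timestamp, value)
        self._clock = 0      # logical commit counter

    def begin(self):
        """Return a snapshot timestamp covering all commits so far."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit(self, writes):
        """Atomically publish a set of writes as one new version."""
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock


store = MultiVersionStore()
store.commit({"x": 1})
snapshot = store.begin()          # e.g. a reader that started earlier
store.commit({"x": 2})            # a concurrent writer is not blocked
old_view = store.read("x", snapshot)
new_view = store.read("x", store.begin())
```

In this sketch the earlier reader still sees the value it started with while newer readers see the later commit, which is the property that lets readers and writers proceed without suspending one another.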
Electrophoretic displays are ideal for self-powered systems, but currently require an uninterrupted power supply to carry out the full display update cycle. Although sensible for battery-powered devices, when directly applied to intermittently-powered systems, guaranteeing display update atomicity usually results in repeated execution until completion or can incur high hardware/software overheads, heavy programmer intervention and large energy buffering requirements to provide sufficient display update energy. This paper introduces the concept, design and implementation of accumulative display updating, which relaxes the atomicity constraints of display updating, such that the display update process can be accumulatively completed across power cycles, without the need for sufficient energy for the entire display update. To allow for process logical continuity, we track the update progress during execution and facilitate a safe display shutdown procedure to overcome physical and operability issues related to abrupt power failure. Additionally, a context-aware updating policy is proposed to handle data freshness issues, where the delay in addressing new update requests can cause the display contents to be in conflict with new data available. Experimental results on a Texas Instruments device with an integrated electrophoretic display show that, compared to atomic display updating, our design can significantly increase accurate forward progress, decrease the average response time of display updating and reduce time and energy wastage when displaying fresh data.
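The contrast between atomic and accumulative updating can be sketched with a toy model. The Python code below assumes, purely for illustration, that a display update consists of `total_rows` equal-cost row writes and that each power cycle supplies enough energy for `rows_per_cycle` writes; the function names and the dictionary standing in for nonvolatile memory are hypothetical, and the safe-shutdown and data-freshness mechanisms described above are not modeled.

```python
def atomic_update(total_rows, rows_per_cycle, max_cycles=100):
    """Atomic policy: progress is volatile, so each reboot restarts at row 0.

    Returns the power cycle in which the update completes, or None if it
    never completes within max_cycles.
    """
    for cycle in range(1, max_cycles + 1):
        done = 0  # volatile progress, lost at every power failure
        while done < total_rows and done < rows_per_cycle:
            done += 1
        if done == total_rows:
            return cycle
    return None


def accumulative_update(total_rows, rows_per_cycle, max_cycles=100):
    """Accumulative policy: progress is tracked in (simulated) nonvolatile state."""
    nonvolatile = {"next_row": 0}  # stands in for persistent memory
    for cycle in range(1, max_cycles + 1):
        budget = rows_per_cycle
        while budget > 0 and nonvolatile["next_row"] < total_rows:
            nonvolatile["next_row"] += 1  # write one row and record progress
            budget -= 1
        if nonvolatile["next_row"] == total_rows:
            return cycle
    return None


atomic_cycles = atomic_update(total_rows=10, rows_per_cycle=4)
accumulative_cycles = accumulative_update(total_rows=10, rows_per_cycle=4)
```

Under these assumptions the atomic policy never finishes when a single power cycle cannot cover the whole update, whereas the accumulative policy finishes after a few cycles.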
Device-free Wi-Fi indoor localization has received significant attention as a key enabling technology for many Internet of Things (IoT) applications. Machine learning-based location estimators, such as the deep neural network (DNN), carry proven potential in achieving high-precision localization performance by automatically learning discriminative features from the noisy wireless signal measurements. However, the inner workings of DNNs are not transparent and not adequately understood especially in the indoor localization application. In this paper, we provide quantitative and visual explanations for the DNN learning process as well as the critical features that DNN has learned during the process. Toward this end, we propose to use several visualization techniques, including: 1) dimensionality reduction visualization, to project the high-dimensional feature space to the 2D space to facilitate visualization and interpretation, and 2) visual analytics and information visualization, to quantify relative contributions of each feature with the proposed feature manipulation procedures. The results provide insightful views and plausible explanations of the DNN in device-free Wi-Fi indoor localization using channel state information (CSI) fingerprints.
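One generic way to quantify the relative contribution of each input feature by manipulating it is occlusion: replace one feature with a baseline value and measure how much the model output changes. The Python sketch below illustrates that generic technique under stated assumptions; it is not the feature manipulation procedure proposed in the paper, and `occlusion_contributions`, `toy_model`, and the weights are hypothetical.

```python
def occlusion_contributions(model, features, baseline=0.0):
    """Return normalized per-feature contributions via single-feature occlusion.

    model: callable mapping a list of floats to a single float.
    features: the input sample to explain.
    baseline: value substituted for the occluded feature (an assumption).
    """
    reference = model(features)
    deltas = []
    for i in range(len(features)):
        perturbed = list(features)
        perturbed[i] = baseline  # occlude one feature at a time
        deltas.append(abs(reference - model(perturbed)))
    total = sum(deltas)
    # Normalize so the contributions sum to 1 (or all zeros if nothing matters).
    return [d / total if total else 0.0 for d in deltas]


def toy_model(x):
    """Stand-in for a trained estimator: a fixed linear score."""
    weights = [2.0, 0.0, -1.0]
    return sum(w * v for w, v in zip(weights, x))


scores = occlusion_contributions(toy_model, [1.0, 1.0, 1.0])
```

For this linear toy model the scores simply reflect the magnitude of each weight; with a trained DNN the same loop would be applied to CSI features and the network's output for a given location class.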
Self-powered intermittent systems waste considerable I/O energy because volatile I/O modules repeatedly issue identical operations under power failure conditions, and also due to the use of the inefficient I/O stack originally developed for battery-powered systems. This paper presents the concept, design, and implementation of autonomous I/O, which can accumulatively and transparently complete I/O operations regardless of power stability. We define its two essential functionalities, separate the general I/O stack to make accumulatively-completed I/O operations transparent to application tasks, and propose an access protocol that allows for energy efficiency and compatibility with the general I/O stack. To evaluate its efficacy, we implement our design and conduct extensive experiments on a Texas Instruments device with commodity sensor and Wi-Fi modules. Experimental results show that autonomous I/O can achieve 1.8 times the throughput achieved with nonvolatile I/O when the power is relatively steady, while reducing the completion time of individual I/O operations by at least 34% with relatively unstable power.
Graphics-intensive mobile games place different and varying levels of demand on the associated CPUs and GPUs. In contrast to the workload variability that characterizes games, the current design of the energy governor employed by mobile systems appears to be outdated. In this work, we review the energy-saving mechanism implemented in an Android system coupled with graphics-intensive gaming workloads from three perspectives: user perception, application status, and the interplay between the CPU and GPU. We observe that there are information gaps in the current system, which may result in unnecessary energy wastage. To resolve the problem, we propose an online user-centric CPU-GPU governing framework. To bridge the identified information gaps, we classify rendered game frames into redundant/changing frames to satisfy user demand, categorize an application into GPU sensitive/insensitive phases to understand the application’s demand, and determine the frequency scaling intents of the CPU and GPU to capture processor demand. In response to the measured demand, we employ a required workload estimator, a unified policy selector, and a frequency-scaling intent communicator in the framework to save energy. The proposed framework was implemented on an LG Nexus 5X smartphone, and extensive experiments with real-world 3D gaming applications were conducted. According to the experimental results, for an application with low interactivity and infrequent phase changes, the proposed framework reduces energy consumption by 25.3% and 39% compared with our previous work and the Android governors, respectively, while maintaining user experience.
Vehicular fog computing (VFC) is a promising approach to provide ultra-low-latency service to vehicles and end users by extending fog computing to conventional vehicular networks. Parked vehicle assistance (PVA), as a critical technique in VFC, can be integrated with smart parking in order to exploit its full potential. In this paper, we propose a smart VFC system that combines PVA and smart parking. A VFC-aware parking reservation auction is proposed to guide on-the-move vehicles to available parking places with less effort, and meanwhile to exploit the fog capability of parked vehicles to assist delay-sensitive computing services, using monetary rewards to compensate for their service cost. The proposed allocation rule maximizes the aggregate utility of the smart vehicles, and the proposed payment rule guarantees incentive compatibility, individual rationality, and budget balance. We further provide an observation stage with dynamic offload pricing update to improve the offload efficiency and the profit of the fog system. The simulation results confirm the win–win performance enhancement to the fog node controller, the smart vehicles, and the parking places from the proposed design.
Self-powered intermittent systems enable accumulative executions in unstable power environments, where checkpointing is often adopted as a means to achieve data consistency and system recovery under power failures. However, existing approaches based on the checkpointing paradigm normally require system suspension and logging at runtime. This paper presents a design which enables failure-resilient intermittently-powered systems without runtime checkpointing. Our design enforces the consistency and serializability of concurrent data access while maximizing computation progress, as well as allows instant system recovery after power resumption, by leveraging the characteristics of data accessed in hybrid memory. We integrated the design into FreeRTOS running on a Texas Instruments device. Experimental results show that our design achieves up to 11.8 times the computation progress achieved by checkpointing-based approaches, while reducing the recovery time by nearly 90%.