Deep neural network inference on energy harvesting tiny devices has emerged as a solution for sustainable edge intelligence. However, compact models optimized for continuously-powered systems may become suboptimal when deployed on intermittently-powered systems. This paper presents the pruning criterion, pruning strategy, and prototype implementation of iPrune, the first framework which introduces intermittency into neural network pruning to produce compact models adaptable to intermittent systems. The pruned models are deployed and evaluated on a Texas Instruments device with various power strengths and TinyML applications. Compared to an energy-aware pruning framework, iPrune can speed up intermittent inference by 1.1 to 2 times while achieving comparable model accuracy.
D4AM: A General Denoising Framework for Downstream Acoustic Models
Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
Published: 02 Feb 2023, Last Modified: 06 Mar 2023. ICLR 2023 poster.
Keywords: audio processing, speech enhancement, robust automatic speech recognition, auxiliary task learning
TL;DR: We propose a general denoising framework for various downstream acoustic models (D4AM) by adopting an effective joint training scheme with the regression (denoising) objective and the classification (ASR) objective.
Abstract: The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noise-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method uses the regression objective as an auxiliary loss so that the SE model generalizes to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups.
Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.
Transfer learning is known empirically to perform well in many applications, yet little of the literature explains the mechanism behind it. This study establishes both formal derivations and heuristic analysis to formulate a theory of transfer learning in deep learning. Our framework, which uses layer variational analysis, proves that the success of transfer learning can be guaranteed under corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations of the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation. It is particularly advantageous when new domain data is sufficiently sparse during adaptation. Numerical experiments over diverse tasks validate our theory and verify that our analytic expression achieves better performance in domain adaptation than the gradient descent method.
Mobile live video streaming is expected to become mainstream in fifth generation (5G) mobile networks. To boost the Quality of Experience (QoE) of streaming services, the integration of Scalable Video Coding (SVC) with Mobile Edge Computing (MEC) is a natural candidate due to its scalability and its reliable transmission support for real-time interactions. However, it still takes effort to integrate MEC into video streaming services and exploit its full potential. We find that the efficiency of the MEC-enabled cellular system can be significantly improved when user requests are redirected to proper MEC servers through optimal user associations. In light of this observation, we jointly address the caching placement, video quality decision, and user association problem in the live video streaming service. Since the proposed nonlinear integer optimization problem is NP-hard, we first develop a two-step approach from a Lagrangian optimization under the dual pricing specification. Further, to obtain a computationally efficient solution with less performance loss, we provide a one-step Lagrangian dual pricing algorithm based on a convex transformation of the non-convex constraints. The simulations show that the service quality of live video streaming can be remarkably enhanced by the proposed algorithms in the MEC-enabled cellular system.
Beam-based wireless power transfer and Fog/edge computing are promising dual technologies for realizing wireless powered Fog computing networks to support the upcoming B5G/6G IoT applications, which require latency-aware and intensive computing with a limited energy supply. In such systems, IoT devices can either offload their computing tasks to the proximal Fog nodes or execute them locally using energy replenished by dedicated beamforming. However, effective integration of these techniques is still challenging, and two new issues arise: energy-aware task offloading and signal interference from spillovers of wireless beamforming. In this paper, we observe that the beam-ripple phenomenon, which takes advantage of beamformer defects to transfer energy to IoT devices, is the key to jointly addressing these two issues. Unlike traditional SWIPT technology, our approach does not divide the stream into separate data and energy streams; instead, target IoT devices can potentially harvest the whole stream. Inspired by this phenomenon, we treat the collaborative energy beamforming and edge computing design as a strongly NP-hard optimization problem. The proposed solution is an iterative algorithm that cascades a polynomial-time (1 − 1/e)-approximation algorithm, which achieves the best possible approximation ratio unless P = NP, with an optimal dynamic programming algorithm. The numerical results show that the energy minimization goal among IoT devices can be achieved, and that the developed harvest-when-interfered protocol is practical in wireless powered Fog computing networks.
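The abstract above does not state the structure of its (1 − 1/e)-approximation algorithm. As background only, the textbook setting in which this ratio arises is greedy selection for maximum coverage under a cardinality budget, where (1 − 1/e) is also the best ratio achievable in polynomial time unless P = NP. The following is a minimal sketch of that generic greedy rule; the function name `greedy_max_coverage` and the beam/device framing of the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_max_coverage(candidates: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` candidate sets, each time taking the largest marginal gain.

    This is the classic greedy rule for maximum coverage, which guarantees at
    least a (1 - 1/e) fraction of the optimal coverage. It is an illustration
    of where that ratio comes from, not the paper's algorithm.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(budget):
        best_name = None
        best_gain = 0
        # Iterate in sorted order so that ties are broken deterministically.
        for name in sorted(candidates):
            if name in chosen:
                continue
            gain = len(candidates[name] - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining candidate adds new coverage
        chosen.append(best_name)
        covered |= candidates[best_name]
    return chosen


# Hypothetical toy data: each "beam" reaches a set of device ids.
beams = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
selected = greedy_max_coverage(beams, budget=2)
```

With this toy input, the rule first takes the beam that reaches the most devices and then the beam that adds the most devices not yet reached.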
Energy harvesting creates an emerging intermittent computing paradigm, but poses new challenges for sophisticated applications such as intermittent deep neural network (DNN) inference. Although model compression has adapted DNNs to resource constrained devices, under intermittent power, compressed models will still experience multiple power failures during a single inference. Footprint-based approaches enable hardware accelerated intermittent DNN inference by tracking footprints, independent of model computations, to indicate accelerator progress across power cycles. However, we observe that the extra overhead required to preserve progress indicators can severely offset the computation progress accumulated by intermittent DNN inference. This work proposes the concept of model augmentation to adapt DNNs to intermittent devices. Our middleware stack, JAPARI, appends extra neural network components into a given DNN, to enable the accelerator to intrinsically integrate progress indicators into the inference process, without affecting model accuracy. Their specific positions allow progress indicator preservation to be piggybacked onto output feature preservation to amortize the extra overhead, and their assigned values ensure uniquely distinguishable progress indicators for correct inference recovery upon power resumption. Evaluations on a Texas Instruments device under various DNN models, capacitor sizes, and progress preservation granularities, show that JAPARI can speed up intermittent DNN inference by 3x over the state of the art, for common convolutional neural architectures that require heavy acceleration.
High-fidelity kinship face synthesis is a challenging task due to the limited amount of kinship data available for training and low-quality images. In addition, it is also hard to trace the genetic traits between parents and children from those low-quality training images. To address these issues, we leverage the pre-trained state-of-the-art face synthesis model, StyleGAN2, for kinship face synthesis. To handle large age, gender and other attribute variations between the parents and their children, we conduct a thorough study of its rich latent spaces and different encoder architectures for an optimized encoder design to repurpose StyleGAN2 for kinship face synthesis. The obtained latent representation from our developed encoder pipeline with stage-wise training strikes a better balance of editability and synthesis fidelity for identity preserving and attribute manipulations than other compared approaches. With extensive subjective, quantitative, and qualitative evaluations, the proposed approach consistently achieves better performance in terms of facial attribute heredity and image generation fidelity than other compared state-of-the-art methods. This demonstrates the effectiveness of the proposed approach which can yield promising and satisfactory kinship face synthesis using only a single and straightforward encoder architecture.
Deep neural network (DNN) inference on intermittently-powered battery-less devices has the potential to unlock new possibilities for sustainable and intelligent edge applications. Existing intermittent inference approaches preserve progress information separate from the computed output features during inference. However, we observe that even in highly specialized approaches, the additional overhead incurred for inference progress preservation still accounts for a significant portion of the inference latency. This work proposes the concept of stateful neural networks, which enables a DNN to indicate the inference progress itself. Our runtime middleware embeds state information into the DNN such that the computed and preserved output features intrinsically contain progress indicators, avoiding the need to preserve them separately. The specific position and representation of the embedded states jointly ensure both output features and states are not corrupted while maintaining model accuracy, and the embedded states allow the latest output feature to be determined, enabling correct inference recovery upon power resumption. Evaluations were conducted on different Texas Instruments devices under varied intermittent power strengths and network models. Compared to the state of the art, our approach can speed up intermittent inference by 1.3 to 5 times, achieving higher performance when executing modern convolutional networks with weaker power.
Internet-of-Things (IoT) devices are gradually adopting battery-less, energy harvesting solutions, thereby driving the development of an intermittent computing paradigm to accumulate computation progress across multiple power cycles. While many attempts have been made to enable standalone intermittent systems, little attention has focused on IoT networks formed by intermittent devices. We observe that the computation progress improved by distributed task concurrency in an intermittent network can be significantly offset by data unavailability due to frequent system failures. This paper presents an intermittent-aware distributed concurrency control protocol which leverages existing data copies inherently created in the network to improve the computation progress of concurrently executed tasks. In particular, we propose a borrowing-based data management method to increase data availability and an intermittent two-phase commit procedure incorporated with distributed backward validation to ensure data consistency in the network. The proposed protocol was integrated into a FreeRTOS-extended intermittent operating system running on Texas Instruments devices. Experimental results show that the computation progress can be significantly improved, and this improvement is more apparent under weaker power, where more devices will remain offline for longer durations.
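For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of a conventional two-phase commit whose prepare phase performs backward validation, as in optimistic concurrency control. It does not model the intermittent-specific parts of the protocol (borrowed data copies, progress preservation across power cycles), and every name in it (`Txn`, `Participant`, `two_phase_commit`) is hypothetical. For simplicity it also assumes that every participant holds a full replica of the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Txn:
    txn_id: int
    start_ts: int            # logical time at which the transaction began
    read_set: Set[str]       # keys the transaction read
    writes: Dict[str, int]   # key -> new value


@dataclass
class Participant:
    name: str
    online: bool = True
    store: Dict[str, int] = field(default_factory=dict)
    # (commit timestamp, keys written) for every committed transaction
    history: List[Tuple[int, Set[str]]] = field(default_factory=list)
    staged: Dict[int, Dict[str, int]] = field(default_factory=dict)

    def prepare(self, txn: Txn) -> bool:
        """Phase 1: vote yes only if online and backward validation passes."""
        if not self.online:
            return False
        # Backward validation: reject if a transaction that committed after
        # txn started wrote any key that txn read.
        for commit_ts, keys in self.history:
            if commit_ts > txn.start_ts and keys & txn.read_set:
                return False
        self.staged[txn.txn_id] = dict(txn.writes)
        return True

    def commit(self, txn: Txn, commit_ts: int) -> None:
        """Phase 2 (commit): apply the staged writes and record them."""
        writes = self.staged.pop(txn.txn_id, {})
        self.store.update(writes)
        self.history.append((commit_ts, set(writes)))

    def abort(self, txn: Txn) -> None:
        """Phase 2 (abort): discard any staged writes."""
        self.staged.pop(txn.txn_id, None)


def two_phase_commit(txn: Txn, participants: List[Participant], commit_ts: int) -> bool:
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(txn, commit_ts)
        return True
    for p in participants:
        p.abort(txn)
    return False


a, b = Participant("a"), Participant("b")
nodes = [a, b]
r1 = two_phase_commit(Txn(1, 0, {"x"}, {"x": 1}), nodes, commit_ts=1)
r2 = two_phase_commit(Txn(2, 0, {"x"}, {"y": 2}), nodes, commit_ts=2)  # stale read of x
r3 = two_phase_commit(Txn(3, 1, {"x"}, {"y": 3}), nodes, commit_ts=3)
b.online = False  # a powered-off device cannot vote
r4 = two_phase_commit(Txn(4, 3, {"y"}, {"x": 9}), nodes, commit_ts=4)
```

In this sketch a transaction aborts whenever any participant is offline, which is exactly the kind of lost progress under weak power that the abstract says its borrowing-based data management is meant to reduce.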
Mobile Edge Computing (MEC) is a promising technique in the 5G era to improve the Quality of Experience (QoE) for online video streaming due to its ability to reduce backhaul transmission by caching certain content. However, it still takes effort to address the user association and video quality selection problem under the limited resources of MEC to fully support the low-latency demand of live video streaming. We find the optimization problem to be a nonlinear integer program, for which a globally optimal solution cannot be obtained in polynomial time. In this paper, we formulate the problem and derive the closed-form solution in the form of Lagrangian multipliers; the search for the optimal variables is formulated as a Multi-Armed Bandit (MAB) problem, and we propose a Deep Deterministic Policy Gradient (DDPG) based algorithm exploiting the supply-demand interpretation of the Lagrange dual problem. Simulation results show that our proposed approach achieves significant QoE improvement compared to other baselines, especially in scenarios with scarce wireless resources and many users.
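The supply-demand interpretation of a Lagrange dual mentioned above can be illustrated with a generic projected-subgradient price update: each server's multiplier acts as a price that rises when demand exceeds capacity and falls otherwise. The sketch below is not the paper's DDPG-based algorithm; the function `dual_pricing`, the utility matrix, and the step size are assumptions chosen only for illustration.

```python
from typing import List, Tuple


def dual_pricing(
    utility: List[List[float]],  # utility[i][j]: value of user i using server j
    capacity: List[int],         # capacity[j]: number of users server j can serve
    step: float = 0.1,
    iters: int = 200,
) -> Tuple[List[float], List[int]]:
    """Generic dual (price) iteration for a capacity-constrained association.

    Each user picks the server with the highest utility minus price; each
    price then moves in proportion to excess demand and is clipped at zero.
    """
    n_servers = len(capacity)
    prices = [0.0] * n_servers
    choice: List[int] = []
    for _ in range(iters):
        # Demand side: every user best-responds to the current prices.
        choice = [
            max(range(n_servers), key=lambda j: u[j] - prices[j]) for u in utility
        ]
        demand = [choice.count(j) for j in range(n_servers)]
        # Supply side: raise prices of overloaded servers, lower the others.
        prices = [
            max(0.0, p + step * (d - c)) for p, d, c in zip(prices, demand, capacity)
        ]
    return prices, choice


# No congestion: each user prefers a different server, so prices stay at zero.
free_prices, free_choice = dual_pricing([[3.0, 1.0], [1.0, 3.0]], [1, 1])

# Congestion: both users prefer server 0, so its price is driven up.
busy_prices, busy_choice = dual_pricing([[3.0, 1.0], [3.0, 1.0]], [1, 1])
```

In the congested case the two identical users always make the same choice, so the prices oscillate rather than settle. This reflects the gap between the integer problem and its dual, and is one reason a plain price iteration alone does not solve such problems.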