Graph pooling has made significant progress in recent years as an effective solution for graph-level property classification tasks. With the emergence of research on Heterogeneous Information Networks (HINs), this paper argues that graph-level datasets for graph classification should be treated as HINs rather than homogeneous graphs to enhance information aggregation. We propose HINPool, a novel and general graph pooling framework for graph-level property classification with HINs. First, we devise a systematic HIN construction procedure from the original data to capture complex interactions. Next, we introduce a type-aware heterogeneous graph pooling method featuring a Type-Aware Selector (TAS) to select essential nodes and a Readout Aggregator (RA) to fuse critical information into a graph-level representation. Finally, a cross-layer fusion function combines the output embeddings from each graph pooling layer into a final graph representation for downstream classification tasks. Our approach achieves near state-of-the-art performance on widely used graph classification benchmark datasets, with significant improvements on four out of five datasets. This work redefines the strategy for graph-level property classification, using heterogeneous graph neural networks (HGNNs) and heterogeneous graph pooling to model intricate relationships and enhance performance without requiring extensive domain-specific knowledge.
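To make the pooling step concrete, the following is a minimal PyTorch sketch of a type-aware selection plus readout layer. It is an illustration only, assuming a single heterogeneous graph with node features `x`, integer node types `node_type`, and a hypothetical retention ratio; it is not the authors' HINPool implementation.

```python
import torch
import torch.nn as nn

class TypeAwarePoolingLayer(nn.Module):
    """Illustrative sketch: per-type node scoring (TAS) followed by a
    mean/max readout (RA). Names, shapes, and the ratio are assumptions."""

    def __init__(self, in_dim: int, num_types: int, ratio: float = 0.5):
        super().__init__()
        # One scoring vector per node type (Type-Aware Selector).
        self.score = nn.Parameter(torch.randn(num_types, in_dim))
        self.ratio = ratio

    def forward(self, x: torch.Tensor, node_type: torch.Tensor):
        # x: [N, in_dim] node embeddings, node_type: [N] integer type ids.
        scores = (x * self.score[node_type]).sum(dim=-1)          # [N]
        k = max(1, int(self.ratio * x.size(0)))
        keep = scores.topk(k).indices                             # essential nodes
        pooled = x[keep] * torch.sigmoid(scores[keep]).unsqueeze(-1)
        # Readout Aggregator: fuse kept nodes into one graph-level vector.
        readout = torch.cat([pooled.mean(dim=0), pooled.max(dim=0).values])
        return pooled, keep, readout

# Toy usage: 10 nodes, 16-dim features, 3 node types.
layer = TypeAwarePoolingLayer(in_dim=16, num_types=3)
x = torch.randn(10, 16)
t = torch.randint(0, 3, (10,))
_, _, g = layer(x, t)
print(g.shape)  # torch.Size([32])
```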
High-performance and high-precision flow monitoring is a crucial function for network management, network bandwidth usage accounting and billing, network security, network forensics, and other important tasks. Nowadays, many commercial switches/routers provide either the sFlow, NetFlow, or IPFIX scheme for monitoring the flows traversing a network. sFlow is widely supported by many switches/routers because it uses a sampling-based method, which greatly reduces the CPU processing load on a switch/router and the network bandwidth required to transmit flow data to a remote collector. However, many small flows may go undetected, and the estimated flow data (e.g., the packet count and byte count) for detected flows can deviate significantly from the ground truth. NetFlow, which is Cisco Systems' proprietary technology, does not use a sampling-based method by default. Instead, it tries to collect complete and correct flow data for every flow. However, as link speeds and flow arrival rates continue to increase, NetFlow also provides a sampling-based option to reduce the CPU utilization of the switch/router. Because NetFlow is proprietary, an Internet Engineering Task Force (IETF) working group has defined IPFIX as an open flow information export protocol based on NetFlow Version 9. The requirements for IPFIX are defined in RFC 3917. Essentially, IPFIX is the same as NetFlow Version 9. Due to its high demand on the CPU of the switch/router, NetFlow is currently supported only on very high-end switches/routers, and its design and implementation on these commercial switches/routers are not published in the literature. In this paper, we design and implement a high-performance and high-precision NetFlow/IPFIX system on a Programming Protocol-independent Packet Processors (P4) hardware switch. Based on a 20 Gbps playback of a packet trace gathered on an Internet backbone link, experimental results show that our novel method significantly outperforms the typical design and implementation of NetFlow/IPFIX on a P4 hardware switch. For example, for the number of detected flows during the trace period, our method outperforms the typical method by a factor of 5.72. As for the number of flows whose packet and byte counts are correctly counted, our method outperforms the typical method by a factor of 8.57.
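Conceptually, NetFlow/IPFIX maintains per-flow state keyed on the packet 5-tuple and exports the accumulated counters. The sketch below illustrates only that bookkeeping in Python, with assumed field names; it is not the P4 data-plane design proposed in the paper.

```python
from collections import defaultdict

# Per-flow counters keyed on the classic 5-tuple.
# Each entry holds [packet_count, byte_count].
flow_table = defaultdict(lambda: [0, 0])

def observe(pkt):
    """Update the flow entry for one observed packet.
    `pkt` is an assumed dict with 5-tuple fields and a length."""
    key = (pkt["src_ip"], pkt["dst_ip"],
           pkt["src_port"], pkt["dst_port"], pkt["proto"])
    entry = flow_table[key]
    entry[0] += 1              # packet count
    entry[1] += pkt["len"]     # byte count

# Toy trace: two packets of the same flow, plus one packet of another flow.
trace = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 1234, "dst_port": 80, "proto": 6, "len": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 1234, "dst_port": 80, "proto": 6, "len": 40},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2",
     "src_port": 5353, "dst_port": 53, "proto": 17, "len": 90},
]
for p in trace:
    observe(p)
print(len(flow_table))  # 2 flows detected
```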
The fusion of tiny energy harvesting devices with deep neural networks (DNNs) optimized for intermittent execution is vital for sustainable intelligent applications at the edge. However, current intermittent-aware neural architecture search (NAS) frameworks overlook the inherent intermittency management overhead (IMO) of DNNs, leading to underperformance upon deployment. Moreover, we observe that straightforward IMO minimization within NAS may degrade solution accuracy. This work explores the relationship between DNN architectural characteristics, IMO, and accuracy, uncovering the varying sensitivity toward IMO across different DNN characteristics. Inspired by these insights, we present two guidelines for leveraging IMO sensitivity in NAS. First, the overall architecture search space can be reduced by excluding parameters with low IMO sensitivity, and second, the search can focus primarily on network blocks with high IMO sensitivity, facilitating the discovery of highly accurate networks with low IMO. We incorporate these guidelines into TiNAS, which integrates cutting-edge tiny NAS and intermittent-aware NAS frameworks. Evaluations are conducted across various datasets and latency requirements, along with deployment experiments on a Texas Instruments device under different intermittent power profiles. Compared to two variants, one that minimizes IMO and one that disregards it, TiNAS achieves up to 38% higher accuracy and 33% lower IMO, respectively, with greater improvements for larger datasets. Its deployed solutions also achieve up to a 1.33 times inference speedup, especially under fluctuating power conditions.
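The two guidelines can be illustrated with a small Python sketch. The sensitivity scores, parameter names, and threshold below are hypothetical placeholders, not TiNAS's actual sensitivity estimator or search space.

```python
# Illustrative sketch of the two guidelines, with hypothetical
# IMO-sensitivity scores (not TiNAS's actual estimator or search space).
search_space = {
    "kernel_size":  {"values": [3, 5, 7], "imo_sensitivity": 0.9},
    "expand_ratio": {"values": [3, 4, 6], "imo_sensitivity": 0.7},
    "depth":        {"values": [2, 3, 4], "imo_sensitivity": 0.2},
}
block_sensitivity = {"block1": 0.1, "block2": 0.8, "block3": 0.6}

THRESHOLD = 0.5

# Guideline 1: exclude parameters with low IMO sensitivity, since tuning
# them barely changes the intermittency management overhead.
reduced_space = {k: v for k, v in search_space.items()
                 if v["imo_sensitivity"] >= THRESHOLD}

# Guideline 2: focus the search on blocks with high IMO sensitivity.
focus_blocks = [b for b, s in block_sensitivity.items() if s >= THRESHOLD]

print(sorted(reduced_space))   # ['expand_ratio', 'kernel_size']
print(focus_blocks)            # ['block2', 'block3']
```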
Tiny battery-free devices running deep neural networks (DNNs) embody intermittent TinyML, a paradigm at the intersection of intermittent computing and deep learning, bringing sustainable intelligence to the extreme edge. This paper, as an overview of a special session at Embedded Systems Week (ESWEEK) 2025, presents four tales from diverse research backgrounds, sharing experiences in addressing unique challenges of efficient and reliable DNN inference despite the intermittent nature of ambient power. The first explores enhancing inference engines for efficient progress accumulation in hardware-accelerated intermittent inference and designing networks tailored for such execution. The second investigates computationally light, adaptive algorithms for faster, energy-efficient inference, and emerging computing-in-memory architectures for power failure resiliency. The third addresses battery-free networking, focusing on timely neighbor discovery and maintaining synchronization despite spatio-temporal energy dynamics across nodes. The fourth leverages modern nonvolatile memory fault behavior and DNN robustness to save energy without significant accuracy loss, with applicability to intermittent inference on nano-satellites. Collectively, these early efforts advance intermittent TinyML research and promote future cross-domain collaboration to tackle open challenges.
Guaranteeing reliable deep neural network (DNN) inference despite intermittent power is the cornerstone of enabling intelligent systems in energy-harvesting environments. Existing intermittent inference approaches support static neural networks with deterministic execution characteristics, accumulating progress across power cycles. However, dynamic neural networks adapt their structures at runtime. We observe that because intermittent inference approaches are unaware of this non-deterministic execution behavior, they suffer from incorrect progress recovery, degrading inference accuracy and performance. This work proposes non-deterministic inference progress accumulation to enable dynamic neural network inference on intermittent systems. Our middleware, NodPA, realizes this methodology by strategically selecting additional progress information to capture the non-determinism of the power-interrupted computation while preserving only the changed portions of the progress information to maintain low runtime overhead. Evaluations are conducted on a Texas Instruments device with both static and dynamic neural networks under time-varying power sources. Compared to intermittent inference approaches reliant on determinism, NodPA is less prone to inference non-termination and achieves an average inference speedup of 1.57 times without compromising accuracy, with greater improvements for highly dynamic networks under weaker power.
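A minimal sketch of the "preserve only the changed portions" idea: diff the current progress record against the last persisted snapshot and write only the delta to nonvolatile memory. The dict-based progress record, field names, and write primitive are assumptions for illustration, not NodPA's actual data structures.

```python
# Progress information kept in volatile memory (illustrative fields).
progress  = {"layer": 3, "tile": 12, "branch_taken": 1, "output_offset": 4096}
persisted = {"layer": 3, "tile": 11, "branch_taken": 0, "output_offset": 4096}

def checkpoint_delta(current, snapshot, nvm_write):
    """Persist only the entries that changed since the last checkpoint."""
    delta = {k: v for k, v in current.items() if snapshot.get(k) != v}
    for k, v in delta.items():
        nvm_write(k, v)        # assumed NVM write primitive
        snapshot[k] = v
    return delta

written = checkpoint_delta(progress, persisted,
                           nvm_write=lambda k, v: None)
print(written)  # {'tile': 12, 'branch_taken': 1}
```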
Low earth orbit (LEO) satellite-enabled orthogonal frequency division multiple access (OFDMA) systems will play a pivotal role in future integrated satellite-terrestrial networks to realize ubiquitous high-throughput communication. However, the high mobility of LEO satellites and the use of Ku/Ka and millimeter-wave (mmWave) bands introduce wide-range Doppler shifts, which are especially detrimental to OFDMA-based systems. Existing Doppler shift compensation methods are limited by the requirement for prior user location information and/or the high computational complexity of searching across broad Doppler shift ranges. In this work, we propose a multi-stage Doppler shift compensation method for wide-range Doppler shifts in downlink LEO satellite OFDMA systems over Ku/Ka to mmWave bands. The proposed method consists of three stages: incorporating the phase-differential (PD) operation into the extended Kalman filter (EKF) to widen the estimation range, refining the compensation using a repetition training sequence, and utilizing the cyclic prefix (CP) for fine estimation. Simulation results demonstrate the proposed method's effectiveness in handling Doppler shifts in LEO satellite communication (SatCom) over different channels and frequency bands. Moreover, the proposed method attains the maximum estimation range and exhibits high accuracy with low complexity, irrespective of the Doppler shift range, making it an effective, practical, and easily implementable solution for LEO satellite communication.
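Of the three stages, the final CP-based fine estimation can be sketched compactly. The NumPy snippet below shows the classic cyclic-prefix correlation estimate of a residual carrier frequency offset; it is a hedged illustration of that family of estimators, not necessarily the paper's exact estimator.

```python
import numpy as np

def cp_cfo_estimate(rx_symbol, n_fft, n_cp):
    """Estimate the residual CFO (in units of subcarrier spacing) from one
    OFDM symbol by correlating the cyclic prefix with the symbol tail."""
    cp = rx_symbol[:n_cp]
    tail = rx_symbol[n_fft:n_fft + n_cp]
    corr = np.sum(np.conj(cp) * tail)
    return np.angle(corr) / (2 * np.pi)   # valid for |CFO| < 0.5 subcarrier

# Toy check: apply a known fractional CFO to a random OFDM symbol.
rng = np.random.default_rng(0)
n_fft, n_cp = 256, 32
data = rng.standard_normal(n_fft) + 1j * rng.standard_normal(n_fft)
sym = np.concatenate([data[-n_cp:], data])          # prepend cyclic prefix
cfo = 0.07                                          # fraction of subcarrier spacing
n = np.arange(n_fft + n_cp)
rx = sym * np.exp(2j * np.pi * cfo * n / n_fft)
print(round(cp_cfo_estimate(rx, n_fft, n_cp), 3))   # ~0.07
```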
With the advances in machine learning, edge computing, and wireless communications, split inference has attracted increasing attention as a versatile inference paradigm. Split inference is essential for accelerating large-scale deep neural network (DNN) inference on resource-limited edge devices by partitioning the DNN between the edge device and the cloud server with advanced wireless communications such as B5G/6G and WiFi 6. We investigate the U-shape partitioning inference system, in which both the input raw data and the output inference results are kept on the edge device. We use image semantic segmentation as an exemplary application in our experiments. The experimental results show that an honest-but-curious (HbC) server can launch a bidirectional privacy attack to reconstruct the raw data and steal the inference results, even when only the middle partition of the model is visible to it. To ensure bidirectional privacy and a good user experience in the U-shape partitioning inference system, a privacy- and latency-aware partitioning strategy is needed to balance the trade-off between service latency and data privacy. We compare our proposed framework to other inference paradigms, including conventional split inference and inference performed entirely on the edge device or on the server. We analyze their inference latencies under various wireless technologies and quantitatively measure their level of privacy protection. The experimental results show that the U-shape partitioning inference system is advantageous over inference performed entirely on the edge device or on the server.
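A minimal PyTorch sketch of U-shape partitioning: the head and tail partitions stay on the edge device, while only the middle partition (and the intermediate activations) is exposed to the server. The toy backbone and split points are assumptions for illustration, not the segmentation network used in the experiments.

```python
import torch
import torch.nn as nn

# Toy backbone; the paper's experiments use a semantic segmentation model.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),     # layers 0-1
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),    # layers 2-3
    nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),    # layers 4-5
    nn.Conv2d(8, 3, 3, padding=1),                # layer 6
)

# U-shape split: head and tail on the edge device, middle on the server.
head   = backbone[:2]    # edge: sees the raw input
middle = backbone[2:6]   # server: sees only intermediate features
tail   = backbone[6:]    # edge: produces the final prediction

x = torch.randn(1, 3, 64, 64)          # raw image never leaves the device
feat_up   = head(x)                    # uploaded to the server
feat_down = middle(feat_up)            # returned to the device
pred      = tail(feat_down)            # inference result stays on the device
print(pred.shape)                      # torch.Size([1, 3, 64, 64])
```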
This letter explores energy efficiency (EE) maximization in a downlink multiple-input single-output (MISO) reconfigurable intelligent surface (RIS)-aided multiuser system employing rate-splitting multiple access (RSMA). The optimization task entails base station (BS) and RIS beamforming and RSMA common rate allocation with constraints. We propose a graph neural network (GNN) model that learns beamforming and rate allocation directly from the channel information using a unique graph representation derived from the communication system. The GNN model outperforms existing deep neural network (DNN) and model-based methods in terms of EE, demonstrating low complexity, resilience to imperfect channel information, and effective generalization across varying user numbers.
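One plausible form of such a graph model is sketched below: each user is a graph node whose feature is derived from its channel, a message-passing layer exchanges information among users, and output heads produce per-user beamformers and a common-rate allocation. All layer names, dimensions, and the fully connected user graph are assumptions for illustration, not the letter's architecture.

```python
import torch
import torch.nn as nn

class UserGraphLayer(nn.Module):
    """One message-passing round over a fully connected user graph.
    A plausible sketch only; the paper's graph and layers may differ."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, h):                          # h: [K, dim], K users
        K = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(K, K, -1),
                           h.unsqueeze(0).expand(K, K, -1)], dim=-1)
        m = torch.relu(self.msg(pairs)).mean(dim=1)   # aggregate neighbor messages
        return torch.relu(self.upd(torch.cat([h, m], dim=-1)))

K, Nt, dim = 4, 8, 32                              # users, BS antennas, hidden size
embed = nn.Linear(2 * Nt, dim)                     # channel (Re/Im) -> node feature
gnn = UserGraphLayer(dim)
w_head = nn.Linear(dim, 2 * Nt)                    # per-user beamformer (Re/Im)
c_head = nn.Linear(dim, 1)                         # common-rate allocation logits

csi = torch.randn(K, 2 * Nt)                       # stacked Re/Im of each user's channel
h = gnn(embed(csi))
beams = w_head(h)                                  # [K, 2*Nt]
common_rate_share = torch.softmax(c_head(h).squeeze(-1), dim=0)  # sums to 1
print(beams.shape, common_rate_share.sum().item())
```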
Mobile/multi-access edge computing (MEC) has been developed to support emerging AI-aware mobile services, which require low latency and intensive computation resources at the edge of the network. One of the most challenging issues in MEC is service provision with mobility taken into consideration. It is well known that the offloading decision and resource allocation need to be handled jointly to optimize service provision efficiency within the latency constraints, which is challenging when users are mobile. In this paper, we propose the Mobility-Aware Deep Reinforcement Learning (M-DRL) framework for mobile service provision in the MEC system. M-DRL is composed of two parts: glimpse, a seq2seq model customized for mobility prediction that predicts a sequence of future locations, just like a "glimpse" of the future, and a DRL model specialized in making offloading decisions and allocating resources in MEC. By integrating the proposed DRL model and the glimpse mobility prediction model, the M-DRL framework handles MEC service provision with an average performance improvement of 70%.
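The mobility-prediction component can be illustrated with a small seq2seq sketch: a GRU encoder consumes past (x, y) locations and a GRU decoder autoregressively emits a short "glimpse" of future locations. The architecture, hidden size, and coordinate representation below are assumptions for illustration, not the exact glimpse model.

```python
import torch
import torch.nn as nn

class Glimpse(nn.Module):
    """Sketch of a seq2seq mobility predictor: encode past (x, y) locations,
    then autoregressively decode a 'glimpse' of future locations."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, past, horizon):
        _, state = self.encoder(past)            # past: [B, T, 2]
        step = past[:, -1:, :]                   # start from the last seen location
        preds = []
        for _ in range(horizon):
            dec, state = self.decoder(step, state)
            step = self.out(dec)                 # next predicted (x, y)
            preds.append(step)
        return torch.cat(preds, dim=1)           # [B, horizon, 2]

model = Glimpse()
trajectory = torch.randn(1, 10, 2)               # 10 past locations
future = model(trajectory, horizon=5)            # predicted next 5 locations
print(future.shape)                              # torch.Size([1, 5, 2])
```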
Reconfigurable intelligent surface (RIS) is a revolutionary passive radio technique that facilitates capacity enhancement beyond current massive multiple-input multiple-output (MIMO) transmission. However, potential hardware impairment (HWI) of the RIS usually causes inevitable performance degradation and amplifies the effect of imperfect channel state information (CSI). These impacts have not been fully investigated in RIS-assisted wireless networks. This paper develops a robust joint RIS and transceiver design algorithm to minimize the worst-case mean square error (MSE) of the received signal under the HWI effect and imperfect CSI in the RIS-assisted multi-user MIMO (MU-MIMO) wireless network. Specifically, since the proposed robust joint RIS and transceiver design problem is non-convex under severe HWI, an iterative three-step convex algorithm is developed to approach optimality via relaxation and convex transformation. Compared with state-of-the-art baselines that ignore HWI, the proposed robust algorithm mitigates the degradation caused by HWI and effectively reduces the worst-case MSE in several numerical simulations. Moreover, due to the nature of HWI, the performance loss becomes more pronounced as the number of reflecting elements increases in the RIS-assisted MU-MIMO wireless network.
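Schematically, the design problem is a min-max MSE optimization over the transceiver and the RIS phases against bounded CSI errors. The formulation below is an illustrative generic form with assumed symbols (precoder F, diagonal RIS phase matrix Θ, receive filters G_k, bounded CSI error Δ_k, power budget P_T), not the paper's exact problem statement, which additionally models the HWI distortion.

```latex
% Schematic worst-case MSE formulation (symbols are illustrative assumptions):
% F: BS precoder, \Theta: diagonal RIS phase matrix, G_k: receive filter of user k,
% \Delta_k: bounded CSI error of user k, P_T: transmit power budget.
\begin{equation*}
\min_{F,\;\Theta,\;\{G_k\}}\;
\max_{\|\Delta_k\|_F \le \epsilon_k}\;
\sum_{k} \mathrm{MSE}_k\!\left(F, \Theta, G_k;\, \widehat{H}_k + \Delta_k\right)
\quad \text{s.t.} \quad
\operatorname{tr}\!\left(F F^{H}\right) \le P_T,\;\;
|\theta_n| = 1 \;\; \forall n .
\end{equation*}
```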