:::
State-of-the-art (SOTA) semi-supervised learning techniques, such as FixMatch and its variants, have demonstrated impressive performance in classification tasks. However, these methods are not directly applicable to regression tasks. In this paper, we present RankUp, a simple yet effective approach that adapts existing semi-supervised classification techniques to enhance the performance of regression tasks. RankUp achieves this by converting the original regression task into a ranking problem and training it concurrently with the original regression objective. This auxiliary ranking classifier outputs a classification result, thus enabling integration with existing semi-supervised classification methods. Moreover, we introduce regression distribution alignment (RDA), a complementary technique that further enhances RankUp's performance by refining pseudo-labels through distribution alignment. Despite its simplicity, RankUp, with or without RDA, achieves SOTA results across a range of regression benchmarks, including computer vision, audio, and natural language processing tasks.
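To make the regression-to-ranking conversion concrete, the following is a minimal sketch, assuming a RankNet-style pairwise formulation: each pair of samples becomes a binary classification target ("does sample i have a larger label than sample j?"), and the classifier's probability is taken as the sigmoid of the score difference. The function names `pairwise_rank_targets` and `pairwise_rank_loss` are hypothetical and are not taken from the paper; the exact loss used by RankUp may differ.

```python
import math
from itertools import combinations


def pairwise_rank_targets(labels):
    """Turn regression labels into pairwise classification targets.

    For each pair (i, j) with i < j, the target is 1.0 if labels[i] > labels[j],
    0.0 if labels[i] < labels[j], and 0.5 for ties (an assumed convention).
    """
    targets = {}
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            targets[(i, j)] = 1.0
        elif labels[i] < labels[j]:
            targets[(i, j)] = 0.0
        else:
            targets[(i, j)] = 0.5
    return targets


def pairwise_rank_loss(scores, labels):
    """Mean binary cross-entropy over all pairs.

    The predicted probability that sample i ranks above sample j is
    sigmoid(scores[i] - scores[j]); this is an illustrative choice.
    """
    targets = pairwise_rank_targets(labels)
    total = 0.0
    for (i, j), t in targets.items():
        p = 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to keep log() finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / max(len(targets), 1)


if __name__ == "__main__":
    labels = [1.0, 3.0, 2.0]
    print(pairwise_rank_targets(labels))
    print(pairwise_rank_loss([0.1, 0.9, 0.5], labels))  # scores ordered like labels
    print(pairwise_rank_loss([0.9, 0.1, 0.5], labels))  # scores in the wrong order
```

Because each pair yields a classification output, confidence-thresholded pseudo-labeling in the style of FixMatch could in principle be applied to unlabeled pairs, which is the kind of integration the abstract describes.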
Hepatocellular carcinoma (HCC), the most common type of liver cancer, poses significant challenges in detection and diagnosis. Medical imaging, especially computed tomography (CT), is pivotal in non-invasively identifying this disease, requiring substantial expertise for interpretation. This research introduces an innovative strategy that integrates two-dimensional (2D) and three-dimensional (3D) deep learning models within a federated learning (FL) framework for precise segmentation of liver and tumor regions in medical images. The study utilized 131 CT scans from the Liver Tumor Segmentation (LiTS) challenge and demonstrated the superior efficiency and accuracy of the proposed Hybrid-ResUNet model with a Dice score of 0.9433 and an AUC of 0.9965 compared to ResNet and EfficientNet models. This FL approach is beneficial for conducting large-scale clinical trials while safeguarding patient privacy across healthcare settings. It facilitates active engagement in problem-solving, data collection, model development, and refinement. The study also addresses data imbalances in the FL context, showing resilience and highlighting local models' robust performance. Future research will concentrate on refining federated learning algorithms and their incorporation into the continuous integration and deployment (CI/CD) processes in AI system operations, emphasizing the dynamic involvement of clients. We recommend a collaborative human-AI endeavor to enhance feature extraction and knowledge transfer. These improvements are intended to boost equitable and efficient data collaboration across various sectors in practical scenarios, offering a crucial guide for forthcoming research in medical AI.
The use of face masks is an essential healthcare measure, particularly during pandemics, yet it can hinder communication in our daily lives. To address this problem, we propose a novel approach known as the human-in-the-loop StarGAN (HL–StarGAN) face-masked speech enhancement method. HL–StarGAN comprises a discriminator, a classifier, a metric assessment predictor, and a generator that leverages an attention mechanism. The metric assessment predictor, referred to as MaskQSS, incorporates human participants in its development and serves as a “human-in-the-loop” module during the learning process of HL–StarGAN. The overall HL–StarGAN model was trained using an unsupervised learning strategy that simultaneously focuses on the reconstruction of the original clean speech and the optimization of human perception. To implement HL–StarGAN, we created a face-masked speech database named “FMVD,” which comprises recordings from 34 speakers in three distinct face-masked scenarios and a clean condition. We conducted subjective and objective tests on the proposed HL–StarGAN using this database. The test results are as follows: (1) MaskQSS successfully predicted the quality scores of face-masked voices, outperforming several existing speech assessment methods. (2) The integration of the MaskQSS predictor enhanced the ability of HL–StarGAN to transform face-masked voices into high-quality speech; this enhancement is evident in both objective and subjective tests, outperforming conventional StarGAN and CycleGAN-based systems.
This letter explores energy efficiency (EE) maximization in a downlink multiple-input single-output (MISO) reconfigurable intelligent surface (RIS)-aided multiuser system employing rate-splitting multiple access (RSMA). The optimization task entails base station (BS) and RIS beamforming and RSMA common rate allocation with constraints. We propose a graph neural network (GNN) model that learns beamforming and rate allocation directly from the channel information using a unique graph representation derived from the communication system. The GNN model outperforms existing deep neural network (DNN) and model-based methods in terms of EE, demonstrating low complexity, resilience to imperfect channel information, and effective generalization across varying user numbers.
Mobile/multi-access edge computing (MEC) is developed to support the upcoming AI-aware mobile services, which require low latency and intensive computation resources at the edge of the network. One of the most challenging issues in MEC is service provision with mobility consideration. It is known that the offloading decision and resource allocation need to be jointly handled to optimize the service provision efficiency within the latency constraints, which is challenging when users are mobile. In this paper, we propose the Mobility-Aware Deep Reinforcement Learning (M-DRL) framework for mobile service provision in the MEC system. M-DRL is composed of two parts: glimpse, a seq2seq model customized for mobility prediction to predict a sequence of locations just like a “glimpse” of the future, and a DRL specialized in supporting offloading decisions and resource allocation in MEC. By integrating the proposed DRL and glimpse mobility prediction model, the proposed M-DRL framework is optimized to handle the MEC service provision with an average performance improvement of 70%.
Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variational autoencoder (VQ-VAE). The training of VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted. To further improve correlation with real quality scores, domain knowledge of speech processing is incorporated into the model design. We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training. To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced. In summary, the proposed speech quality estimation method and enhancement models require only clean speech for training without any label requirements. Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines.
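The core intuition, that a codebook learned from clean speech quantizes distorted speech poorly, can be illustrated with a small sketch. It assumes squared Euclidean distance as the error measure and a tiny hand-written codebook; the names `sq_dist`, `nearest_code`, and `quantization_error` are hypothetical, and the distance measure and encoder used by VQScore itself may differ.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vec, codebook):
    """Return the codebook entry closest to vec."""
    return min(codebook, key=lambda code: sq_dist(vec, code))


def quantization_error(frames, codebook):
    """Average distance between each frame and its nearest codebook entry.

    A larger value suggests the frames lie far from what the codebook
    (assumed here to be learned from clean speech) can represent.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    errors = [sq_dist(f, nearest_code(f, codebook)) for f in frames]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]    # toy stand-in for learned codes
    clean = [[0.0, 0.0], [1.0, 1.0]]       # frames that match the codes
    distorted = [[0.5, 0.5], [0.5, 0.5]]   # frames far from every code
    print(quantization_error(clean, codebook))
    print(quantization_error(distorted, codebook))
```

In this toy setup a quality score could be taken as the negative of the quantization error, so that frames closer to the clean-speech codebook score higher.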
Reconfigurable intelligent surface (RIS) is a revolutionary passive radio technique to facilitate capacity enhancement beyond the current massive multiple-input multiple-output (MIMO) transmission. However, the potential hardware impairment (HWI) of the RIS usually causes inevitable performance degradation and amplifies the effect of imperfect channel state information (CSI). These impacts still lack full investigation in the RIS-assisted wireless network. This paper develops a robust joint RIS and transceiver design algorithm to minimize the worst-case mean square error (MSE) of the received signal under the HWI effect and imperfect CSI in the RIS-assisted multi-user MIMO (MU-MIMO) wireless network. Specifically, since the proposed robust joint RIS and transceiver design problem is non-convex under severe HWI, an iterative three-step convex algorithm is developed to approach the optimum by relaxation and convex transformation. Compared with the state-of-the-art baselines that ignore the HWI, the proposed robust algorithm mitigates the damage caused by HWI while effectively improving worst-case MSE performance in several numerical simulations. Moreover, due to the properties of the HWI, the performance loss becomes notable as the number of reflecting elements grows in the RIS-assisted MU-MIMO wireless network.
Vehicle-to-everything (V2X) communication is one of the key technologies of 5G New Radio to support emerging applications such as autonomous driving. Due to the high density of vehicles, Remote Radio Heads (RRHs) will be deployed as Road Side Units to support V2X. Nevertheless, activating all RRHs during low-traffic off-peak hours may waste energy. Proper activation of RRHs and association between vehicles and RRHs, while maintaining the required service quality, are the keys to reducing energy consumption. In this work, we first formulate the problem as an Integer Linear Programming optimization problem and prove that the problem is NP-hard. Then, we propose two novel algorithms, referred to as “Least Delete (LD)” and “Largest-First Rounding with Capacity Constraints (LFRCC).” The simulation results show that the proposed algorithms can achieve significantly better performance compared with existing solutions and are competitive with the optimal solution. Specifically, the LD and LFRCC algorithms can reduce the number of activated RRHs by 86% and 89%, respectively, in low-density scenarios. In high-density scenarios, the LD algorithm can reduce the number of activated RRHs by 90%. In addition, the LFRCC solution is larger than the optimal solution by less than 7% on average.
Dissecting low-level malware behaviors into human-readable reports, such as cyber threat intelligence, is time-consuming and requires expertise in systems and cybersecurity. This work combines dynamic analysis and artificial intelligence-generative transformation for malware report generation, providing detailed technical insights and articulating malware intentions.
Designing intelligent, tiny devices with limited memory is immensely challenging, exacerbated by the additional memory requirement of residual connections in deep neural networks. In contrast to existing approaches that eliminate residuals to reduce peak memory usage at the cost of significant accuracy degradation, this paper presents DERO, which reorganizes residual connections by leveraging insights into the types and interdependencies of operations across residual connections. Evaluations were conducted across diverse model architectures designed for common computer vision applications. DERO consistently achieves peak memory usage comparable to plain-style models without residuals, while closely matching the accuracy of the original models with residuals.