Hepatocellular carcinoma (HCC), the most common type of liver cancer, poses significant challenges for detection and diagnosis. Medical imaging, especially computed tomography (CT), is pivotal for identifying this disease non-invasively, but interpreting the images requires substantial expertise. This research introduces a strategy that integrates two-dimensional (2D) and three-dimensional (3D) deep learning models within a federated learning (FL) framework for precise segmentation of liver and tumor regions in medical images. Using 131 CT scans from the Liver Tumor Segmentation (LiTS) challenge, the study demonstrates that the proposed Hybrid-ResUNet model is more efficient and accurate than ResNet and EfficientNet models, achieving a Dice score of 0.9433 and an AUC of 0.9965. This FL approach is beneficial for conducting large-scale clinical trials while safeguarding patient privacy across healthcare settings. It also enables participating institutions to engage actively in problem-solving, data collection, model development, and refinement. The study further addresses data imbalance in the FL setting, showing that the approach is resilient to it and that the local models perform robustly. Future research will concentrate on refining FL algorithms and incorporating them into the continuous integration and continuous deployment (CI/CD) processes of AI system operations, emphasizing the dynamic involvement of clients. We recommend a collaborative human-AI effort to enhance feature extraction and knowledge transfer. These improvements are intended to promote equitable and efficient data collaboration across sectors in practical scenarios, offering a guide for future research in medical AI.
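The abstract does not state how client models are combined in the FL framework. As a minimal sketch, assuming the common federated averaging (FedAvg) rule, the server could weight each client's parameters by its local sample count. The function name, the parameter names, and the dictionaries of flat float lists (standing in for real network weights) are all hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Federated averaging (an assumed aggregation rule, not confirmed by the abstract).

    client_params: one dict per client, mapping a parameter name to a flat list of floats.
    client_sizes:  number of local training samples held by each client.
    Returns the sample-weighted average of every parameter.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need exactly one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(n * params[name][i] for params, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals with imbalanced data: the larger one dominates the average.
global_params = fedavg([{"w": [1.0, 2.0]}, {"w": [5.0, 6.0]}], client_sizes=[1, 3])
print(global_params)  # {'w': [4.0, 5.0]}
```

Weighting by sample count is one simple way to account for clients of unequal size. An unweighted mean is the obvious alternative when every site should count equally.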
Translating low-level malware behaviors into human-readable reports, such as cyber threat intelligence reports, is time-consuming and requires expertise in systems and cybersecurity. This work combines dynamic analysis with generative artificial intelligence to transform analysis results into malware reports that provide detailed technical insights and articulate the malware's intentions.
In the rapidly evolving field of machine learning, algorithm effectiveness is often limited by data quality and availability. Traditional approaches struggle with data sharing because of legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach in which model training occurs on the client side, preserving privacy by keeping data local. Instead of sending raw data to a central server, clients exchange only model updates, which enhances data security. In this work, we apply this framework to Sparse Principal Component Analysis (SPCA). SPCA aims to obtain sparse component loadings while maximizing the explained data variance, which improves interpretability. Besides the ℓ1-norm regularization term of conventional SPCA, we add a smoothing function to enable gradient-based optimization methods. Moreover, to improve computational efficiency, we introduce a least squares approximation of the original SPCA. This approximation yields analytic solutions in the optimization steps, leading to substantial computational savings. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets confirm the efficacy of our federated SPCA approach.
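The consensus-ADMM structure described above can be sketched on a toy problem. This is a stand-in, not the authors' SPCA objective: each client i holds points a_ij, and all clients jointly solve min_x sum_i 0.5 * sum_j ||x - a_ij||^2 + lam * ||x||_1, which is a sparse mean estimate. As in the abstract, the local step has an analytic solution and raw data never leave the clients. Only x_i + u_i is shared with the server, and the ℓ1 term is handled by soft-thresholding. The problem choice and the function names are assumptions made for illustration.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1, applied elementwise."""
    return [(abs(x) - kappa) * (1.0 if x > 0 else -1.0) if abs(x) > kappa else 0.0 for x in v]


def consensus_admm(clients, lam, rho=1.0, iters=500):
    """Consensus ADMM for the toy problem
        min_x  sum_i 0.5 * sum_j ||x - a_ij||^2  +  lam * ||x||_1.

    clients: list of local datasets; each is a list of d-dimensional points (lists of floats).
    Raw points stay on their client; only x_i + u_i is sent to the server.
    """
    d = len(clients[0][0])
    n_clients = len(clients)
    # Local sufficient statistics, computed once on each client.
    sums = [[sum(p[k] for p in data) for k in range(d)] for data in clients]
    counts = [len(data) for data in clients]
    x = [[0.0] * d for _ in clients]
    u = [[0.0] * d for _ in clients]  # scaled dual variables, one per client
    z = [0.0] * d                      # global consensus variable
    for _ in range(iters):
        # Client step, closed form: argmin_x 0.5*sum_j ||x - a_ij||^2 + (rho/2)*||x - z + u_i||^2
        for i in range(n_clients):
            x[i] = [(sums[i][k] + rho * (z[k] - u[i][k])) / (counts[i] + rho) for k in range(d)]
        # Server step: average the shared vectors, then soft-threshold to enforce sparsity.
        avg = [sum(x[i][k] + u[i][k] for i in range(n_clients)) / n_clients for k in range(d)]
        z = soft_threshold(avg, lam / (n_clients * rho))
        # Dual step, run on each client.
        for i in range(n_clients):
            u[i] = [u[i][k] + x[i][k] - z[k] for k in range(d)]
    return z


clients = [
    [[1.0, 0.1], [3.0, -0.2]],
    [[2.0, 0.05]],
    [[4.0, 0.0], [0.0, 0.1], [2.0, -0.3]],
]
print(consensus_admm(clients, lam=1.2))  # approximately [1.8, 0.0]
```

For this toy objective, the centralized answer is the pooled mean soft-thresholded by lam divided by the total number of points. Here that is S_0.2([2.0, -0.042]) = [1.8, 0.0], so the federated iterates can be checked against it.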
Sparse grid imputation (SGI) is a challenging problem because its goal is to infer the values of the entire grid from a limited number of cells with known values. Traditionally, the problem is solved using regression methods such as k-nearest neighbors (KNN) and kriging, whereas in the real world there is often extra information, usually imprecise, that can aid inference and yield better performance. In the SGI problem, in addition to the limited number of fixed grid cells with precise target-domain values, there are contextual data and imprecise observations over the whole grid. To solve this problem, we propose a distribution estimation theory for the whole grid and realize it through an architecture that composes the Target-Embedding and the Contextual CycleGAN, trained with contextual information and imprecise observations. The Contextual CycleGAN is structured as two generator-discriminator pairs and uses different types of contextual loss to guide training. We consider the real-world problem of fine-grained PM2.5 inference under realistic settings: a few (less than 1%) grid cells with precise PM2.5 data, and all grid cells with contextual weather information and imprecise observations from satellites and microsensors. The task is to infer reasonable values for all grid cells. Because there is no ground truth for empty cells, the empirical study uses out-of-sample MSE (mean squared error) and JSD (Jensen-Shannon divergence) measurements. The results show that the Contextual CycleGAN supports the proposed theory and outperforms the methods used for comparison.
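The abstract uses JSD because empty cells have no ground truth, so imputations are compared as distributions rather than cell by cell. The sketch below shows one way to compute it. It assumes that inferred and reference PM2.5 values are first binned into histograms over a shared range. The binning scheme, the base-2 logarithm (which bounds JSD in [0, 1]), and the sample values are assumptions, not details from the paper.

```python
import math


def histogram(values, lo, hi, bins):
    """Normalized histogram over [lo, hi] with equal-width bins; out-of-range values are clipped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical PM2.5 values (ug/m^3) for inferred cells vs. held-out reference cells.
inferred = [12.0, 15.5, 18.2, 22.9, 35.1]
reference = [10.4, 14.8, 19.9, 24.3, 40.7]
p = histogram(inferred, lo=0.0, hi=50.0, bins=5)
q = histogram(reference, lo=0.0, hi=50.0, bins=5)
print(round(js_divergence(p, q), 4))  # 0.2
```

Unlike KL divergence, JSD stays finite when one histogram has empty bins where the other does not, which suits sparse, imprecise data like this.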
This paper presents APILI, a behavior-based malware analysis approach that uses deep learning to locate the API calls corresponding to discovered malware techniques in dynamic execution traces. Through a neural network, APILI defines multiple attentions among API calls, resources, and techniques, incorporating the MITRE ATT&CK framework of adversary tactics, techniques, and procedures. We employ fine-tuned BERT to embed arguments and resources, singular value decomposition (SVD) for technique representation, and several design enhancements, including the layer structure and noise addition, to improve locating performance. To the best of our knowledge, this is the first attempt to locate the low-level API calls that correspond to high-level malicious behaviors (that is, techniques). Our evaluation demonstrates that APILI outperforms other traditional and machine learning techniques in both technique discovery and API locating. These results indicate that APILI performs well and can reduce the analysis workload.
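The abstract does not give the details of APILI's network. As a rough illustration of how attention can "locate" API calls, the sketch below scores each call embedding against a technique embedding with generic scaled dot-product attention and ranks the calls by weight. The vectors are invented. The real model learns its representations (for example, from fine-tuned BERT argument embeddings and SVD technique vectors) instead of receiving them as fixed inputs, and the function names are hypothetical.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def locate_api_calls(technique_vec, call_vecs, top_k=1):
    """Return indices of the top_k calls the technique attends to most, plus all weights."""
    weights = attention_weights(technique_vec, call_vecs)
    ranked = sorted(range(len(call_vecs)), key=lambda i: weights[i], reverse=True)
    return ranked[:top_k], weights


# Hypothetical 3-d embeddings for a trace of four API calls and one technique.
trace = [
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.3],
    [0.0, 0.2, 0.9],
    [0.7, 0.0, 0.4],
]
technique = [1.0, 0.0, 0.5]
indices, weights = locate_api_calls(technique, trace, top_k=2)
print(indices)  # [1, 3]
```

The ranked indices point back into the execution trace, which is what lets an analyst jump from a detected technique to the specific calls behind it.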
In this paper, we show how to use the Matrix Code Equivalence (MCE) problem as a new basis to construct signature schemes. This extends previous work on using isomorphism problems for signature schemes, a trend that has recently emerged in post-quantum cryptography. Our new formulation leverages a more general problem and allows for smaller data sizes, achieving competitive performance and great flexibility. Using MCE, we construct a zero-knowledge protocol which we turn into a signature scheme named Matrix Equivalence Digital Signature (MEDS). We provide an initial choice of parameters for MEDS, tailored to NIST’s Category 1 security level, yielding public keys as small as 2.8 kB and signatures ranging from 18 kB to just around 6.5 kB, along with a reference implementation in C.
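The toy sketch below shows why an equivalence problem supports a zero-knowledge identification protocol. It is not the MEDS specification. It works over a tiny prime field, and the secret is a pair of invertible matrices (A, B) with D_i = A C_i B for every basis matrix C_i of the public code C. The prover commits to E = A_t C B_t for fresh random (A_t, B_t). Depending on a one-bit challenge, it then reveals either (A_t, B_t) or the composed pair (A_t A^-1, B^-1 B_t), which maps D to E without exposing (A, B). The field size, dimensions, basis-wise comparison, and all names are simplifying assumptions. A real scheme compares the codes themselves (for example, through a canonical basis), runs many rounds, and applies a Fiat-Shamir-style transform to obtain signatures.

```python
import random

Q = 13  # toy prime field size, far too small for any real security


def mat_mul(X, Y, q=Q):
    """Matrix product over F_q."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_inv(X, q=Q):
    """Inverse over F_q (q prime) by Gauss-Jordan elimination; raises if X is singular."""
    n = len(X)
    aug = [list(row) + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(X)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % q), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_q")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, q)  # modular inverse (Python 3.8+)
        aug[col] = [v * inv % q for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % q:
                f = aug[r][col]
                aug[r] = [(a - f * b) % q for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_invertible(n, rng, q=Q):
    """Sample from GL_n(F_q) by rejection sampling."""
    while True:
        M = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
        try:
            mat_inv(M, q)
            return M
        except ValueError:
            pass


def act(A, code, B, q=Q):
    """Apply the isometry (A, B) to every basis matrix: C_i -> A C_i B."""
    return [mat_mul(mat_mul(A, C, q), B, q) for C in code]


def respond(secret, commit_maps, challenge):
    """Prover's answer: the fresh maps for challenge 0, their composition with the secret for 1."""
    (A, B), (A_t, B_t) = secret, commit_maps
    if challenge == 0:
        return A_t, B_t
    return mat_mul(A_t, mat_inv(A)), mat_mul(mat_inv(B), B_t)


def verify(C, D, E, challenge, response):
    """Check that the revealed maps send C (challenge 0) or D (challenge 1) to the commitment E."""
    L, R = response
    return act(L, C if challenge == 0 else D, R) == E


rng = random.Random(2024)
m, n, k = 3, 4, 2  # m x n matrices, code dimension k (toy values)
C = [[[rng.randrange(Q) for _ in range(n)] for _ in range(m)] for _ in range(k)]
A, B = random_invertible(m, rng), random_invertible(n, rng)  # secret key
D = act(A, C, B)                                             # public key is (C, D)
A_t, B_t = random_invertible(m, rng), random_invertible(n, rng)
E = act(A_t, C, B_t)                                         # commitment
for challenge in (0, 1):
    print(challenge, verify(C, D, E, challenge, respond((A, B), (A_t, B_t), challenge)))
```

Either single response alone reveals nothing about (A, B), because (A_t, B_t) is uniformly random. A cheating prover who does not know (A, B) can prepare for only one of the two challenges, so each round catches it with probability 1/2.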
This paper proposes an encoder-decoder architecture for kidney segmentation. We implement a hyperparameter optimization process that covers the model architecture, the windowing method, the loss function, and data augmentation. The best-performing model uses EfficientNet-B5 as the encoder and a feature pyramid network as the decoder, achieving a Dice score of 0.969 on the 2019 Kidney and Kidney Tumor Segmentation Challenge (KiTS19) dataset. The proposed model is tested with different voxel spacings, anatomical planes, and kidney and tumor volumes. Moreover, case studies are conducted to analyze segmentation outliers. Finally, five-fold cross-validation and the 3D-IRCAD-01 dataset are used to evaluate the model with the following metrics: the Dice score, recall, precision, and the Intersection over Union (IoU) score. This paper demonstrates a new development and application of artificial intelligence algorithms for image analysis and interpretation. Overall, our experimental results show that the proposed kidney segmentation solution for CT images can be applied to clinical needs, assisting surgeons in surgical planning. It enables the calculation of total kidney volume to estimate kidney function in autosomal dominant polycystic kidney disease (ADPKD) and supports radiologists and doctors in diagnosing diseases and monitoring their progression.
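The evaluation metrics listed above all derive from voxel-wise true positives, false positives, and false negatives. The total kidney volume follows from the voxel count and the voxel spacing. A minimal sketch, assuming binary masks flattened into lists of 0/1 and spacing given in millimetres, is shown below. The function names and the choice to return 1.0 for a zero denominator are illustrative assumptions, not conventions taken from the paper.

```python
def confusion_counts(pred, truth):
    """Voxel-wise TP, FP, FN for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    return tp, fp, fn


def segmentation_metrics(pred, truth):
    """Dice, IoU, precision, and recall; a zero denominator yields 1.0 (assumed convention)."""
    tp, fp, fn = confusion_counts(pred, truth)

    def ratio(num, den):
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


def total_volume_ml(mask, spacing_mm):
    """Foreground volume: voxel count x voxel volume (mm^3), converted to mL (1 mL = 1000 mm^3)."""
    sx, sy, sz = spacing_mm
    return sum(1 for v in mask if v) * sx * sy * sz / 1000.0


pred = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
print(segmentation_metrics(pred, truth))
# {'dice': 0.5, 'iou': 0.3333333333333333, 'precision': 0.5, 'recall': 0.5}
print(total_volume_ml([1] * 1000, (1.0, 1.0, 1.0)))  # 1.0
```

Because the volume scales with the product of the three spacings, the same predicted mask yields different volumes on scans with different voxel spacing. This is one reason to test the model across spacings, as the abstract describes.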
Previously, doctors interpreted computed tomography (CT) images based on their experience in diagnosing kidney diseases. However, with the rapid increase in the number of CT images, such interpretation required considerable time and effort and produced inconsistent results. To solve this problem, several novel neural network models have been proposed to automatically identify kidney or tumor areas in CT images. Most of these models improve accuracy only by modifying the neural network structure. However, data pre-processing is also a crucial step in improving results. This study systematically discusses the pre-processing methods needed before medical images are processed by a neural network model. The experimental results show that the proposed pre-processing methods or models significantly improve accuracy compared with the case without data pre-processing. Specifically, the Dice score for kidney segmentation improved from 0.9436 to 0.9648, and the Dice score for detecting all tumor types was 0.7294. With the proposed medical image processing methods and deep learning models, this performance is suitable for clinical applications with limited computational resources. The approach also achieves cost-efficient and accurate automatic kidney volume calculation and tumor detection.
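The abstract does not list the individual pre-processing steps. One step commonly applied to CT before segmentation, included here as an assumption, is intensity windowing: Hounsfield unit (HU) values are clipped to a window defined by a level (center) and a width, then rescaled to [0, 1]. The defaults below (level 40 HU, width 400 HU) are a typical abdominal soft-tissue setting, not values taken from the paper. The function name is hypothetical.

```python
def apply_window(hu_values, level=40.0, width=400.0):
    """Clip HU values to [level - width/2, level + width/2] and rescale linearly to [0, 1]."""
    if width <= 0:
        raise ValueError("window width must be positive")
    lo = level - width / 2.0
    hi = level + width / 2.0
    return [(min(max(v, lo), hi) - lo) / width for v in hu_values]


# Illustrative HU values ranging from air to dense bone.
print(apply_window([-1000, -100, 40, 200, 1000]))  # [0.0, 0.15, 0.5, 0.9, 1.0]
```

Windowing spends the network's input range on the intensities that distinguish kidney from surrounding tissue. Air and bone saturate at 0 and 1 instead of compressing the soft-tissue contrast.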
This paper introduces a new key space for CSIDH and a new algorithm for constant-time evaluation of the CSIDH group action. The key space is not useful with previous algorithms, and the algorithm is not useful with previous key spaces, but combining the new key space with the new algorithm produces speed records for constant-time CSIDH. For example, for CSIDH-512 with a 256-bit key space, the best previous constant-time results used 789000 multiplications and more than 200 million Skylake cycles; this paper uses 438006 multiplications and 125.53 million cycles.
In-memory techniques keep data in faster, more expensive storage media to improve the performance of big data processing. However, existing mechanisms do not consider how to expedite data processing applications that access their input datasets only once. Another problem is how to reclaim memory without affecting other running applications. In this paper, we provide scheduling-aware data prefetching and eviction mechanisms based on Spark, Alluxio, and Hadoop. The mechanisms prefetch data and release memory resources based on scheduling information. We propose a mathematical method that maximizes the reduction in data access time. To make the mechanisms applicable in large-scale environments, we propose a heuristic algorithm that reduces the computational time, along with an enhanced version that increases the amount of prefetched data. Finally, we perform real-testbed and simulation experiments to show the effectiveness of the proposed mechanisms.
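The abstract does not spell out the mathematical method or the heuristic. The toy simulation below is a hedged illustration of why scheduling information helps both prefetching and eviction. It assumes that jobs run one at a time in a known order and that each job reads its input exactly once. While a job runs, the simulation prefetches upcoming inputs first-fit into free memory, and it evicts each input after its single use. The `Job` fields, the first-fit rule, and the numbers are assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start: float       # scheduled start time taken from the scheduler
    input_size: float  # GB of input data, read exactly once
    saved_time: float  # seconds saved if the input is already in memory


def simulate(jobs, capacity_gb, evict_after_use=True):
    """Toy lookahead prefetching with eviction; returns total access time saved (s).

    Jobs run one at a time in scheduled order. While job i runs, inputs of later jobs
    are prefetched first-fit into free memory. When job i finishes, its input is
    evicted (if enabled), because it will not be read again.
    """
    order = sorted(jobs, key=lambda j: j.start)
    cache = {}  # job name -> cached input size (GB)
    saved = 0.0
    for i, job in enumerate(order):
        if job.name in cache:  # input was prefetched while an earlier job ran
            saved += job.saved_time
        used = sum(cache.values())
        for nxt in order[i + 1:]:
            if nxt.name not in cache and used + nxt.input_size <= capacity_gb:
                cache[nxt.name] = nxt.input_size
                used += nxt.input_size
        if evict_after_use:
            cache.pop(job.name, None)
    return saved


jobs = [Job("A", 0, 4, 2.0), Job("B", 1, 5, 3.0), Job("C", 2, 6, 4.0), Job("D", 3, 3, 1.0)]
print(simulate(jobs, capacity_gb=6))                         # 4.0
print(simulate(jobs, capacity_gb=6, evict_after_use=False))  # 3.0
```

In this toy schedule, evicting B's input after its single read frees memory for D's input, so the saving rises from 3.0 s to 4.0 s. Without eviction, the stale input blocks later prefetches.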