:::
This paper introduces a new key space for CSIDH and a new algorithm for constant-time evaluation of the CSIDH group action. The key space is not useful with previous algorithms, and the algorithm is not useful with previous key spaces, but combining the new key space with the new algorithm produces speed records for constant-time CSIDH. For example, for CSIDH-512 with a 256-bit key space, the best previous constant-time results used 789000 multiplications and more than 200 million Skylake cycles; this paper uses 438006 multiplications and 125.53 million cycles.

Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance but also requires enormous amounts of CPU and GPU resources. However, not all states require a long search to identify the best action that the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we are able to stop the search earlier when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by a factor of 2.5 while maintaining a similar winning rate, which is critical for training and conducting experiments. Also, under the same average simulation count, our method can achieve a 61% winning rate against the original program.
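The early-stopping idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' DS-MCTS implementation: the names `search_with_early_stop`, `simulate`, `confidence`, `check_every`, and `threshold` are hypothetical, and the `visit_share` function is only a stand-in for the learned uncertainty predictor the abstract describes.

```python
from typing import Callable, Dict, Tuple


def search_with_early_stop(
    simulate: Callable[[], int],
    confidence: Callable[[Dict[int, int]], float],
    max_simulations: int = 800,
    check_every: int = 100,
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run simulations, stopping once the predictor is confident enough.

    `simulate` stands in for one MCTS simulation and returns the root
    action it visited. `confidence` stands in for an uncertainty
    predictor. Both are assumptions made for illustration.
    """
    visits: Dict[int, int] = {}
    used = 0
    while used < max_simulations:
        action = simulate()
        visits[action] = visits.get(action, 0) + 1
        used += 1
        # Only query the predictor at fixed checkpoints to keep it cheap.
        if used % check_every == 0 and confidence(visits) >= threshold:
            break
    best = max(visits, key=visits.get)
    return best, used


def visit_share(visits: Dict[int, int]) -> float:
    """Hypothetical predictor: fraction of visits on the top action."""
    return max(visits.values()) / sum(visits.values())


if __name__ == "__main__":
    # A search that always prefers action 3 stops at the first checkpoint.
    print(search_with_early_stop(lambda: 3, visit_share))
```

Checking the predictor only at checkpoints is one possible design choice; it bounds the overhead of the stopping test relative to the simulations themselves.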
To exploit rich information from unlabeled data, in this work, we propose a novel self-supervised framework for visual tracking which can easily adapt state-of-the-art supervised Siamese-based trackers into unsupervised ones by utilizing the fact that an image and any cropped region of it can form a natural pair for self-training. Besides common geometric transformation-based data augmentation and hard negative mining, we also propose adversarial masking, which helps the tracker learn other context information by adaptively blacking out salient regions of the target. The proposed approach can be trained offline using images only, without any requirement for manual annotations or temporal information from multiple consecutive frames. Thus, it can be used with any kind of unlabeled data, including images and video frames. For evaluation, we take SiamFC as the base tracker and name the proposed self-supervised method S²SiamFC. Extensive experiments and ablation studies on the challenging VOT2016 and VOT2018 datasets are provided to demonstrate the effectiveness of the proposed method, which achieves performance comparable to its supervised counterpart and to other unsupervised methods requiring multiple frames.
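The observation that an image and any cropped region of it form a natural training pair can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `make_training_pair` is a hypothetical helper, the image is a plain nested list of pixel values, and augmentation, hard negative mining, and adversarial masking are omitted.

```python
import random


def make_training_pair(image, crop_h, crop_w, rng):
    """Build a (template, search, label) triple from a single image.

    The template is a random crop, the search region is the full image,
    and the label is the crop's top-left corner, which is known for free
    because we chose it. No manual annotation is involved.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop_h + 1)
    left = rng.randrange(width - crop_w + 1)
    template = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return template, image, (top, left)


if __name__ == "__main__":
    # Toy 6x8 "image" whose pixel values encode their own coordinates.
    toy = [[r * 10 + c for c in range(8)] for r in range(6)]
    template, search, (top, left) = make_training_pair(
        toy, 3, 4, random.Random(0)
    )
    print(len(template), len(template[0]), (top, left))
```

Because the crop location is chosen by the sampler, the supervision signal comes from the data itself, which is what allows training on images alone.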
In recent years, waveform-mapping-based speech enhancement (SE) methods have garnered significant attention. These methods generally use a deep learning model to directly process and reconstruct speech waveforms. Because both the input and output are in waveform format, waveform-mapping-based SE methods can overcome the distortion caused by imperfect phase estimation, which may be encountered in spectral-mapping-based SE systems. So far, most waveform-mapping-based SE methods have focused on single-channel tasks. In this paper, we propose a novel fully convolutional network (FCN) with Sinc and dilated convolutional layers (termed SDFCN) for multichannel SE that operates in the time domain. We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN). The proposed methods are evaluated on three multichannel SE tasks, namely the dual-channel inner-ear microphones SE task, the distributed microphones SE task, and the CHiME-3 dataset. The experimental results confirm the outstanding denoising capability of the proposed SE systems on all three tasks and the benefits of using the residual architecture on the overall SE performance.
In this paper, we propose a convolutional neural network (CNN) model for device-free fingerprinting indoor localization based on Wi-Fi channel state information (CSI). In addition, we develop an interpretation framework to understand the representations learned by the model. By quantifying and visualizing the CNN in comparison with the fully connected feedforward deep neural network (DNN) (or multilayer perceptron), we observe that each model can automatically identify location-specific patterns, which, however, differ across models and are linked to the respective performance of each model. Furthermore, we quantify how features, relevant or otherwise, as deemed by the adopted quantifying metrics (i.e., relevance scores, calculated by relevance propagation techniques), determine or affect the performance results. Interpretation of learning models for wireless applications is challenging due to the lack of human sensory intuition and reference. The results presented in this paper provide visually perceivable evidence and plausible explanations for the performance advantages of CNN in this important application.
In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearance path captures macroscopic vehicle features, while the orientation-conditioned part appearance path learns to capture localized discriminative features by focusing attention on the most informative key-points. Through extensive experimentation, we show that the proposed AAVER method is able to accurately re-identify vehicles in unconstrained scenarios, yielding state-of-the-art results on the challenging VeRi-776 dataset. As a byproduct, the proposed system is also able to accurately predict vehicle key-points and shows an improvement of more than 7% over the state of the art.
Unconstrained video-based face recognition is a challenging problem due to significant within-video variations caused by pose, occlusion and blur. To tackle this problem, an effective idea is to propagate the identity from high-quality faces to low-quality ones through contextual connections, which are constructed based on context such as body appearance. However, previous methods have often propagated erroneous information because they do not model the uncertainty of the noisy contextual connections. In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, each represented by a node in a graph. UGG explicitly models the uncertainty of the contextual connections by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released and challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.
In-memory techniques keep data in faster and more expensive storage media to improve the performance of big data processing. However, existing mechanisms do not consider how to expedite data processing applications that access their input datasets only once. Another problem is how to reclaim memory without affecting other running applications. In this paper, we provide scheduling-aware data prefetching and eviction mechanisms based on Spark, Alluxio, and Hadoop. The mechanisms prefetch data and release memory resources based on the scheduling information. A mathematical method is proposed for maximizing the reduction of data access time. To make the mechanisms applicable in large-scale environments, we propose a heuristic algorithm to reduce the computational time. Furthermore, an enhanced version of the heuristic algorithm is proposed to increase the amount of prefetched data. Finally, we perform real-testbed and simulation experiments to show the effectiveness of the proposed mechanisms.
Device-free indoor localization is a key enabling technology for many Internet of Things (IoT) applications. Deep neural network (DNN)-based location estimators achieve high-precision localization performance by automatically learning discriminative features from noisy wireless signals without much human intervention. However, the inner workings of DNNs are not transparent and not adequately understood, especially in wireless localization applications. In this paper, we conduct visual analyses of DNN-based location estimators trained with Wi-Fi channel state information (CSI) fingerprints in a real-world experiment. Using visualization techniques, we address such questions as 1) how well has the DNN learned and been trained, and 2) what critical features has the DNN learned to distinguish different classes. The results provide plausible explanations and allow for a better understanding of the mechanism of DNN-based wireless indoor localization.