:::
Image deblurring aims to restore the latent sharp images from the corresponding blurred ones. In this paper, we present an unsupervised method for domain-specific single-image deblurring based on disentangled representations. The disentanglement is achieved by splitting the content and blur features in a blurred image using content encoders and blur encoders. We enforce a KL divergence loss to regularize the distribution range of extracted blur attributes such that little content information is contained. Meanwhile, to handle the unpaired training data, a blurring branch and the cycle-consistency loss are added to guarantee that the content structures of the deblurred results match the original images. We also add an adversarial loss on deblurred results to generate visually realistic images and a perceptual loss to further mitigate the artifacts. We perform extensive experiments on the tasks of face and text deblurring using both synthetic datasets and real images, and achieve improved results compared to recent state-of-the-art deblurring methods.
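As a rough illustration of the KL regularizer mentioned in the abstract above, the sketch below assumes the blur encoder outputs the mean and log-variance of a diagonal Gaussian and penalizes its divergence from a standard normal prior. The Gaussian parameterization and the name `kl_to_standard_normal` are assumptions made for illustration; the abstract does not specify them.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector.

    Hypothetical helper: `mu` and `log_var` are equal-length lists of floats
    standing in for the output of a blur encoder.
    """
    # Closed form per dimension: 0.5 * (mu^2 + sigma^2 - log(sigma^2) - 1)
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# A code that already matches the prior incurs no penalty.
zero_penalty = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting the mean away from zero increases the penalty.
shifted_penalty = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

Pulling the blur code toward a fixed prior limits how much information it can carry, which is one plausible reading of how the loss discourages content from leaking into the blur attributes.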
Graphics-intensive mobile games place different and varying levels of demand on the associated CPUs and GPUs. In contrast to the workload variability that characterizes games, the current design of the energy governor employed by mobile systems appears to be outdated. In this work, we review the energy-saving mechanism implemented in an Android system coupled with graphics-intensive gaming workloads from three perspectives: user perception, application status, and the interplay between the CPU and GPU. We observe that there are information gaps in the current system, which may result in unnecessary energy wastage. To resolve the problem, we propose an online user-centric CPU-GPU governing framework. To bridge the identified information gaps, we classify rendered game frames into redundant/changing frames to satisfy user demand, categorize an application into GPU-sensitive/insensitive phases to understand the application’s demand, and determine the frequency scaling intents of the CPU and GPU to capture processor demand. In response to the measured demand, we employ a required workload estimator, a unified policy selector, and a frequency-scaling intent communicator in the framework to save energy. The proposed framework was implemented on an LG Nexus 5X smartphone, and extensive experiments with real-world 3D gaming applications were conducted. According to the experimental results, for an application with low interactivity and infrequent phase changes, the proposed framework reduces energy consumption by 25.3% and 39% compared with our previous work and the Android governors, respectively, while maintaining user experience.
Vehicular fog computing (VFC) is a promising approach to provide ultra-low-latency service to vehicles and end users by extending fog computing to conventional vehicular networks. Parked vehicle assistance (PVA), a critical technique in VFC, can be integrated with smart parking to exploit its full potential. In this paper, we propose a smart VFC system that combines PVA and smart parking. A VFC-aware parking reservation auction is proposed to guide on-the-move vehicles to available parking places with less effort, while exploiting the fog capability of parked vehicles to assist delay-sensitive computing services, with monetary rewards compensating for their service cost. The proposed allocation rule maximizes the aggregate utility of the smart vehicles, and the proposed payment rule guarantees incentive compatibility, individual rationality, and budget balance. We further provide an observation stage with dynamic offload pricing updates to improve the offload efficiency and the profit of the fog system. The simulation results confirm the win–win performance enhancement that the proposed design brings to the fog node controller, the smart vehicles, and the parking places.
Covariates are factors that have a debilitating influence on face verification performance. In this paper, we comprehensively study two covariate-related problems for unconstrained face verification: first, how covariates affect the performance of deep neural networks on the large-scale unconstrained face verification problem; second, how to utilize covariates to improve verification performance. To study the first problem, we implement five state-of-the-art deep convolutional networks and evaluate them on three challenging covariate datasets. In total, seven covariates are considered: pose (yaw and roll), age, facial hair, gender, indoor/outdoor, occlusion (nose and mouth visibility, and forehead visibility), and skin tone. We first report the performance of each individual network on the overall protocol and use the score-level fusion method to analyze each covariate. Some of the results confirm and extend the findings of previous studies, and others are new findings that were rarely mentioned previously or did not show consistent trends. For the second problem, we demonstrate that with the assistance of gender information, the quality of a precurated noisy large-scale face dataset for face recognition can be further improved. After retraining the face recognition model using the curated data, performance improvement is observed at low false acceptance rates.
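The score-level fusion mentioned in the abstract above can be sketched as follows, assuming each network yields one embedding per face, each pair is scored by cosine similarity, and the per-network scores are fused by a plain average. The averaging rule and the function names are assumptions for illustration; the abstract does not state which fusion rule is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse_scores(scores):
    """Hypothetical fusion rule: unweighted mean of per-network scores."""
    return sum(scores) / len(scores)


# Toy embeddings of the same face pair from two hypothetical networks.
pair_net1 = ([1.0, 0.0], [1.0, 0.0])  # identical directions
pair_net2 = ([1.0, 0.0], [0.0, 1.0])  # orthogonal directions
per_network = [cosine_similarity(*pair_net1), cosine_similarity(*pair_net2)]
fused = fuse_scores(per_network)
```

A verification decision would then compare `fused` against a threshold chosen for the target false acceptance rate.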
Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another (e.g., synthetic to real images). The adapted representations often do not capture pixel-level domain shifts that are crucial for dense prediction tasks (e.g., semantic segmentation). In this paper, we present a novel pixel-wise adversarial domain adaptation algorithm. By leveraging image-to-image translation methods as a data augmentation technique, our key insight is that while the translated images between domains may differ in style, their predictions for the task should be exactly the same. We exploit this property and introduce a cross-domain consistency loss that requires our adapted model to produce consistent predictions. We validate our method through a wide variety of unsupervised domain adaptation tasks, including synthetic-to-real semantic segmentation, optical flow estimation, and depth prediction, as well as adapting semantic segmentation models across different cities. Extensive experimental results demonstrate that the proposed approach performs favorably against state-of-the-art methods.
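One way to picture the cross-domain consistency idea in the abstract above is a penalty on the disagreement between the per-pixel class distributions predicted for an image and for its translated counterpart. The sketch below uses a symmetric KL divergence averaged over pixels; that choice of divergence and all function names are assumptions for illustration, not details given in the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Mean symmetric KL over pixels.

    Each argument is a list of per-pixel logit lists: one for the original
    image and one for its translated version (hypothetical inputs).
    """
    total = 0.0
    for pixel_a, pixel_b in zip(logits_a, logits_b):
        p, q = softmax(pixel_a), softmax(pixel_b)
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(logits_a)


# Two pixels, three classes: identical predictions give zero loss.
same = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
flipped = [[-1.0, 0.5, 2.0], [0.0, 1.0, 0.0]]
loss_same = consistency_loss(same, same)
loss_diff = consistency_loss(same, flipped)
```

The loss is zero only when both predictions agree at every pixel, which matches the stated requirement that translated images yield the same task output.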
In this paper, we address a new task called instance co-segmentation. Given a set of images jointly covering object instances of a specific category, instance co-segmentation aims to identify all of these instances and segment each of them, i.e., generating one mask for each instance. This task is important since instance-level segmentation is preferable for humans and many vision applications. It is also challenging because no pixel-wise annotated training data are available and the number of instances in each image is unknown. We solve this task by dividing it into two sub-tasks, co-peak search and instance mask segmentation. In the former sub-task, we develop a CNN-based network to detect the co-peaks as well as co-saliency maps for a pair of images. A co-peak has two endpoints, one in each image, that are local maxima in the response maps and similar to each other. Thereby, the two endpoints are potentially covered by a pair of instances of the same category. In the latter sub-task, we design a ranking function that takes the detected co-peaks and co-saliency maps as inputs and selects the object proposals to produce the final results. Our method for instance co-segmentation and its variant for object co-localization are evaluated on four datasets, and achieve favorable performance against the state-of-the-art methods. The source code and the collected datasets are available at https://github.com/KuangJuiHsu/DeepCO3/.
This paper proposes a method for head pose estimation from a single image. Previous methods often predict head poses through landmark or depth estimation and require more computation than necessary. Our method is based on regression and feature aggregation. To keep the model compact, we employ the soft stagewise regression scheme. Existing feature aggregation methods treat inputs as a bag of features and thus ignore their spatial relationship in a feature map. We propose to learn a fine-grained structure mapping for spatially grouping features before aggregation. The fine-grained structure provides part-based information and pooled values. By utilizing learnable and non-learnable importance over the spatial location, different model variants can be generated to form a complementary ensemble. Experiments show that our method outperforms the state-of-the-art methods, including both the landmark-free ones and the ones based on landmark or depth estimation. With a single RGB frame as input, our method even outperforms methods utilizing multi-modality information (RGB-D, RGB-Time) on estimating the yaw angle. Furthermore, the memory overhead of the proposed model is 100 times smaller than that of previous methods.
Perceptual similarity measurement allows mobile applications to eliminate unnecessary computations without compromising visual experience. Existing pixel-wise measures incur significant overhead with increasing display resolutions and frame rates. This paper presents an ultra-lightweight similarity measure called LSIM, which assesses the similarity between frames based on the transformation matrices of graphics objects. To evaluate its efficacy, we integrate LSIM into the Open Graphics Library and conduct experiments on an Android smartphone with various mobile 3D games. The results show that LSIM is highly correlated with the most widely used pixel-wise measure, SSIM, yet is three to five orders of magnitude faster. We also apply LSIM to a CPU-GPU governor to suppress the rendering of similar frames, thereby further reducing computation energy consumption by up to 27.3% while maintaining satisfactory visual quality.
Self-powered intermittent systems enable accumulative executions in unstable power environments, where checkpointing is often adopted as a means to achieve data consistency and system recovery under power failures. However, existing approaches based on the checkpointing paradigm normally require system suspension and logging at runtime. This paper presents a design which enables failure-resilient intermittently-powered systems without runtime checkpointing. By leveraging the characteristics of data accessed in hybrid memory, our design enforces the consistency and serializability of concurrent data access while maximizing computation progress, and allows instant system recovery after power resumption. We integrated the design into FreeRTOS running on a Texas Instruments device. Experimental results show that our design achieves up to 11.8 times the computation progress achieved by checkpointing-based approaches, while reducing the recovery time by nearly 90%.
For rate optimization in interference-limited networks, improper Gaussian signaling has shown its capability to outperform conventional proper Gaussian signaling. In this work, we study a weighted sum-rate maximization problem with improper Gaussian signaling for the multiple-input multiple-output interference broadcast channel (MIMO-IBC). To solve this nonconvex and NP-hard problem, we propose an effective algorithm that optimizes the covariance and pseudo-covariance matrices separately. In the covariance optimization, a weighted minimum mean square error (WMMSE) algorithm is adopted, and, in the pseudo-covariance optimization, an alternating optimization (AO) algorithm is proposed, which guarantees convergence to a stationary solution and ensures a sum-rate improvement over proper Gaussian signaling. An alternating direction method of multipliers (ADMM)-based multi-agent distributed algorithm is proposed to solve an AO subproblem with the globally optimal solution in a parallel and scalable fashion. The proposed scheme exhibits favorable convergence, optimality, and complexity properties for future large-scale networks. Simulation results demonstrate the superior sum-rate performance of the proposed algorithm as compared to existing schemes with proper as well as improper Gaussian signaling under various network configurations.