:::
Depth map super-resolution is an emerging topic due to the growing number of applications that use RGB-D sensors. Together with the color image, the corresponding range data provides additional information and makes visual analysis tasks more tractable. However, since the depth maps captured by such sensors typically have limited resolution, it is preferable to enhance their resolution for improved recognition. In this paper, we present a novel joint trilateral filtering (JTF) algorithm for solving depth map super-resolution (SR) problems. Inspired by bilateral filtering, our JTF utilizes and preserves edge information from the associated high-resolution (HR) image by taking into account the spatial and range information of local pixels. Our proposed method further integrates local gradient information of the depth map when synthesizing its HR output, which alleviates textural artifacts such as edge discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach compared with prior depth map upsampling methods.
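For readers unfamiliar with joint trilateral weighting, here is a minimal pure-Python sketch of the general idea; it is not the authors' JTF. Each HR output depth is a normalized weighted average of nearby low-resolution depth samples, and each weight multiplies three Gaussians: spatial distance on the low-resolution grid, intensity difference in the HR guide image (the bilateral "range" term), and, as a hypothetical stand-in for the paper's gradient term, the difference in local depth-gradient magnitude. The function names, the integer `scale`, and the `sigma_*` defaults are illustrative assumptions.

```python
import math


def gradient_magnitude(depth):
    """Central-difference gradient magnitude of a 2-D list (borders clamped)."""
    h, w = len(depth), len(depth[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            gy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            grad[y][x] = math.hypot(gx, gy) / 2.0
    return grad


def joint_trilateral_upsample(depth_lr, guide_hr, scale, radius=1,
                              sigma_s=1.0, sigma_r=10.0, sigma_g=5.0):
    """Upsample depth_lr by an integer `scale`, guided by the HR intensity image."""
    h_lr, w_lr = len(depth_lr), len(depth_lr[0])
    h_hr, w_hr = len(guide_hr), len(guide_hr[0])
    grad = gradient_magnitude(depth_lr)
    out = [[0.0] * w_hr for _ in range(h_hr)]
    for y in range(h_hr):
        for x in range(w_hr):
            cy, cx = y / scale, x / scale  # HR pixel position on the LR grid
            # nearest LR sample, used as the reference for the gradient term
            ry, rx = min(round(cy), h_lr - 1), min(round(cx), w_lr - 1)
            num = den = 0.0
            for j in range(max(0, int(cy) - radius), min(h_lr, int(cy) + radius + 2)):
                for i in range(max(0, int(cx) - radius), min(w_lr, int(cx) + radius + 2)):
                    # 1) spatial term: distance on the LR grid
                    ds2 = (j - cy) ** 2 + (i - cx) ** 2
                    # 2) range term: guide intensity at the sample's HR location
                    dr = guide_hr[y][x] - guide_hr[min(j * scale, h_hr - 1)][min(i * scale, w_hr - 1)]
                    # 3) hypothetical gradient term: similar local depth slope
                    dg = grad[j][i] - grad[ry][rx]
                    wgt = (math.exp(-ds2 / (2 * sigma_s ** 2))
                           * math.exp(-dr * dr / (2 * sigma_r ** 2))
                           * math.exp(-dg * dg / (2 * sigma_g ** 2)))
                    num += wgt * depth_lr[j][i]
                    den += wgt
            out[y][x] = num / den
    return out
```

Because every output is a convex combination of input depths, a constant depth map stays constant, and a depth edge aligned with an intensity edge in the guide stays sharp: samples on the far side of the intensity edge receive a near-zero range weight.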
With the increasing variety of mobile applications, reducing the energy consumption of mobile devices is a major challenge in sustaining multimedia streaming applications. This paper explores how to minimize the energy consumed by the backlight when displaying a video stream without adversely impacting the user's visual experience. First, we model the problem as a dynamic backlight scaling optimization problem. Then, we propose algorithms to solve this fundamental problem and prove their optimality in terms of energy savings. Finally, based on these algorithms, we present a cloud-based energy-saving service. We have also developed a prototype implementation integrated with existing video streaming applications to validate the practicability of the approach. The experimental results are very encouraging, showing energy savings of 15-49 percent on commercial mobile devices.
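The intuition behind backlight scaling with luminance compensation can be shown with a toy model, which is an assumption and not the paper's formulation: perceived luminance is taken as backlight level times pixel transmittance (both in [0, 1]), so dimming the backlight to level b and brightening each pixel to p/b leaves unclipped pixels unchanged, and backlight power is assumed to be linear in b. Per frame, the sketch picks the lowest level at which at most `clip_fraction` of pixels saturate. This is a greedy per-frame heuristic, not the optimal algorithm proved in the paper; all names and parameters are hypothetical.

```python
import math


def backlight_level(pixels, clip_fraction=0.0, min_level=0.1):
    """Lowest backlight level in [min_level, 1] such that at most
    clip_fraction of the frame's pixels saturate after compensation."""
    values = sorted(pixels)
    n = len(values)
    # rank of the brightest pixel that must still be reproduced without clipping
    k = max(0, math.ceil(n - 1 - clip_fraction * n))
    return max(min_level, values[k])


def compensate(pixels, level):
    """Brighten transmittance by 1/level so that level * p' == p (clipped at 1)."""
    return [min(1.0, p / level) for p in pixels]


def energy_savings(frames, clip_fraction=0.0, min_level=0.1):
    """Fraction of backlight energy saved, assuming power is linear in level."""
    levels = [backlight_level(f, clip_fraction, min_level) for f in frames]
    return 1.0 - sum(levels) / len(levels)
```

For example, frames `[[0.2, 0.5, 0.4], [0.9, 0.3, 1.0]]` with no clipping allowed give levels 0.5 and 1.0, i.e., a 25 percent backlight saving under the linear-power assumption. Allowing a small `clip_fraction` trades a few saturated pixels for further dimming.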
Coexistence of multiple radio access technologies (RATs) is a promising paradigm for improving spectral efficiency. This letter presents a game-theoretic network selection scheme in a cognitive heterogeneous networking environment with time-varying channel availability. We formulate the network selection problem as a noncooperative game with secondary users (SUs) as the players and show that the game is an ordinal potential game (OPG). A decentralized, stochastic learning-based algorithm is proposed in which each SU's strategy progressively evolves toward a Nash equilibrium (NE) based on its own action-reward history, without the need to know the actions of other SUs. The convergence of the proposed algorithm toward an NE is verified both theoretically and numerically. The proposed algorithm demonstrates good throughput and fairness performance in various network scenarios.
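A common way to realize this kind of decentralized, history-based learning is the linear reward-inaction (L_R-I) learning automaton; the abstract does not name the update rule, so using it here is an assumption. In the sketch, each SU samples a network from its mixed strategy, observes a normalized reward in [0, 1], and shifts probability toward the chosen network in proportion to that reward, using only its own action and reward. The reward model (a network's capacity shared equally among the SUs on it, zero when its channel is unavailable in that slot) and all parameter values are hypothetical.

```python
import random


def lri_update(probs, action, reward, step):
    """Linear reward-inaction: move probability toward `action` in
    proportion to reward in [0, 1]; a zero reward leaves probs unchanged."""
    return [p + step * reward * (1.0 - p) if k == action else p - step * reward * p
            for k, p in enumerate(probs)]


def simulate(capacities, availability, n_users, step=0.05, rounds=3000, seed=0):
    """Toy network selection where each SU learns only from its own action/reward.

    capacities[k]: hypothetical throughput of network k when available.
    availability[k]: probability that network k's channel is free in a slot.
    """
    rng = random.Random(seed)
    m = len(capacities)
    cmax = max(capacities)
    strategies = [[1.0 / m] * m for _ in range(n_users)]
    for _ in range(rounds):
        actions = [rng.choices(range(m), weights=s)[0] for s in strategies]
        available = [rng.random() < a for a in availability]
        load = [actions.count(k) for k in range(m)]
        for u, a in enumerate(actions):
            # equal sharing among SUs on the same network, normalized to [0, 1]
            reward = capacities[a] / load[a] / cmax if available[a] else 0.0
            strategies[u] = lri_update(strategies[u], a, reward, step)
    return strategies


for s in simulate([2.0, 1.0], [0.9, 0.6], n_users=3):
    print([round(p, 2) for p in s])
```

The update keeps each strategy a valid probability vector whenever `step * reward <= 1`, and a smaller `step` trades slower learning for a lower chance of locking onto a poor choice early.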
Cross-domain image synthesis and recognition are typically considered two distinct tasks in computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended to solve the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed learning model not only learns a common feature space for associating cross-domain image data for recognition purposes, but also uses the derived feature space to jointly update the dictionaries in each image domain for improved representation. As a result, our method can be applied to both cross-domain image synthesis and recognition problems. Experiments on a variety of synthesis and recognition tasks, such as single image super-resolution, cross-view action recognition, and sketch-to-photo face recognition, verify the effectiveness of our proposed learning model.
Inspired by the recent success of low-rank matrix recovery, we propose a novel incremental learning algorithm based on low-rank matrix decomposition. Our proposed algorithm can be applied to background removal in static yet time-varying scenes. In this paper, we focus on background modeling for railroad crossing videos. The success of an adaptive background modeling/removal approach like ours will allow users to automatically perform foreground (or intruder) detection in such scenes, which could prevent vehicle-train collisions and thus significantly reduce fatality and injury rates. The challenges of background modeling in railroad crossing videos involve not only environmental variations such as lighting or weather changes; headlight reflections on the rails caused by nearby vehicles, and foreground objects moving at very different velocities (e.g., vehicles, bikes, or pedestrians), also make background removal in such real-world scenes extremely difficult. We verify that our proposed algorithm exhibits sufficient effectiveness and robustness in solving this problem. Our experiments on real-world video data confirm that, while our approach outperforms baseline and state-of-the-art background modeling methods, its computational cost is significantly lower than that of standard low-rank-based algorithms.
TBA
A scientific understanding of emotion experience requires information on the contexts in which the emotion is induced. Moreover, as one of the primary functions of music is to regulate the listener's mood, an individual's short-term music preference may reveal the emotional state of that individual. In light of these observations, this paper presents the first scientific study that exploits an online repository of social data to investigate the connections among a blogger's emotional state, the user context manifested in the blog articles, and the content of the music titles the blogger attached to the posts. A number of computational models are developed to evaluate the accuracy of different content or context cues in predicting emotional state, using 40,000 music listening records collected from the social blogging website LiveJournal. Our study shows that it is feasible to computationally model the latent structure underlying music listening and mood regulation. The average area under the receiver operating characteristic curve (AUC) for the content-based and context-based models reaches 0.5462 and 0.6851, respectively. The associations among user mood, music emotion, and the individual's personality are also identified.
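To make the reported AUC values concrete: the AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the Mann-Whitney U statistic divided by the number of positive/negative pairs), so 0.5 corresponds to chance. The sketch below computes it for a single binary mood label. The abstract does not say how AUC is averaged across moods, so averaging one-vs-rest AUCs over mood classes would be an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair count.

    labels are 1 (positive) or 0 (negative); a tie between a positive and a
    negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of examples; for large record sets, a rank-based formula yields the same value in O(n log n).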
In this paper, we address the problem of robust face recognition using a single sample per person. Given only one training image per subject of interest, our proposed method is able to recognize query images with illumination or expression changes, or even those corrupted by occlusion. In order to model the above intra-class variations, we advocate the use of external data (i.e., images of subjects not of interest) for learning an exemplar-based dictionary. This dictionary provides auxiliary yet representative information for handling intra-class variation, while the gallery set containing one training image per class preserves separation between different subjects for recognition purposes. Our experiments on two face datasets confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art sparse representation based methods.
In physical-layer security, secret bits are extracted from wireless channels. Assuming channel reciprocity, the legitimate users share the same channel, which is independent of the channels between the legitimate users and the eavesdropper, leading to secure transmissions. However, practical implementation of physical-layer security faces many challenges. First, for correlated channels such as the multiple-input and multiple-output (MIMO) channel, security is reduced due to the correlation between the generated secret bits. Second, a nearby eavesdropper poses a security threat because it observes the same channel as the legitimate user. Third, eavesdroppers might try to reconstruct the wireless environment. In this paper, we propose two practical physical-layer security schemes for MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems: precoding matrix index (PMI)-based secret key generation with rotation matrix (MOPRO) and the channel quantization-based (MOCHA) scheme. The former utilizes the PMI and rotated reference signals to prevent eavesdroppers from learning the secret key information, while the latter applies channel quantization to extract more secret key bits. It is shown that the proposed schemes guarantee both secure communication and the MIMO gain.
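As a generic illustration of channel-quantization key extraction (not the MOCHA scheme itself), the sketch below has each party quantize its own noisy estimate of a reciprocal channel to one bit per sample, dropping samples inside a guard band around the mean; the parties are assumed to then exchange the retained sample indices publicly and keep only the common ones. The Gaussian channel model, noise level, guard width, and helper names are illustrative assumptions. Under this model the legitimate pair is expected to agree on almost all bits, while an eavesdropper observing an independent channel lands near 50 percent mismatch.

```python
import random
import statistics


def quantize(samples, guard=0.5):
    """One-bit quantizer with a guard band.

    Returns {index: bit}: bit 1 above mean + guard*std, bit 0 below
    mean - guard*std; samples inside the guard band are dropped.
    """
    mu = statistics.mean(samples)
    sd = statistics.pstdev(samples)
    bits = {}
    for idx, s in enumerate(samples):
        if s > mu + guard * sd:
            bits[idx] = 1
        elif s < mu - guard * sd:
            bits[idx] = 0
    return bits


def common_bits(bits_a, bits_b):
    """Keep only indices retained by both sides (assumed exchanged publicly)."""
    common = sorted(bits_a.keys() & bits_b.keys())
    return [bits_a[i] for i in common], [bits_b[i] for i in common]


def mismatch_rate(key_a, key_b):
    return sum(x != y for x, y in zip(key_a, key_b)) / len(key_a)


# Toy reciprocal channel: Alice and Bob see the same gains plus small
# independent estimation noise; Eve sees an independent channel.
rng = random.Random(1)
n = 2000
h = [rng.gauss(0.0, 1.0) for _ in range(n)]
alice = [x + rng.gauss(0.0, 0.1) for x in h]
bob = [x + rng.gauss(0.0, 0.1) for x in h]
eve = [rng.gauss(0.0, 1.0) for _ in range(n)]

key_a, key_b = common_bits(quantize(alice), quantize(bob))
key_a_e, key_e = common_bits(quantize(alice), quantize(eve))
ab_rate = mismatch_rate(key_a, key_b)    # expected to be near 0
ae_rate = mismatch_rate(key_a_e, key_e)  # expected to be near 0.5
```

A wider guard band lowers the legitimate mismatch rate at the cost of fewer key bits; a nearby eavesdropper whose channel is correlated with the legitimate one would push `ae_rate` below 0.5, which is the threat the abstract describes.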
It has been a challenging task to estimate optical flow for videos in which either the foreground or the background exhibits significant motion (i.e., large displacement), or for those with insufficient resolution due to artifacts like motion blur or noise. We present a novel optical flow algorithm that approaches this problem as an energy minimization task, exploiting image data and smoothness terms at the superpixel level. Our proposed method can be considered an extended mean-shift algorithm, which propagates color and gradient information of superpixels across consecutive frames with smoothness guarantees. Since we do not require linearization assumptions during optimization (as standard optical flow approaches do), we are able to alleviate local minimum problems and thus produce improved estimation results. Empirical results on the MPI-Sintel video dataset verify the effectiveness of our proposed method.
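The mean-shift idea can be illustrated with a simplified, hypothetical sketch: a superpixel in frame t is matched to superpixels in frame t+1 by repeatedly moving a position estimate to the kernel-weighted mean of candidate centroids, where each candidate's weight combines a spatial Gaussian with an appearance Gaussian over features such as mean color and mean gradient. This omits the paper's smoothness terms and energy formulation; the `(cx, cy, features)` data layout and the bandwidths are assumptions. Only weighted averaging is involved, with no linearization of the image data.

```python
import math


def superpixel_flow(src, candidates, bandwidth=20.0, sigma_f=0.1,
                    tol=1e-3, max_iter=50):
    """Mean-shift displacement of one superpixel.

    src and each candidate are (cx, cy, features), where features is a tuple
    such as (mean color, mean gradient). Returns the estimated (dx, dy).
    """
    sx, sy, sf = src
    # appearance weight of each candidate (fixed across iterations)
    app = [math.exp(-sum((a - b) ** 2 for a, b in zip(sf, f)) / (2 * sigma_f ** 2))
           for _, _, f in candidates]
    x, y = sx, sy
    for _ in range(max_iter):
        # Gaussian spatial kernel centered at the current estimate
        w = [a * math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2 * bandwidth ** 2))
             for a, (cx, cy, _) in zip(app, candidates)]
        total = sum(w)
        if total == 0.0:
            break
        # mean-shift step: move to the weighted mean of candidate centroids
        nx = sum(wi * c[0] for wi, c in zip(w, candidates)) / total
        ny = sum(wi * c[1] for wi, c in zip(w, candidates)) / total
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return x - sx, y - sy
```

With a source superpixel at (10, 10) and an appearance-matching candidate at (25, 12), the estimate converges to a displacement of about (15, 2) even when a dissimilar candidate lies much closer, because the appearance weight suppresses it.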