In real-world video surveillance applications, one often needs to recognize face images captured from a very long distance. Such recognition tasks are very challenging, since such images typically have very low resolution (VLR). However, if one simply downsamples high-resolution (HR) training images for recognizing the VLR test inputs, or if one directly upsamples the VLR inputs for matching the HR training data, the resulting recognition performance is not satisfactory. In this paper, we propose a joint face hallucination and recognition approach based on sparse representation. Given a VLR input image, our method is able to synthesize its person-specific HR version with recognition guarantees. In our experiments, we consider two different face image datasets. Empirical results support the use of our approach for VLR face recognition. In addition, we show that, compared to state-of-the-art super-resolution (SR) methods, our method improves the quality of the recovered HR face images.
Computational modeling of musical timbre is important for a variety of music information retrieval applications. While considerable progress has been made in recognizing musical genres and instruments, relatively little attention has been paid to modeling playing techniques, which affect timbre in more subtle ways. In this paper, we contribute to this area of research by systematically evaluating various audio features and processing methods for multi-class playing technique classification, considering up to nine distinct playing techniques of bowed string instruments. Specifically, a collection of 6,759 chamber-recorded single notes of four bowed string instruments and a collection of 33 real-world solo violin recordings are used in the evaluation. Our evaluation shows that using sparse features extracted from the magnitude spectra and phase derivatives, including the group delay function (GDF) and instantaneous frequency deviation (IFD), leads to significantly better performance than using a combination of state-of-the-art temporal, spectral, cepstral, and harmonic feature descriptors. For playing technique classification of violin single notes, the former approach attains a macro-average F-score of 0.915 under a tenfold cross-validation setting, while the latter attains only 0.835. Moreover, sparse modeling of magnitude and phase-derived spectra also performs well for single-note joint instrument-technique classification (F-score 0.770) and for playing technique classification of real-world violin solos (F-score 0.547). We find that phase information is particularly important in discriminating playing techniques with subtle differences, such as playing with different bowing positions (i.e., normal, sul tasto, and sul ponticello). A systematic investigation of the effect of parameters such as window sizes, hop factors, and window types for phase-derived features is also reported to provide more insights.
We present a clustering approach, MK-SOM, that carries out cluster-dependent feature selection and partitions images with multiple feature representations into clusters. This work is motivated by the observation that the human visual system (HVS) can receive various kinds of visual cues for interpreting the world. Images identified by the HVS as belonging to the same category are typically coherent with each other in certain crucial visual cues, but the crucial cues vary from category to category. To account for this observation and bridge the semantic gap, the proposed MK-SOM integrates multiple kernel learning (MKL) into the training process of the self-organizing map (SOM), and associates each cluster with a learnable ensemble kernel. Hence, it can leverage information captured by various image descriptors, and it discovers the cluster-specific characteristics by learning the per-cluster ensemble kernels. Through the optimization iterations, cluster structures are gradually revealed via the features specified by the learned ensemble kernels, while the quality of these ensemble kernels is progressively improved owing to the coherent clusters obtained by enforcing SOM. In addition, MK-SOM allows the introduction of side information to improve performance, and it hence provides a new perspective on applying MKL to address both unsupervised and semisupervised clustering tasks. Our approach is comprehensively evaluated in two applications. The promising results demonstrate its effectiveness.
We propose a novel photography recomposition method, which aims at transferring the photography composition of a reference image to an input image automatically. Without any user interaction, our approach first identifies the salient foreground objects or image regions of interest, and the recomposition is performed by solving a graph-matching-based optimization task. With an additional post-processing step to preserve the locality and boundary information of the recomposed visual components, we can solve the task of photography recomposition without the use of any prior knowledge of photography or predetermined image aesthetics rules. Experiments on a variety of images, including transferring the photography composition from real photos, sketches, or even paintings, confirm the effectiveness of our proposed method.
We propose a novel discriminative clustering algorithm with a hierarchical framework for solving unsupervised image segmentation problems. Our discriminative clustering process can be viewed as an EM algorithm, which alternates between the learning of image visual appearance models and the updates of cluster labels (i.e., segmentation outputs) for each image segment. In particular, we advance a simple-to-complex strategy during the above process, which allows the learning of a series of classifiers with different generalization capabilities from the input image, so that consecutive image segments can be well separated. With the proposed hierarchical framework, improved image segmentation can be achieved even if the shapes of the segments are complex or the boundaries between them are ambiguous. Our work differs from existing region- or contour-based approaches, which typically focus on either separating local image regions or determining the associated contours. Our experiments verify that our method outperforms state-of-the-art approaches on unsupervised image segmentation.
In this paper, we present a new polynomial basis over finite fields of characteristic two and then apply it to the encoding/decoding of Reed-Solomon erasure codes. The proposed polynomial basis allows h-point polynomial evaluation to be computed in O(h log2(h)) finite field operations with a small leading constant. As compared with the canonical polynomial basis, the proposed basis improves the arithmetic complexity of addition, multiplication, and the determination of polynomial degree from O(h log2(h) log2 log2(h)) to O(h log2(h)). Based on this basis, we then develop the encoding and erasure decoding algorithms for the (n = 2^r, k) Reed-Solomon codes. Thanks to the efficiency of the transform based on the polynomial basis, the encoding can be completed in O(n log2(k)) finite field operations, and the erasure decoding in O(n log2(n)) finite field operations. To the best of our knowledge, this is the first approach supporting Reed-Solomon erasure codes over characteristic-2 finite fields while achieving a complexity of O(n log2(n)) in both additive and multiplicative complexities. As the leading constant of the complexity is small, the algorithms are advantageous in practical applications.
This paper considers a noncoherent distributed space-frequency coded (SFC) wireless relay system with multiple relays. Each relay adopts a censoring scheme to determine whether the relay will decode and forward the source's information towards the destination. We analytically obtain the achievable diversity for both cases of perfect and imperfect relay censoring. With perfect censoring, we show that the same diversity as that of a conventional noncoherent SFC MIMO-OFDM system is achievable in the considered noncoherent distributed SFC system with maximum likelihood (ML) decoding, regardless of whether partial information on channel statistics and relay decoding status is available at the destination. With imperfect censoring, we analytically investigate how censoring errors affect the achievability of the system's diversity. We show that the two types of censoring errors, which correspond to useless and harmful relays, respectively, can decrease the achievable diversity significantly. Our analytical insights and numerical simulations demonstrate that the noncoherent distributed system can offer diversity comparable to that of the conventional MIMO-OFDM system if relay censoring is carefully implemented.
N/A
In recent years, advances in virtualization technology have enabled multiple virtual machines to run on a single physical machine, such that each virtual machine can operate independently with its own operating system. The IT industry has adopted virtualization technology because of its ability to improve hardware resource utilization, achieve low power consumption, support concurrent applications, simplify device management, and reduce maintenance costs. However, because of the hardware limitations of storage devices, the I/O capacity can become a performance bottleneck. To address this problem, we propose a hybrid storage access framework that exploits solid-state drives (SSDs) to improve the I/O performance in a virtualization environment.
Location-based services allow users to perform check-in actions, which not only record their geo-spatial activities, but also provide a plentiful source of data for data scientists to analyze and to plan more accurate and useful geographical recommender systems. In this paper, we present a novel Time-aware Route Planning (TRP) problem using location check-in data. The central idea is that the pleasure of staying at the locations along a route is significantly affected by their visiting time. Each location has its own proper visiting time due to its category, objective, and population. To incorporate the visiting time of locations into route planning, we develop a three-stage time-aware route planning framework. First, since there is usually either noisy time information on existing locations or no visiting information on newly constructed locations, we devise an inference method, LocTimeInf, to predict and recover the location visiting time on routes. Second, we aim to find the representative and popular time-aware location-transition behaviors from user check-in data, and a Time-aware Transit Pattern Mining (TTPM) algorithm is proposed. Third, based on the mined time-aware transit patterns, we develop a Proper Route Search (PR-Search) algorithm to construct the final time-aware routes for recommendation. Experiments on Gowalla check-in data exhibit the promising effectiveness and efficiency of the proposed methods, compared with a series of competitors.