This paper proposes an item concept embedding (ICE) framework to model item concepts via textual information. Specifically, the proposed framework comprises two stages: graph construction and embedding learning. In the first stage, we propose a generalized network construction method to build a network involving heterogeneous nodes and a mixture of both homogeneous and heterogeneous relations. The second stage leverages the concept of neighborhood proximity to learn the embeddings of both items and words. With the carefully designed ICE networks, the resulting embeddings facilitate both homogeneous and heterogeneous retrieval, including item-to-item and word-to-item retrieval. Moreover, as a distributed embedding approach, ICE not only generates related retrieval results but also delivers more diverse results than traditional keyword-matching-based approaches. As our experiments on two real-world datasets show, ICE encodes useful textual information and thus outperforms traditional methods in various item classification and retrieval tasks.
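The first-stage graph construction described above can be sketched as follows. This is a toy illustration only: the function name, the prefix convention for node types, and the similarity-threshold rule for word-word edges are assumptions for exposition, not the paper's exact procedure.

```python
from collections import defaultdict

def build_ice_network(item_words, word_similarity, sim_threshold=0.5):
    """Build a toy heterogeneous graph over item and word nodes.

    Heterogeneous edges: item <-> word, from each item's textual description.
    Homogeneous edges:   word <-> word, kept when a similarity score is strong.
    Node names are prefixed so the two node types stay distinct.
    """
    edges = defaultdict(set)
    for item, words in item_words.items():
        for w in words:
            edges["item:" + item].add("word:" + w)   # heterogeneous relation
            edges["word:" + w].add("item:" + item)
    for (w1, w2), sim in word_similarity.items():
        if sim >= sim_threshold:                     # homogeneous relation
            edges["word:" + w1].add("word:" + w2)
            edges["word:" + w2].add("word:" + w1)
    return dict(edges)
```

An embedding method based on neighborhood proximity would then be run over this mixed graph so that items land near the words describing them.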
This paper addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts, yet the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle these three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by its numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
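The idea of correlating feature maps against each other can be sketched in a few lines of numpy. This is a minimal, non-differentiable stand-in (circular shifts over a small window, max-pooled), not the layer's actual implementation; see the linked repository for that.

```python
import numpy as np

def cooccurrence_features(fmaps, max_shift=1):
    """Toy co-occurrence features for a (channels, H, W) activation tensor.

    For each pair of feature maps, take the maximum correlation over small
    circular spatial shifts, so two part detectors that fire together are
    captured regardless of their exact relative position.
    """
    c, _, _ = fmaps.shape
    feats = np.zeros((c, c))
    shifts = range(-max_shift, max_shift + 1)
    for i in range(c):
        for j in range(c):
            best = -np.inf
            for dy in shifts:
                for dx in shifts:
                    shifted = np.roll(np.roll(fmaps[j], dy, axis=0), dx, axis=1)
                    best = max(best, float(np.sum(fmaps[i] * shifted)))
            feats[i, j] = best  # strength of co-occurrence of parts i and j
    return feats
```

With circular shifts and a symmetric shift window, the resulting matrix is symmetric, matching the intuition that co-occurrence of parts i and j is an unordered relation.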
Singing voice separation, a fundamental problem in music information retrieval, attempts to separate the vocal and instrumental parts of a music recording. Recent work on singing voice separation has shown that the low-rank representation and informed separation approaches are both able to improve separation quality. However, low-rank optimizations are computationally inefficient due to the use of singular value decompositions. Therefore, in this paper, we propose a new linear-time algorithm called informed group-sparse representation, and use it to separate the vocals from music using pitch annotations as side information. Experimental results on the iKala dataset confirm the efficacy of our approach, suggesting that the music accompaniment follows a group-sparse structure given a pretrained instrumental dictionary. We also show how our work can be easily extended to accommodate multiple dictionaries using the DSD100 dataset.
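The underlying decomposition — accompaniment modeled as a group-sparse combination of instrumental dictionary atoms, vocals as the residual — can be sketched with a generic proximal-gradient solver. This is a toy sketch under simplifying assumptions (one group per atom, plain ISTA iterations), not the paper's linear-time algorithm, and it ignores the pitch-annotation side information.

```python
import numpy as np

def group_soft_threshold(A, groups, lam):
    """Proximal operator of the l2,1 penalty: shrink each group of
    coefficient rows toward zero by its joint l2 norm."""
    A = A.copy()
    for g in groups:
        norm = np.linalg.norm(A[g])
        A[g] *= max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    return A

def separate(X, D, lam=0.1, n_iter=500):
    """Toy group-sparse separation: accompaniment ~ D @ A with group-sparse
    A; vocals are the residual X - D @ A.  X is a magnitude spectrogram,
    D a pretrained instrumental dictionary."""
    A = np.zeros((D.shape[1], X.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant
    groups = [[k] for k in range(D.shape[1])]       # toy choice: singleton groups
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)                    # gradient of 0.5||X - DA||^2
        A = group_soft_threshold(A - step * grad, groups, lam * step)
    accomp = D @ A
    return accomp, X - accomp
```

In an informed variant, the groups would instead be chosen from the side information (e.g., atoms grouped by annotated pitch), which is what gives the representation its group structure.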
This paper considers a device-to-device (D2D) communications underlaid multiple-input multiple-output (MIMO) cellular network and studies D2D mode selection from a previously unexamined perspective. Since D2D mode selection affects the network interference profile and vice versa, joint D2D mode selection and interference management is desired but challenging. In this work, we propose a holistic approach to this problem with interference-free considerations. We adopt the degrees of freedom (DoF) as the mode-selection criterion and exploit the linear interference alignment (IA) technique for interference management. We analyze the achievable sum DoF of the potential D2D users according to their mode selections, and derive the probabilistic sum-rate relations between the proposed DoF-based mode selection scheme and the common received-signal-strength-index (RSSI)-based mode selection scheme in Poisson point process (PPP) networks. Simulations illustrate the theoretical insights and show the advantages of the proposed DoF-based mode selection scheme over conventional mode selection schemes from various perspectives. The proposed scheme is thus a promising candidate for D2D mode selection in 5G communications.
Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification.
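The core idea — embedding features and labels into a shared latent space where they are maximally correlated — can be illustrated with classical linear CCA. This is a linear stand-in for C2AE's deep latent space, written from the standard CCA eigenproblem; all names and the regularization constant are illustrative assumptions.

```python
import numpy as np

def joint_embedding(X, Y, dim=2, reg=1e-3):
    """Toy CCA-style joint embedding: find projections Wx, Wy so that
    X @ Wx and Y @ Wy (features vs. label vectors) are maximally
    correlated in a shared latent space."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Cxx = X.T @ X + reg * np.eye(X.shape[1])   # regularized covariances
    Cyy = Y.T @ Y + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y
    # Standard CCA eigenproblem: Cxx^{-1} Cxy Cyy^{-1} Cyx wx = rho^2 wx
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:dim]
    Wx = vecs[:, order].real
    Wy = np.linalg.solve(Cyy, Cxy.T) @ Wx      # matching label-side projection
    return Wx, Wy
```

C2AE replaces these linear maps with deep encoders and adds a decoder with a label-correlation-aware loss; the linear version above only conveys the correlation-maximizing objective.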
Unsupervised domain adaptation deals with scenarios in which labeled data are available in the source domain, but only unlabeled data can be observed in the target domain. Since the classifiers trained on source-domain data would not be expected to generalize well in the target domain, how to transfer the label information from source- to target-domain data is a challenging task. A common technique for unsupervised domain adaptation is to match cross-domain data distributions, so that the domain and distribution differences can be suppressed. In this paper, we propose to utilize the label information inferred from the source domain, while the structural information of the unlabeled target-domain data is jointly exploited for adaptation purposes. Our proposed model not only reduces the distribution mismatch between domains but also simultaneously improves recognition of target-domain data. In the experiments, we show that our approach performs favorably against state-of-the-art unsupervised domain adaptation methods on benchmark data sets. We also provide convergence, sensitivity, and robustness analyses, which support the use of our model for cross-domain classification.
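The distribution-matching step common to such methods is typically measured with a statistic like the maximum mean discrepancy (MMD). The sketch below computes the standard (biased) squared MMD with an RBF kernel; it illustrates the mismatch being suppressed, not this paper's specific model.

```python
import numpy as np

def rbf_mmd2(Xs, Xt, gamma=1.0):
    """Biased squared maximum mean discrepancy with an RBF kernel: a
    standard measure of the distribution mismatch between a source
    sample Xs and a target sample Xt (rows are data points)."""
    def k(A, B):
        # pairwise squared distances, then Gaussian kernel
        d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()
```

An adaptation method would add a term like this to its objective so that the learned representation drives the cross-domain discrepancy toward zero while the source labels supervise the classifier.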
Textural style transfer aims to transfer the textural style identified from a reference image to a source image, while retaining the scene of the source image. This article proposes a context-aware style transfer algorithm based on sparse-representation-based textural synthesis. Whereas sparse representation is designed to extract the style component of the reference image, textural synthesis is performed in a context-aware setting to preserve the original scene structure of the source image. Unlike existing solutions that require prior knowledge of the textural style of interest or user interaction, this method performs the transfer automatically. Experimental results demonstrate the effectiveness of the proposed method for automatic style transfer from a single style template image that is not accompanied by its original real image.