    Task-Guided Pair Embedding in Heterogeneous Network

    Many real-world tasks solved by heterogeneous network embedding methods can be cast as modeling the likelihood of a pairwise relationship between two nodes. For example, the goal of the author identification task is to model the likelihood of a paper being written by an author (a paper-author pairwise relationship). Existing task-guided embedding methods are node-centric in that they simply measure the similarity between the node embeddings to compute the likelihood of a pairwise relationship between two nodes. However, we claim that for task-guided embeddings, it is crucial to focus on directly modeling the pairwise relationship. In this paper, we propose a novel task-guided pair embedding framework for heterogeneous networks, called TaPEm, that directly models the relationship between a pair of nodes that are related to a specific task (e.g., the paper-author relationship in author identification). To this end, we 1) propose to learn a pair embedding under the guidance of its associated context path, i.e., a sequence of nodes between the pair, and 2) devise a pair validity classifier to distinguish whether the pair is valid with respect to the specific task at hand. By introducing pair embeddings that capture the semantics behind the pairwise relationships, we are able to learn the fine-grained pairwise relationship between two nodes, which is paramount for task-guided embedding methods. Extensive experiments on the author identification task demonstrate that TaPEm outperforms the state-of-the-art methods, especially for authors with few publication records. © 2019 Association for Computing Machinery.
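
For contrast, the node-centric scoring that the abstract attributes to existing methods can be sketched as below. This is a generic, assumed form (a sigmoid over the dot product of two node embeddings), and the embedding values are made up for illustration. It is not TaPEm's pair embedding, whose details are not given in the abstract.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def node_centric_score(u, v):
    """Likelihood of a pair, taken from the similarity of two node embeddings."""
    return sigmoid(sum(a * b for a, b in zip(u, v)))


# Hypothetical 3-dimensional embeddings, for illustration only.
paper = [0.9, 0.1, 0.4]
true_author = [0.8, 0.0, 0.5]
other_author = [-0.7, 0.3, -0.2]

score_true = node_centric_score(paper, true_author)
score_other = node_centric_score(paper, other_author)
```

Here the pair is never represented on its own: the score depends only on the two node vectors, which is the limitation the abstract argues against.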

    DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

    With rapidly growing demand to detect outliers in data streams, many studies have aimed to develop extensions of the well-known outlier detection algorithm Local Outlier Factor (LOF) for data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) a large amount of memory is required, and 2) a long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF, that effectively overcomes these limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposed algorithms do not require any prior knowledge or assumptions about the data distribution. Moreover, we speed up the execution of DILOF by about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.
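
As background for the abstract above, here is a minimal sketch of the classic batch LOF score that DILOF extends. It is not the DILOF algorithm: it includes no sampling, no sequence detection, and no distance approximation. The function name `lof_scores` and the toy points are assumptions made for illustration.

```python
import math


def lof_scores(points, k):
    """Classic batch LOF for every point.

    Assumes no duplicate points and k < len(points).
    """
    n = len(points)
    # k nearest neighbours of each point as (distance, index), nearest first.
    knn = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        knn.append(dists[:k])  # simplification: ties at the k-distance are cut off
    k_dist = [neigh[-1][0] for neigh in knn]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in knn[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
```

In this toy set the four corner points score about 1, while the isolated point at (10, 10) scores far above 1. The sketch keeps every point in memory and recomputes all distances, which is the cost the abstract says becomes prohibitive on streams.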

    Sentiment Classification With Convolutional Neural Network using Multiple Word Representations

    Most neural network models for sentiment classification use word vectors pre-trained by word embedding methods to represent a word. Although word vectors are trained on large corpora, most of them are restricted by the vocabularies in the corpus. Since sentiment classification models have to capture the subtle meaning of a sentence, it is desirable to represent words that have not been pre-trained by a word embedding method. To achieve this goal, we propose a sentiment classification model with a convolutional neural network using multiple word representations. We represent a word by three embedding methods: word2vec, GloVe, and our method, which is based on a character-level embedding method that successfully captures subtle differences between words. Experimental results on three datasets show that our model with an additional character-level embedding method improves the accuracy of sentiment classification. © 2018 ACM.

    BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network

    The goal of network embedding is to transform nodes in a network into low-dimensional embedding vectors. Recently, heterogeneous networks have been shown to be effective in representing diverse information in data. However, heterogeneous network embedding suffers from an imbalance issue, i.e., the sizes of the relation types (the number of edges of each type in the network) are imbalanced. In this paper, we devise a new heterogeneous network embedding method, called BHIN2vec, which considers the balance among all relation types in a network. We view heterogeneous network embedding as simultaneously solving multiple tasks, in which each task corresponds to one relation type in the network. After splitting the skip-gram loss into multiple losses corresponding to different tasks, we propose a novel random-walk strategy that focuses on the tasks with high loss values by considering the relative training ratio. Unlike previous random-walk strategies, our proposed strategy generates training samples according to the relative training ratio among different tasks, which results in balanced training for the node embedding. Our extensive experiments on node classification and recommendation demonstrate the superiority of BHIN2vec compared to the state-of-the-art methods. In addition, based on the relative training ratio, we analyze how much each relation type is represented in the embedding space. © 2019 Association for Computing Machinery.
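
The idea of steering sampling toward relation types with high loss can be sketched roughly as follows. The abstract does not define the relative training ratio, so this sketch assumes the simplest possible stand-in: each relation type's loss divided by the total loss. The function name, relation names, and loss values are all hypothetical, and this should not be read as BHIN2vec's actual formula.

```python
def relation_sampling_weights(losses):
    """Map per-relation-type losses to sampling probabilities.

    Assumption: plain normalisation, so a higher loss gives a higher probability.
    """
    total = sum(losses.values())
    return {rel: loss / total for rel, loss in losses.items()}


# Hypothetical per-relation skip-gram losses, for illustration only.
losses = {"author-paper": 0.2, "paper-venue": 0.6, "paper-topic": 1.2}
weights = relation_sampling_weights(losses)
```

Under this assumption, a random walk that picks its next relation type from `weights` would generate more training samples for the under-trained relation types.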

    DualSentiNet: Dual Prediction of Word and Document Sentiments Using Shared Word Embedding

    With the popularization of social networking services, numerous words are newly emerging every day in personalized document sources. Slang terms, abbreviations, newly coined words, and nongrammatical words or expressions belong here, and people are more likely to use these words with a certain sentimental tendency compared to other standard words. Thus, it becomes important to find their meanings or sentiments to analyze the sentiment of user-generated texts. This paper proposes a novel sentiment analysis model, termed DualSentiNet, which predicts the sentiments of newly emerged words and documents at the same time. Our model is composed of three parts: (i) a word-level sentiment regression network, (ii) a document-level sentiment classification network, and (iii) a shared word embedding layer. DualSentiNet makes a word embedding layer shared by two different networks, thereby learning richer information about both word-level and document-level sentiments through two-way back-propagation. Consequently, it improves the performance of sentiment prediction by preventing word vectors from being overfitted. Experimental results show that DualSentiNet significantly outperforms competitors in terms of both document sentiment classification accuracy and the word sentiment regression RMSE. In addition, DualSentiNet produces better word embedding by reflecting both word and document sentiments. © 2018 ACM.