1,721,017 research outputs found
Neural discovery of balance-aware polarized communities
Signed graphs are a model to depict friendly (positive) or antagonistic (negative) interactions (edges) among users (nodes). 2-Polarized-Communities (2pc) is a well-established combinatorial-optimization problem whose goal is to find two polarized communities from a signed graph, i.e., two subsets of nodes (disjoint, but not necessarily covering the entire node set) which exhibit a high number of both intra-community positive edges and negative inter-community edges. The state of the art in 2pc suffers from the limitations that (i) existing methods rely on a single (optimal) solution to a continuous relaxation of the problem in order to produce the ultimate discrete solution via rounding, and (ii) 2pc objective function comes with no control on size balance among communities. In this paper, we provide advances to the 2pc problem by addressing both these limitations, with a twofold contribution. First, we devise a novel neural approach that allows for soundly and elegantly explore a variety of suboptimal solutions to the relaxed 2pc problem, so as to pick the one that leads to the best discrete solution after rounding. Second, we introduce a generalization of 2pc objective function – termed γ-polarity – which fosters size balance among communities, and we incorporate it into the proposed machine-learning framework. Extensive experiments attest high accuracy of our approach, its superiority over the state of the art, and capability of function γ-polarity to discover high-quality size-balanced communities
Document Clustering: The Next Frontier
The proliferation of documents, on both the Web and in private systems, makes knowledge discovery in document collections arduous. Clustering has been long recognized as a useful tool for the task. It groups like-items together, maximizing intra-cluster similarity and inter-cluster distance. Clustering can provide insight into the make-up of a document collection and is often used as the initial step in data analysis. While most document clustering research to date has focused on moderate length single topic documents, real-life collections are often made up of very short or long documents. Short documents do not contain enough text to accurately compute similarities. Long documents often span multiple topics that general document similarity measures do not take into account. In this paper we will first give an overview of general purpose document clustering, and then focus on recent advancements in the next frontier in document clustering: long and short documents.Anastasiu, David C.; Tagarelli, Andrea; Karypis, George. (2013). Document Clustering: The Next Frontier. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215907
A Study of XML models for data mining : representations, methods, and issues
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining
Ensemble-based community detection in multilayer networks
The problem of community detection in a multilayer network can effectively be addressed by aggregating the community structures separately generated for each network layer, in order to infer a consensus solution for the input network. To this purpose, clustering ensemble methods developed in the data clustering field are naturally of great support. Bringing these methods into a community detection framework would in principle represent a powerful and versatile approach to reach more stable and reliable community structures. Surprisingly, research on consensus community detection is still in its infancy. In this paper, we propose a novel modularity-driven ensemble-based approach to multilayer community detection. A key aspect is that it finds consensus community structures that not only capture prototypical community memberships of nodes, but also preserve the multilayer topology information and optimize the edge connectivity in the consensus via modularity analysis. Empirical evidence obtained on seven real-world multilayer networks sheds light on the effectiveness and efficiency of our proposed modularity-driven ensemble-based approach, which has shown to outperform state-of-the-art multilayer methods in terms of modularity, silhouette of community memberships, and redundancy assessment criteria, and also in terms of execution times
Modeling evolutionary dynamics of lurking in social networks
Lurking is a complex user-behavioral phenomenon that occurs in all largescale online communities and social networks. It generally refers to the behavior characterizing users that benefit from the information produced by others in the community without actively contributing back to the production of social content. The amount and evolution of lurkers may strongly affect an online social environment, therefore understanding the lurking dynamics and identifying strategies to curb this trend are relevant problems. In this regard, we introduce the Lurking Game, i.e., a model for analyzing the transitions from a lurking to a non-lurking (i.e., active) user role, and vice versa, in terms of evolutionary game theory. We evaluate the proposed Lurking Game by arranging agents on complex networks and analyzing the system evolution, seeking relations between the network topology and the final equilibrium of the game. Results suggest that the Lurking Game is suitable to model the lurking dynamics, showing how the adoption of rewarding mechanisms combined with the modeling of hypothetical heterogeneity of users’ interestsmay lead users in an online community towards a cooperative behavior
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
- …
