1,720,968 research outputs found

    A transactional approach to associative XML Classification by Content and Structure

    No full text
    We propose XCCS, which is short for XML Classification by Content and Structure, a new approach for the induction of intelligible classification models for XML data, that are a valuable support for more effective and efficient XML search, retrieval and filtering. The idea behind XCCS is to represent each XML document as a transaction in a space of boolean features, that are informative of its content and structure. Suitable algorithms are developed to learn associative classifiers from the transactional representation of the XML data. XCCS induces very compact classifiers with outperforming effectiveness compared to several established competitors

    Learning Effective XML Classifiers Based on Discriminatory Structures and Nested Content

    No full text
    Supervised classification aims to learn a model (or a classifier) from a collection of XML documents individually marked with one of a predefined set of class labels. The learnt classifier isolates each class by the content and structural regularities observed within the respective labeled XML documents and, thus, allows to predict the unknown class of unlabeled XML documents by looking at their content and structural features. The classification of unlabeled XML documents into the predefined classes is a valuable support for more effective and efficient XML search, retrieval and filtering. We discuss an approach for learning intelligible XML classifiers. XML documents are represented as transactions in a space of boolean features, that are informative of their content and structure. Learning algorithms induce compact associative classifiers with outperforming effectiveness from the transactional XML representation. A preprocessing step contributes to the scalability of the approach with the size of XML corpora. © Springer-Verlag Berlin Heidelberg 2013

    Effective XML classification using content and structural information via rule learning

    No full text
    We propose a new approach to XML classification, that uses a particular rule-learning technique for the induction of interpretable classification models. These separate the individual classes of XML documents by looking at the presence within the XML documents themselves of certain features, that provide information on their content and structure. The devised approach induces classifiers with outperforming effectiveness in comparison to several established competitors. © 2011 IEEE

    X-Class: Associative classification of XML documents by structure

    No full text
    The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental for classification effectiveness, since the a priori chosen substructures may not accord with the structural properties of the XML documents. Therein, an unexplored question is how to choose the type of structural regularity that best adapts to the structures of the available XML documents. We tackle this problem through X-Class, an approach that handles all types of tree-like substructures and allows for choosing the most discriminatory one. Algorithms are designed to learn compact rule-based classifiers in which the chosen substructures discriminate the classes of XML documents. X-Class is studied across various domains and types of substructures. Its classification performance is compared against several rule-based and SVM-based competitors. Empirical evidence reveals that the classifiers induced by X-Class are compact, scalable, and at least as effective as the established competitors. In particular, certain substructures allow the induction of very compact classifiers that generally outperform the rule-based competitors in terms of effectiveness over all chosen corpora of XML data. Furthermore, such classifiers are substantially as effective as the SVM-based competitor, with the additional advantage of a high-degree of interpretability. © 2013 ACM 1046-8188/2013/01-ART2 s15.00

    From global to local and viceversa: Uses of associative rule learning for classification in imprecise environments

    No full text
    We propose two models for improving the performance of rule-based classification under unbalanced and highly imprecise domains. Both models are probabilistic frameworks aimed to boost the performance of basic rule-based classifiers. The first model implements a global-to-local scheme, where the response of a global rule-based classifier is refined by performing a probabilistic analysis of the coverage of its rules. In particular, the coverage of the individual rules is used to learn local probabilistic models, which ultimately refine the predictions from the corresponding rules of the global classifier. The second model implements a dual local-to-global strategy, in which single classification rules are combined within an exponential probabilistic model in order to boost the overall performance as a side effect of mutual influence. Several variants of the basic ideas are studied, and their performances are thoroughly evaluated and compared with state-of-the-art algorithms on standard benchmark datasets. © 2011 Springer-Verlag London Limited

    Balancing prediction and recommendation accuracy: Hierarchical latent factors for preference data

    No full text
    Recent works in Recommender Systems (RS) have in- vestigated the relationships between the prediction ac- curacy, i.e. the ability of a RS to minimize a cost func- Tion (for instance the RMSE measure) in estimating users' preferences, and the accuracy of the recommenda- Tion list provided to users. State-of-the-art recommen- dation algorithms, which focus on the minimization of RMSE, have shown to achieve weak results from the rec- ommendation accuracy perspective, and vice versa. In this work we present a novel Bayesian probabilistic hi- erarchical approach for users' preference data, which is designed to overcome the limitation of current method- ologies and thus to meet both prediction and recommen- dation accuracy. According to the generative semantics of this technique, each user is modeled as a random mix- Ture over latent factors, which identify users community interests. Each individual user community is then mod- eled as a mixture of topics, which capture the prefer- ences of the members on a set of items. We provide two different formalization of the basic hierarchical model: BH-Forced focuses on rating prediction, while BH-Free models both the popularity of items and the distribu- Tion over item ratings. The combined modeling of item popularity and rating provides a powerful framework for the generation of highly accurate recommendations. An extensive evaluation over two popular benchmark datasets reveals the effectiveness and the quality of the proposed algorithms, showing that BH-Free realizes the most satisfactory compromise between prediction and recommendation accuracy with respect to several state- of-the-art competitors. Copyright © 2012 by the Society for Industrial and Applied Mathematics

    Fast and effective hierarchical clustering of XML documents by structure

    No full text
    A new parameter-free approach to clustering XML documents by structure is proposed. The idea is to consider various forms of structural patterns occurring in the XML documents to form a hierarchy of nested clusters. At any level in the hierarchy, clusters explain how the XML documents can be grouped on the basis of common structural patterns of the form considered at that level. The resulting explanation is progressively refined at the subsequent level, where another type of structural patterns is used to divide the individual clusters from the above level into subgroups, revealing meaningful and previously uncaught structural differences. Each cluster in the hierarchy is summarized through a novel technique into a corresponding representative, that provides a clear and differentiated understanding of the structural information within the cluster. A comparative evaluation conducted over both real-world and synthetic XML data proves the quality of the results of the devised approach in terms of effectiveness and cluster summarization

    A hierarchical rule-based framework for accurate classification in imprecise domains

    No full text
    A new hierarchical framework is preliminarily proposed for accurate classification in imprecise multi-class domains inherently characterized by rarity and noise. The key idea behind the devised framework is coupling the individual rules of an associative classifier with as many local probabilistic generative models. These are trained over the coverage of the associated rules, wherein it is likely that some globally rare cases/classes become less rare. The individual local probabilistic generative models are then employed into the classification process for accurately dealing with the corresponding forms of rarity. Two novel schemes for a tight integration between associative and probabilistic classification are introduced, wherein the class of an unlabeled case is decided by considering multiple class association rules as well as their relative score produced by the probabilistic classifier. An intensive evaluation shows that the proposed framework is in most cases superior in performance w.r.t. an established rule-based competitor

    Enforcing interaction and cooperation in content-based web3.0 applications

    No full text
    In this paper we complement our previous contribution [2] where we introduced principles and functional architecture of Borè, a novel framework that truly realizes Web3.0 principles and tools for the next-generation Internet. In particular, in this paper, we provide a more comprehensive overview of Borè by detailing peculiar aspects, such as the query languages and the related operators that allow us to browse and query the graph-like structure underlying a typical Borè's view, and providing a complete case study on Borè in action that focuses on an application domain related to the University context. © 2012 Springer-Verlag Berlin Heidelberg

    Rule learning with probabilistic smoothing

    No full text
    A hierarchical classification framework is proposed for discriminating rare classes in imprecise domains, characterized by rarity (of both classes and cases), noise and low class separability. The devised framework couples the rules of a rule-based classifier with as many local probabilistic generative models. These are trained over the coverage of the corresponding rules to better catch those globally rare cases/classes that become less rare in the coverage. Two novel schemes for tightly integrating rule-based and probabilistic classification are introduced, that classify unlabeled cases by considering multiple classifier rules as well as their local probabilistic counterparts. An intensive evaluation shows that the proposed framework is competitive and often superior in accuracy w.r.t. established competitors, while overcoming them in dealing with rare classes. © 2009 Springer Berlin Heidelberg
    corecore