
    An algorithm for generating XML Schemas from ER Schemas

    The paper presents an algorithm for deriving an XML Schema from an ER schema.

    IQ4EC: Intensional answers as a support to exploratory computing

    The advent of the Big Data challenge has stimulated research on methods and techniques for managing data abundance. As a result, effective sense-making of large, semantically rich datasets has received a lot of attention, and new search approaches, such as Exploratory Computing (EC), have emerged. In this paper we present IQ4EC, a system for data exploration inspired by EC that supports users in the inspection of huge amounts of relational data through a step-by-step process, providing feedback based on approximate, intensional information expressed in terms of association rules. At each step of the process, users can choose a portion of the data to examine, and the system guides them to the next step by providing synthetic information and a visualization of the resulting dataset.

    A Graph-Based Data Model to Represent Transaction Time in Semistructured Data

    In this paper we propose the Graphical sEmistructured teMporal data model (GEM), which is based on labeled graphs and allows one to represent in a uniform way semistructured data and their temporal aspects. In particular, we focus on transaction time

    Promoting data provenance tracking in the archaeological interpretation process

    In this paper we propose a model and a set of derivation rules for tracking data provenance during the archaeological interpretation process. The interpretation process is the main task performed by an archaeologist who, starting from ground data about evidence and findings, tries to derive knowledge about an ancient object or event. In particular, in this work we concentrate on the dating process used by archaeologists to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis, and we propose a framework to represent such information and infer new knowledge, including the provenance of data. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; we therefore use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model relationships between the datings of different findings.

    Board-level functional fault diagnosis using data mining

    This paper presents an approach for performing functional diagnosis of complex systems by means of data mining. The technique derives a set of rules from a functional model of the system and uses them to drive the diagnosis procedure efficiently towards the most promising faulty candidate. The approach is adopted within an incremental method that limits the number of tests to be performed, thus reducing costs and effort.

    Mining rare association rules by discovering quasi-functional dependencies: an incremental approach

    In the context of anomaly detection, the data mining technique of extracting association rules can be used to identify rare rules, which represent infrequent situations. One method to detect rare rules is to first infer the normal behavior of objects in the form of quasi-functional dependencies (i.e., functional dependencies that frequently hold), and then analyze the rare violations of them. The quasi-functional dependencies are usually inferred from the current instance of a database. However, in several applications the database is not static: data are added or deleted continuously. Thus, the anomalies have to be updated because they change over time. In this chapter, we propose an incremental algorithm to efficiently maintain up-to-date rules (i.e., functional and quasi-functional dependencies). The impact of the cardinality of the data set and of the number of new tuples on the execution time is evaluated through a set of experiments on synthetic and real databases, whose results are reported here.
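
    The idea of a quasi-functional dependency and its rare violations can be illustrated with a small sketch. This is not the incremental algorithm of the chapter, whose details the abstract does not give; it is a minimal counting scheme under the assumption that a dependency lhs -> rhs is scored by the fraction of tuples that agree with the majority rhs value of their lhs group. The class name `QuasiFD` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class QuasiFD:
    """Hypothetical helper: tracks how often lhs -> rhs holds as tuples change."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.counts = defaultdict(Counter)  # lhs value -> Counter of rhs values
        self.total = 0

    def add(self, row):
        self.counts[row[self.lhs]][row[self.rhs]] += 1
        self.total += 1

    def delete(self, row):
        # Assumes the row was previously added; no check is made here.
        group = self.counts[row[self.lhs]]
        group[row[self.rhs]] -= 1
        if group[row[self.rhs]] == 0:
            del group[row[self.rhs]]
        if not group:
            del self.counts[row[self.lhs]]
        self.total -= 1

    def confidence(self):
        """Fraction of tuples that agree with the majority rhs of their group."""
        if self.total == 0:
            return 1.0
        agreeing = sum(max(c.values()) for c in self.counts.values())
        return agreeing / self.total

    def violations(self):
        """(lhs value, rhs value) pairs that disagree with the group majority."""
        out = []
        for x, c in self.counts.items():
            majority = c.most_common(1)[0][0]
            out.extend((x, y) for y in c if y != majority)
        return out
```

    Because only per-group counters are updated, adding or deleting a tuple does not require rescanning the database; a dependency with confidence close to, but below, 1 is a candidate quasi-functional dependency, and its violations are candidate anomalies.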

    A context-based approach for partitioning big data

    In recent years, the amount of available data has kept growing at a fast rate, and it is therefore crucial to be able to process it efficiently. The level of parallelism in tools such as Hadoop or Spark is determined, among other things, by the partitioning applied to the dataset. A common method is to split the data into chunks according to the number of bytes. While this approach may work well for text-based batch processing, there are a number of cases where the dataset contains structured information, such as time or spatial coordinates, and one may be interested in exploiting this structure to improve the partitioning. This could have an impact on the processing time and increase the overall efficiency of resource usage. This paper explores an approach based on the notion of context, such as temporal or spatial information, for partitioning the data. We design a context-based multi-dimensional partitioning technique that divides an n-dimensional space into splits by considering the distribution of each contextual dimension in the dataset. We tested our approach on a dataset from a touristic scenario, and our experiments show that we are able to improve the efficiency of resource usage.
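
    A rough sketch of distribution-aware partitioning over contextual dimensions follows. The abstract does not specify how the splits are computed, so this example assumes quantile-based cut points per dimension; it is an illustration of the general idea, not the paper's technique. The function names and the record fields used in the usage note are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_boundaries(values, splits):
    """Cut points that divide the sorted values into roughly equal-sized groups."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // splits] for i in range(1, splits)]


def partition(records, dims, splits_per_dim):
    """Assign each record to a cell of an n-dimensional grid.

    The grid is built from per-dimension quantiles (an assumption of this
    sketch), so dense regions of a dimension get narrower cells.
    """
    boundaries = {
        d: quantile_boundaries([r[d] for r in records], splits_per_dim)
        for d in dims
    }
    cells = defaultdict(list)
    for r in records:
        # The cell key holds one bucket index per contextual dimension.
        key = tuple(bisect_right(boundaries[d], r[d]) for d in dims)
        cells[key].append(r)
    return dict(cells)
```

    For example, `partition(records, ["hour", "lat"], 2)` would group records carrying hypothetical `hour` and `lat` fields into at most four cells. With skewed data, quantile cut points tend to give more balanced cells than equal-width ranges, which is the kind of balance a partitioner needs to keep workers evenly loaded.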

    FARGO: A Fair, Context-AwaRe, Group RecOmmender System

    Many activities, such as watching a movie or going to a restaurant, are intrinsically group-based. Traditional single-user recommendation techniques cannot be adopted to recommend such activities to groups; as a consequence, a number of group recommender systems have been developed over the years. Recommending items to be enjoyed together by a group poses many ethical challenges: a system whose sole objective is to achieve the best possible recommendation accuracy might learn to disadvantage submissive users in favor of more aggressive ones. In this work we investigate the ethical challenges of context-aware group recommendations in the more general case of ephemeral groups (i.e., groups whose members might be together for the first time), using a method that can also recommend items that are new to the system. We demonstrate the effectiveness of our method on two real-world datasets. The first is a very large dataset containing the personal and group choices regarding TV programs of 7,921 users w.r.t. sixteen viewing contexts. The second, which was collected specifically for this work and is made publicly available as one of the contributions of this article, gathers the musical preferences (both individual and in groups) of 280 real users w.r.t. two listening contexts. We compare the results of our approach with seven other group recommender systems specifically developed to be fair. We evaluate the quality of our recommendations using recall, while their fairness is assessed using two measures found in the literature, namely score disparity and recommendation disparity. Our extensive experiments show that our method always obtains the highest recall while delivering ethical guarantees in line with the other fair group recommender systems tested.

    An Expert CAD Flow for Incremental Functional Diagnosis of Complex Electronic Boards

    Identifying the source of an observed misbehavior through functional diagnosis of complex systems can be a very time-consuming and expensive task. We propose an automatic incremental diagnostic methodology and CAD flow based on data mining. It is a model-based approach that incrementally determines the tests to be executed to isolate the faulty component, aiming to minimize the total number of executed tests without compromising 100% diagnostic accuracy. The data mining engine allows for shorter test sequences than other reasoning-based solutions (e.g., Bayesian belief networks) and does not require complex management of pre- and post-conditions. Experimental results on a large set of synthetic examples and on three industrial boards substantiate the quality of the proposed approach.