
    OLAP and NoSQL: Happily Ever After

    NoSQL databases are preferred to relational ones for storing heterogeneous data with variable schema and structure. However, their schemaless nature adds complexity to analytical applications, in which a single OLAP analysis often involves large sets of data with different schemas. In this tutorial we describe the main approaches to enable OLAP on NoSQL data. We start from schema-on-read approaches, where data are left unchanged in their structure until they are accessed by the user, so they are put into multidimensional form at query time. Specifically, we show how this enables a form of approximated OLAP that embraces the inherent variety of schemaless data. Then we move to schema-on-write approaches, where a fixed multidimensional structure is forced onto data, which are loaded into a data warehouse to be then queried. In particular, we introduce multi-model data warehouses as a way to store data in multidimensional form and, at the same time, let each piece of data be natively represented through the most appropriate NoSQL model

    A Model-Driven Approach to Automate Data Visualization in Big Data Analytics

    In big data analytics, advanced analytic techniques operate on big data sets with the aim of complementing the role of traditional OLAP for decision making. To enable companies to benefit from these techniques despite the lack of in-house technical skills, the H2020 TOREADOR Project adopts a model-driven architecture for streamlining analysis processes, from data preparation to visualization. In this paper we propose a new approach named SkyViz focused on the visualization area, in particular on (i) how to specify the user's objectives and describe the dataset to be visualized, (ii) how to translate this specification into a platform-independent visualization type, and (iii) how to concretely implement this visualization type on the target execution platform. To support step (i) we define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and conceptually describing the data to be visualized. To automate step (ii) we propose a skyline-based technique that translates a visualization context into a set of most-suitable visualization types. Finally, to automate step (iii) we propose a skyline-based technique that, with reference to a specific platform, finds the best bindings between the columns of the dataset and the graphical coordinates used by the visualization type chosen by the user. SkyViz can be transparently extended to include more visualization types on the one hand, and more visualization coordinates on the other. The paper is completed by an evaluation of SkyViz based on a case study excerpted from the pilot applications of the TOREADOR Project
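
    The skyline idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the SkyViz algorithm: the chart names, the three scoring coordinates, and the scores themselves are hypothetical, and the prioritization of coordinates described in the abstract is not modeled. The sketch only shows how a skyline keeps every candidate that no other candidate beats on all coordinates at once.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if a is at least as good as b on every coordinate
    and strictly better on at least one (higher scores are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of all candidates that no other candidate dominates."""
    return sorted(
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    )


# Hypothetical suitability scores on three made-up coordinates.
scores = {
    "bar chart": (0.9, 0.6, 0.4),
    "line chart": (0.7, 0.8, 0.5),
    "pie chart": (0.6, 0.5, 0.3),   # worse than "bar chart" on every coordinate
    "scatter plot": (0.4, 0.7, 0.9),
}

best = skyline(scores)
print(best)
```

    With these assumed scores, the pie chart is dominated by the bar chart and drops out, while the other three remain because each is best on at least one trade-off.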

    Approximate OLAP of Document-Oriented Databases: a Variety-Aware Approach

    Schemaless databases, and document-oriented databases in particular, are preferred to relational ones for storing heterogeneous data with variable schemas and structural forms. However, the absence of a unique schema adds complexity to analytical applications, in which a single analysis often involves large sets of data with different schemas. In this paper we propose an original approach to OLAP on collections stored in document-oriented databases. The basic idea is to stop fighting against schema variety and welcome it as an inherent source of information wealth in schemaless sources. Our approach builds on four stages: schema extraction, schema integration, FD enrichment, and querying; these stages are discussed in detail in the paper. To make users aware of the impact of schema variety, we propose a set of indicators inspired by the definition of attribute density. Finally, we experimentally evaluate our approach in terms of efficiency and effectiveness
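
    As a rough illustration of the schema extraction stage and of a density-style indicator, the sketch below collects the attribute paths found in each document and reports the fraction of documents in which each path appears. The definition of density used here (presence ratio over the collection) and the sample documents are assumptions made for illustration; the paper's actual indicators may be defined differently.

```python
from collections import Counter
from typing import Any, Dict, List, Set


def attribute_paths(doc: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths of all leaf attributes in a nested document."""
    paths: Set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= attribute_paths(value, path)  # descend into sub-documents
        else:
            paths.add(path)  # arrays and scalars are treated as leaves
    return paths


def attribute_density(docs: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of documents in which each attribute path is present."""
    if not docs:
        return {}
    counts: Counter = Counter()
    for doc in docs:
        counts.update(attribute_paths(doc))
    return {path: n / len(docs) for path, n in counts.items()}


# Hypothetical collection with heterogeneous local schemas.
docs = [
    {"id": 1, "user": {"name": "a", "age": 30}},
    {"id": 2, "user": {"name": "b"}},
    {"id": 3, "user": {"name": "c", "age": 41}},
    {"id": 4, "tags": ["x"]},
]

density = attribute_density(docs)
print(density)
```

    A low density for an attribute used in a query signals that the result covers only part of the collection, which is the kind of impact the indicators are meant to expose.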

    Using Regression to Explain Cube Measures

    The Intentional Analytics Model (IAM) has been devised to couple OLAP and analytics by (i) letting users express their analysis intentions on multidimensional data cubes and (ii) returning enhanced cubes, i.e., multidimensional data annotated with knowledge insights in the form of models (e.g., correlations). Five intention operators were proposed to this end; of these, describe and assess have been investigated in previous papers. In this work we enrich the IAM picture by focusing on the explain operator, whose goal is to provide an answer to the user asking "why does measure m show these values?". Specifically, we propose a syntax for the operator and discuss how enhanced cubes are built by (i) finding the polynomials that best approximate the relationship between m and the other cube measures, and (ii) highlighting the most interesting one. Finally, we test the operator implementation in terms of efficiency
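
    The two steps of the explain operator can be sketched under strong simplifying assumptions: only degree-1 polynomials are fitted (ordinary least squares), and "most interesting" is approximated by the highest coefficient of determination. Both choices, together with the measure names and values, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of ys = intercept + slope * xs.
    Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def explain(target: List[float], others: Dict[str, List[float]]) -> Tuple[str, float]:
    """Fit the target measure against each other measure and return the
    name and R^2 of the best-fitting one (assumed proxy for interest)."""
    fits = {name: fit_line(values, target)[2] for name, values in others.items()}
    best_name = max(fits, key=fits.get)
    return best_name, fits[best_name]


# Hypothetical cube cells: the measure to explain and two other measures.
revenue = [10.0, 20.0, 30.0, 40.0]
measures = {
    "quantity": [1.0, 2.0, 3.0, 4.0],
    "discount": [3.0, 1.0, 4.0, 1.0],
}

best, r2 = explain(revenue, measures)
print(best, round(r2, 3))
```

    In this toy data, revenue is an exact linear function of quantity, so that measure is the one highlighted.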

    Describing and Assessing Cubes Through Intentional Analytics

    The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data. The goal of this demonstration is to showcase the IAM approach using a notebook where the user can create a data exploration session by writing describe and assess statements, whose results are displayed by combining tabular data and charts so as to bring the highlights discovered to the user's attention. The demonstration plan will show the effectiveness of the IAM approach in supporting data exploration and analysis and its added value as compared to a traditional OLAP session by proposing two scenarios with guided interaction and letting users run custom sessions

    Variety-Aware OLAP of Document-Oriented Databases

    Schemaless databases, and document-oriented databases in particular, are preferred to relational ones for storing heterogeneous data with variable schemas and structural forms. However, the absence of a unique schema adds complexity to analytical applications, in which a single analysis often involves large sets of data with different schemas. In this paper we propose an original approach to OLAP on collections stored in document-oriented databases. The basic idea is to stop fighting against schema variety and welcome it as an inherent source of information wealth in schemaless sources. Our approach builds on four stages: schema extraction, schema integration, FD enrichment, and querying; these stages are discussed in detail in the paper. To make users aware of the impact of schema variety, we propose a set of indicators related, for instance, to query completeness and precision

    Augmented Business Intelligence

    Augmented reality allows users to superimpose digital information (typically, of operational type) upon real world entities. The synergy of analytical frameworks and augmented reality opens the door to a new wave of situated OLAP, in which users within a physical environment are provided with immersive analyses of local contextual data. In this paper we propose an approach that, based on the sensed augmented context (provided by wearable and smart devices), proposes a set of relevant analytical queries to the user. This is done by relying on a mapping between the entities that can be recognized by the devices and the elements of the enterprise data, and also taking into account the queries preferred by users during previous interactions that occurred in similar contexts. A set of experimental tests evaluates the proposed approach in terms of efficiency and effectiveness

    Schema profiling of document-oriented databases

    In document-oriented databases, schema is a soft concept and the documents in a collection can be stored using different local schemata. This gives designers and implementers augmented flexibility; however, it requires an extra effort to understand the rules that drove the use of alternative schemata when sets of documents with different (and possibly conflicting) schemata are to be analyzed or integrated. In this paper we propose a technique, called schema profiling, to explain the schema variants within a collection in document-oriented databases by capturing the hidden rules explaining the use of these variants. We express these rules in the form of a decision tree (schema profile). Consistently with the requirements we elicited from real users, we aim at creating explicative, precise, and concise schema profiles. The algorithm we adopt to this end is inspired by the well-known C4.5 classification algorithm and builds on two original features: the coupling of value-based and schema-based conditions within schema profiles, and the introduction of a novel measure of entropy to assess the quality of a schema profile. A set of experimental tests made on both synthetic and real datasets demonstrates the effectiveness and efficiency of our approach
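
    To make the decision-tree intuition concrete, the sketch below computes plain Shannon entropy and information gain, the quantity that C4.5's gain ratio normalizes, for candidate value-based conditions that separate schema variants. It does not implement the novel entropy measure introduced in the paper; the attribute names, the variant labels, and the sample rows are hypothetical.

```python
import math
from collections import Counter
from typing import Any, Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Dict[str, Any]], attribute: str, label: str) -> float:
    """Reduction in label entropy obtained by splitting rows on attribute."""
    base = entropy([row[label] for row in rows])
    groups: Dict[Any, List[str]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical documents: two attribute values and the schema variant in use.
rows = [
    {"type": "A", "year": 2020, "schema": "s1"},
    {"type": "A", "year": 2021, "schema": "s1"},
    {"type": "B", "year": 2020, "schema": "s2"},
    {"type": "B", "year": 2021, "schema": "s2"},
]

gain_type = information_gain(rows, "type", "schema")
gain_year = information_gain(rows, "year", "schema")
print(gain_type, gain_year)
```

    Here the value of type fully determines the schema variant while year carries no information about it, so a tree built on these rows would split on type first.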

    An Active Learning Approach to Build Adaptive Cost Models for Web Services

    Delivering accurate estimates of query costs in web services is important in different contexts, e.g., to measure their Quality of Service. However, building a reliable cost model is difficult as (i) a web service is a black box often hiding a complex computation, (ii) a call to the same service can yield completely different costs by simply changing a parameter value, and (iii) execution costs can drift with time. In this paper we propose Tiresias, an approach that, given a web service exposing an interface with a fixed number of parameters, initializes and actively adapts a model to accurately predict query costs. The cost model is represented by a regression tree trained through two interleaved querying cycles: a passive one, where the costs measured for user-generated queries are used to update the tree, and an active one, where the service is probed through system-generated queries to cope with drifts in the cost function. Tiresias is finally evaluated in terms of effectiveness and efficiency through a set of experimental tests performed on both real and synthetic datasets

    A-BI+: A Framework for Augmented Business Intelligence

    Augmented reality allows users to superimpose digital information (typically, of operational type) upon real-world objects. The synergy of analytical frameworks and augmented reality opens the door to a new wave of situated analytics, in which users within a physical environment are provided with immersive analyses of local contextual data. In this paper, we propose an approach named A-BI+ (Augmented Business Intelligence) that, based on the sensed augmented context (provided by wearable and smart devices), proposes a set of relevant analytical queries to the user. This is done by relying on a mapping between the objects that can be recognized by the devices and the elements of the enterprise multidimensional cubes, and also by taking into account the queries preferred by users during previous interactions that occurred in similar contexts. A set of experimental tests evaluates the proposed approach in terms of efficiency, effectiveness, and user satisfaction