579 research outputs found

    Extreme-Scale Model-Based Time Series Management with ModelarDB (Invited Talk)

    Full text link
    To monitor critical industrial devices such as wind turbines, high quality sensors sampled at a high frequency are increasingly used. Current technology does not handle these extreme-scale time series well [Søren Kejser Jensen et al., 2017], so only simple aggregates are traditionally stored, removing outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing extreme-scale time series that approximates the time series values using mathematical functions (models) and stores only model coefficients rather than data values. Compression is done both for individual time series and for correlated groups of time series. The keynote will present concepts, techniques, and algorithms from model-based time series management and our implementation of these in the open source Time Series Management System (TSMS) ModelarDB[Søren Kejser Jensen et al., 2018; Søren Kejser Jensen et al., 2019; Søren Kejser Jensen et al., 2021] . Furthermore, it will present our experimental evaluation of ModelarDB on extreme-scale real-world time series, which shows that that compared to widely used Big Data formats, ModelarDB provides up to 14× faster ingestion due to high compression, 113× better compression due to its adaptability, 573× faster aggregatation by using models, and close to linear scale-out scalability. ModelarDB is being commercialized by the spin-out company ModelarData

    A foundation for spatio-textual-temporal cube analytics

    Full text link
    Large amounts of spatial, textual, and temporal (STT) data are being produced daily. This is data containing an unstructured component (text), a spatial component (geographic position), and a time component (timestamp). Therefore, there is a need for a powerful and general way of analyzing STT data together. In this paper, we define and formalize the Spatio-Textual-Temporal Cube (STTCube) structure to enable combined effective and efficient analytical queries over STT data. Our novel data model over STT objects enables novel joint and integrated STT insights that are hard to obtain using existing methods. Moreover, we introduce the new concept of STT measures with associated novel STTOLAP operators. To allow for efficient large-scale analytics, we present a pre-aggregation framework for exact and approximate computation of STT measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1-5 orders of magnitude compared to the No Materialization baseline and decrease storage cost between 97% and 99.9% compared to the Full Materialization baseline while adding only a negligible overhead in the STTCube construction time. Moreover, approximate computation achieves an accuracy between 90% and 100% while reducing query response time by 3-5 orders of magnitude compared to No Materialization.</p

    Example-Driven Exploratory Analytics over Knowledge Graphs

    Full text link
    Due to their expressive power, Knowledge Graphs (KGs) have received increasing interest not only as means to structure and integrate heterogeneous information but also as a native storage format for large amounts of knowledge and statistical data. Therefore, analytical queries over KG data, typically stored as RDF, have become increasingly important. Yet, formulating such queries represents a difficult task for users that are not familiar with the query language (typically SPARQL) and the structure of the dataset at hand. To overcome this limitation, we propose Re2xOLAP: The first comprehensive interactive approach that allows to reverse-engineer and refine RDF exploratory OLAP queries over KGs containing statistical data. Thus, Re2xOLAP enables to perform KG exploratory analytics without requiring the user to write any query at all.We achieve this goal by first reverseengineering analytical SPARQL queries from a small set of userprovided examples and then, given the reverse-engineered query, we propose intuitive and explainable exploratory query refinements to iteratively help the user obtain the desired information. Our experiments on real-world large-scale KGs show that Re2xOLAP can efficiently reverse-engineer analytical SPARQL queries solely based on a small set of input examples. Additionally, we demonstrate the expressive power of our interactive refinement methods by showing that Re2xOLAP allows users to navigate hundreds of thousands of different exploration paths with just a few interactions.</p

    Towards Exploratory OLAP over Linked Open Data:a Case Study

    No full text
    Business Intelligence (BI) tools provide fundamental support for analyzing large volumes of information. Data Warehouses (DW) and Online Analytical Processing (OLAP) tools are used to store and analyze data. Nowadays more and more information is available on the Web in the form of Resource Description Framework (RDF), and BI tools have a huge potential of achieving better results by integrating real-time data from web sources into the analysis process. In this paper, we describe a framework for so-called exploratory OLAP over RDF sources. We propose a system that uses a multidimensional schema of the OLAP cube expressed in RDF vocabularies. Based on this information the system is able to query data sources, extract and aggregate data, and build a cube. We also propose a computer-aided process for discovering previously unknown data sources and building a multidimensional schema of the cube. We present a use case to demonstrate the applicability of the approach.SCOPUS: cp.kinfo:eu-repo/semantics/publishe

    Efficient Temporal Pattern Mining in Big Time Series Using Mutual Information

    Full text link
    Very large time series are increasingly available from an ever wider range of IoT-enabled sensors deployed in different environments. Significant insights can be gained by mining temporal patterns from these time series. Unlike traditional pattern mining, temporal pattern mining (TPM) adds event time intervals into extracted patterns, making them more expressive at the expense of increased time and space complexities. Existing TPM methods either cannot scale to large datasets, or work only on pre-processed temporal events rather than on time series. This paper presents our Frequent Temporal Pattern Mining from Time Series (FTPMfTS) approach providing: (1) The end-to-end FTPMfTS process taking time series as input and producing frequent temporal patterns as output. (2) The efficient Hierarchical Temporal Pattern Graph Mining (HTPGM) algorithm that uses efficient data structures for fast support and confidence computation, and employs effective pruning techniques for significantly faster mining. (3) An approximate version of HTPGM that uses mutual information, a measure of data correlation, to prune unpromising time series from the search space. (4) An extensive experimental evaluation showing that HTPGM outperforms the baselines in runtime and memory consumption, and can scale to big datasets. The approximate HTPGM is up to two orders of magnitude faster and less memory consuming than the baselines, while retaining high accuracy

    A design space for RDF data representations

    Full text link
    RDF triplestores’ ability to store and query knowledge bases augmented with semantic annotations has attracted the attention of both research and industry. A multitude of systems offer varying data representation and indexing schemes. However, as recently shown for designing data structures, many design choices are biased by outdated considerations and may not result in the most efficient data representation for a given query workload. To overcome this limitation, we identify a novel three-dimensional design space. Within this design space, we map the trade-offs between different RDF data representations employed as part of an RDF triplestore and identify unexplored solutions. We complement the review with an empirical evaluation of ten standard SPARQL benchmarks to examine the prevalence of these access patterns in synthetic and real query workloads. We find some access patterns, to be both prevalent in the workloads and under-supported by existing triplestores. This shows the capabilities of our model to be used by RDF store designers to reason about different design choices and allow a (possibly artificially intelligent) designer to evaluate the fit between a given system design and a query workload.<br/

    Authors: Dennis Pedersen

    No full text
    The changing data requirements of today&apos;s dynamic business environments are not handled well by current On-Line Analytical Processing (OLAP) systems. Physically integrating unexpected data into such systems is a long and time-consuming process making logical integration, i.e., federation, the better choice in many situations. The increasing use of Extended Markup Language (XML), e.g. in business-to-business (B2B) applications, suggests that the required data will often be available as XML data. This means that logical federations of OLAP and XML databases will be very attractive in many cases. However, for such OLAP-XML federations to be useful, effective optimization techniques for such systems are needed
    corecore