A Methodological Approach for Data-Intensive Web Application Design on Top of Data Lakes
Data exploration and decision making may benefit from the availability of data-intensive web applications, which enable domain experts to navigate across massive, dynamic and heterogeneous data sources stored in so-called Data Lakes. However, traditional design strategies for this kind of application require well-defined and cleaned data structures in the background. Conceptual modelling may be fruitfully employed to provide web developers with a comprehensive view of the Data Lake sources on which web applications are designed. Nevertheless, the cumbersome nature of Data Lakes turns the conceptual model into a dynamic entity, which must be properly managed. In this paper, we propose a methodological approach to design data-intensive web applications on top of a Data Lake. A conceptual data model, woven over Data Lake sources, is leveraged to identify the relevant information to be included in the web application. The methodology makes the model evolve both with new content emerging from Data Lake sources, through a zone-based operations pipeline that prepares a curated version of the raw data (bottom-up), and with additional domain knowledge provided by web developers and derived from the design of the data-intensive web application (top-down). The approach, independent of any specific implementation technology, is applied to a real case study from an ongoing research project in the cultural heritage domain.
PICTURE - A Framework to Assess the Degree of Approximation of Summarized Time Series
The analysis of time series data, which represents dynamic phenomena through sequences of observations, is greatly influenced by Big Data. Both the sheer volume and the advanced capabilities of Big Data significantly affect how these analyses are conducted, enabling more comprehensive and detailed insights. Recent studies have promoted the use of data summarization techniques, for instance through incremental clustering, to address the challenges of Big Data volume. These techniques quickly capture data evolution, thereby helping domain experts make informed and proactive decisions by leveraging a concise representation of time series. However, although incremental clustering efficiently reduces data volume and retains key statistical information, it is important to evaluate the accuracy of the summarized version compared to the original time series data. This assessment is critical when the summarized data is used as the basis for complex analytical pipelines, such as those for pattern recognition and anomaly detection.
Motivated by these premises, and building on a previous empirical experience in defining a metric to assess how closely summarized time series adhere to the original data stream, in this paper: (i) we propose a variant of a renowned quality metric for incremental clustering, based on an abstract model of clustering data structures, to assess the extent to which the time series summary accurately captures the dynamics of the original data; (ii) we present PICTURE (Python-based Incremental Clustering for Time series Representation and Evaluation), a framework featuring four widely used incremental clustering algorithms from the literature, equipped with modules for the execution, representation, and evaluation of clustering results applied to time series according to the abstract model; (iii) we conduct an extensive qualitative and quantitative analysis of incremental clustering results on one synthetic and two real-world datasets using the PICTURE framework, to demonstrate the effectiveness of the proposed metric in assessing the degree of approximation of summarized time series.
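The summarization step referred to above can be illustrated with a minimal sketch. It does not reproduce any of the four algorithms shipped with PICTURE; it assumes a simple leader-style incremental clustering of a univariate series, in which each cluster keeps BIRCH-style sufficient statistics (count, linear sum, squared sum), so that centroid and spread can be recovered without storing the raw points. The names `ClusterFeature` and `summarise` and the `threshold` parameter are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    """BIRCH-style cluster feature for 1-D values: count, linear sum, squared sum."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> None:
        # Absorbing an observation only updates three numbers,
        # so raw points never need to be kept.
        self.n += 1
        self.ls += x
        self.ss += x * x

    @property
    def centroid(self) -> float:
        return self.ls / self.n

    @property
    def std(self) -> float:
        # Population standard deviation recovered from the sufficient statistics.
        return math.sqrt(max(self.ss / self.n - self.centroid ** 2, 0.0))


def summarise(stream, threshold):
    """Hypothetical leader-style incremental clustering over a stream of values.

    Each value joins the cluster with the nearest centroid if it lies within
    `threshold`; otherwise it opens a new cluster. Returns the cluster
    features and the cluster index assigned to every observation.
    """
    clusters = []
    labels = []
    for x in stream:
        nearest = min(range(len(clusters)),
                      key=lambda i: abs(clusters[i].centroid - x),
                      default=None)
        if nearest is None or abs(clusters[nearest].centroid - x) > threshold:
            clusters.append(ClusterFeature())
            nearest = len(clusters) - 1
        clusters[nearest].add(x)
        labels.append(nearest)
    return clusters, labels


series = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 1.1]
clusters, labels = summarise(series, threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1, 0]
```

Seven observations are condensed into two cluster features (four and three points respectively); the threshold is the knob trading summary size against how faithfully the summary tracks the original series.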
IDEAaS: Interactive Data Exploration As-a-Service
Recently, servitization has been proposed as a strategic business innovation to enrich product offerings with the delivery of remote services (e.g., remote monitoring services), thus also improving the perceived quality of the product. The growing interconnection of systems that produce high volumes of real-time data has raised the need for advanced Data Exploration techniques able to cope with the impact of Big Data, in order to make remote monitoring services sustainable. In this paper, we present the IDEAaS (Interactive Data Exploration As-a-Service) approach, designed to support and ease the exploration of real-time data in a dynamic context of interconnected systems, where large amounts of data must be incrementally collected, organised and analysed on the fly. The proposed approach relies on three main pillars: (i) a multi-dimensional organisation of data, for data exploration according to different analysis dimensions; (ii) data summarisation, based on incremental clustering algorithms, to provide a summarised representation of the collected data streams; (iii) data relevance evaluation techniques, to focus the users' attention only on relevant data during exploration. Finally, the approach has been tested in a Smart Factory context, applying the interactive data exploration techniques to assist anomaly detection in remote monitoring services.
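The interplay between the multi-dimensional organisation and the relevance evaluation pillars can be sketched as follows. The relevance rule is an assumption made for illustration only, not the technique used in IDEAaS: readings are grouped by a hypothetical (machine, sensor) dimension key, each fixed-size window is summarised by its mean and standard deviation, and a window is flagged as relevant when its mean moves more than `k` standard deviations away from the mean of the previous window. The function name `relevant_windows`, the parameter `k` and the sample keys are invented.

```python
from statistics import mean, pstdev


def relevant_windows(streams, window, k=3.0):
    """Hypothetical relevance rule (not the IDEAaS method).

    `streams` maps a dimension key, e.g. a (machine, sensor) pair, to its
    readings in arrival order. Returns, per key, the indices of the windows
    whose mean departs from the previous window's mean by more than k
    standard deviations of that previous window.
    """
    flagged = {}
    for key, readings in streams.items():
        hits = []
        previous = None  # (mean, std) summary of the preceding window
        for start in range(0, len(readings) - window + 1, window):
            chunk = readings[start:start + window]
            summary = (mean(chunk), pstdev(chunk))
            if previous is not None:
                prev_mean, prev_std = previous
                # Guard against zero spread in a perfectly flat window.
                if abs(summary[0] - prev_mean) > k * max(prev_std, 1e-9):
                    hits.append(start // window)
            previous = summary
        flagged[key] = hits
    return flagged


streams = {
    ("press-1", "temperature"): [20, 21, 20, 21, 20, 21, 20, 21, 30, 31, 30, 31],
    ("press-1", "vibration"): [1] * 8,
}
print(relevant_windows(streams, window=4))
```

Only the third temperature window (index 2), where the level jumps by about ten degrees, is flagged; the flat vibration stream yields no relevant windows. Comparing compact window summaries rather than raw points keeps the check cheap enough to run on the fly.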
Food Certification through Collaborative Sensory Analysis Methods and Tools
In the current global food market, there is a vital need for data-driven tools that ensure the highest quality of food. Guaranteeing food quality demands meticulous control across the entire production chain, while adhering to best practices and legal regulations. However, beyond objective metrics for evaluating food quality, subjective elements derived from sensory analysis are of paramount importance. Sensory analysis involves assessing food through the five senses: taste, sight, touch, smell, and hearing. It significantly influences food choices and dietary preferences. Preparing a sensory analysis panel is a complex process involving panel leaders, tasters and sensory analysis experts, the latter being in charge of analysing the panel results. Therefore, the process can greatly benefit from the use of specialised tools. These tools must facilitate all phases of the panel, from selecting tasters and food samples to analysing and visualising results, and must be properly integrated to maximise the outcome of the sensory analysis. They must also help to weigh tasters' input appropriately, based on their experience and on an on-the-fly comparison against other tasters during the panel, with the whole process ultimately culminating in the issuance of a food certification. To this end, in this discussion paper we present a comprehensive suite of tools developed to manage sensory analysis panels. These tools are grounded on a shared conceptual data model and are specifically designed to evaluate food quality and generate a food certificate, ensuring that the highest standards are met throughout the food production and assessment process.
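The requirement to weigh tasters' input by experience and by on-the-fly comparison against the other tasters can be made concrete with a minimal sketch. The weighting scheme is entirely an assumption, not the one adopted by the tools described above: an experience factor between 1 and 2 (saturating at `max_years`) is multiplied by an agreement factor `1 / (1 + |score - panel median|)`, so inexperienced tasters far from the panel consensus count least. The function name `weighted_panel_score`, its parameters and the sample tasters are hypothetical.

```python
from statistics import median


def weighted_panel_score(scores, experience_years, max_years=10):
    """Aggregate one attribute's scores from a tasting panel (hypothetical scheme).

    scores: taster -> score; experience_years: taster -> years of experience.
    Returns the weighted mean score and the per-taster weights.
    """
    panel_median = median(scores.values())
    weights = {}
    for taster, score in scores.items():
        # Experience factor in [1, 2], saturating after `max_years`.
        experience = 1 + min(experience_years.get(taster, 0), max_years) / max_years
        # Agreement factor: 1 at the panel median, decreasing with distance from it.
        agreement = 1 / (1 + abs(score - panel_median))
        weights[taster] = experience * agreement
    total = sum(weights.values())
    weighted_mean = sum(weights[t] * s for t, s in scores.items()) / total
    return weighted_mean, weights


scores = {"A": 7, "B": 8, "C": 3}
experience = {"A": 10, "B": 5, "C": 0}
panel_mean, weights = weighted_panel_score(scores, experience)
print(round(panel_mean, 2), weights)
```

Here the novice taster C, whose score of 3 is far from the panel median of 7, receives a weight of 0.2, so the weighted score (about 6.98) sits close to the experienced tasters rather than at the plain mean of 6.0. Using the median as the reference keeps a single outlier from dragging the consensus toward itself.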
An Empirical Approach for Clustering-Based Time Series Summarisation Assessment
Over the last decades, the rise of Big Data solutions has significantly advanced the analysis of time series data as a representation of dynamic phenomena through sequences of observations. Recent research efforts have advocated the adoption of data summarisation techniques, such as incremental clustering, to promptly capture data evolution, thus helping domain experts make informed and proactive decisions by capitalising on a compact representation of time series. Nevertheless, while incremental clustering effectively reduces data volume while preserving relevant statistical information, it is crucial to estimate the degree of approximation between the original time series data and its summarised version. This evaluation is pivotal whenever the summarisation output is the starting point for setting up complex analytical pipelines (e.g., for pattern recognition and anomaly detection purposes). Building on practical and empirical considerations drawn from both a synthetic and a real-world dataset, in this paper we propose a variant of a renowned quality metric for incremental clustering, to assess the extent to which the time series summary accurately captures the dynamics of the original data.
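One simple way to estimate the degree of approximation is to replace each observation with the centroid of the cluster it was assigned to and measure the gap. The sketch below uses the root-mean-square error, also normalised by the range of the original series; this is a stand-in chosen for illustration, not the variant of the quality metric proposed in the paper, and the function name `summary_rmse` is hypothetical.

```python
import math


def summary_rmse(series, labels, centroids):
    """Hypothetical adherence measure between a series and its clustered summary.

    Each observation is reconstructed as the centroid of its assigned cluster;
    returns the RMSE and the RMSE normalised by the range of the original series.
    """
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    reconstructed = [centroids[k] for k in labels]
    rmse = math.sqrt(
        sum((x - r) ** 2 for x, r in zip(series, reconstructed)) / len(series)
    )
    span = max(series) - min(series)
    return rmse, (rmse / span if span > 0 else 0.0)


rmse, nrmse = summary_rmse([1.0, 1.2, 0.9, 5.0], labels=[0, 0, 0, 1], centroids=[1.0, 5.0])
print(round(rmse, 4), round(nrmse, 4))
```

A low normalised value means the summary tracks the series closely; note that a pointwise measure like this ignores temporal dynamics (e.g., the order in which clusters are visited), which is one reason a dedicated metric may be needed when the summary feeds pattern recognition or anomaly detection.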
Site-specific effects of strength training on bone structure and geometry of ultradistal radius.
Knowledge of the effects of exercise on bone mass in postmenopausal women is limited and controversial. Animal studies have shown that bone responds to bending strain by altering its geometry. We studied 250 postmenopausal women, aged 52-72 years, willing to participate in a 6-month exercise program. The first 125 started the program immediately and the remaining 125 served as controls. The training program included exercises designed to maximize the stress on the wrist. One hundred and eighteen women in the active group and 116 in the control group completed the study and were reassessed 6 months later. Bone mineral density (BMD) of the femoral neck, lumbar spine, ultradistal and proximal radius was measured by dual-energy X-ray absorptiometry (DXA) both before and at the end of the exercise program. The forearm was also evaluated by peripheral quantitative computed tomography, which measures the area, bone mineral content (BMC), and volumetric density of both the cortical and the trabecular component. The results showed that the DXA measurements at the femoral neck, lumbar spine, ultradistal and proximal radius were similar between the two groups. No significant difference was detected after the exercise program at the proximal radius. At the ultradistal radius, the cross-sectional area of cortical bone rose by 2.8 +/- 15.0% (SD, p < 0.05), apparently due to both periosteal apposition and corticalization of the trabecular tissue. The volumetric density of cortical bone rose by 2.2 +/- 15.8% (p < 0.1), and that of trabecular bone decreased by 2.6 +/- 10.7% (p < 0.01). The combined changes in bone volume and density in the exercise group were associated with a marked increase in cortical BMC (3.1 +/- 10.7%, p < 0.01) and a decrease in trabecular BMC (-3.4 +/- 14.2%, p < 0.05), both statistically different from those observed in the control group (p < 0.05).
In conclusion, these results confirm that site-specific moderate physical exercises have very little effect on bone mass. However, it appears that some exercises may reshape the bone segment under stress by increasing both the cross-sectional area and the density of the cortical component. These structural changes are theoretically associated with increases in bending strength.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
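To make the difference between the two counting rules concrete, here is a minimal sketch. It assumes the common ACA convention that a citing paper co-cites two authors once whenever both appear among the authors considered in its reference list, with each pair counted at most once per citing paper; the study's exact handling of such details may differ. The function name `cocitation_counts`, the parameter `max_authors` and the sample author names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across a set of citing papers.

    `citing_papers` is a list of reference lists; each reference is the list
    of a cited work's authors in byline order. max_authors=1 gives traditional
    first-author counting; max_authors=5 considers the first five authors.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors "cited" by this paper under the chosen counting rule.
        authors = set()
        for cited_authors in references:
            authors.update(cited_authors[:max_authors])
        # Each unordered pair of distinct authors is co-cited once by this paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


papers = [
    [["Smith", "Lee"], ["Chen"]],
    [["Lee", "Smith"], ["Chen", "Park"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author)
print(first_five)
```

Under first-author counting, Smith and Lee are never co-cited, because each appears only as a second author in one of the papers; counting the first five authors gives the pair ("Lee", "Smith") a count of 2. This is the mechanism by which the non-traditional counting can tie collaborating authors into tighter, more coherent groups.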
- …
