    Detecting and generalizing quasi-identifiers by affecting singletons

    In order to adhere to Open Government doctrine, Public Administrations (PAs) are requested to publish Open Data while preventing the disclosure of personal information of their citizens. Therefore, it is crucial for PAs to employ methods that ensure privacy-preserving data publishing by distributing useful data while protecting individual privacy. In this paper, we study this problem with a two-phase approach. First, we detect privacy issues by identifying, as the quasi-identifier, the minimum set of attributes that exposes the highest number of unique values (referred to as singletons). We test our approach on real datasets openly published by the Italian government, and we discover that the quasi-identifier (year_of_birth, sex, ZIP_of_residence) discloses up to 2% unique values in already anonymized datasets. Once the detection phase is complete, we propose an anonymization approach to limit the privacy leakage. We investigate which combination of attributes must be generalized to achieve the minimum number of singletons while minimising the number of modified and removed rows. We tested our approach on the same real datasets as in the previous phase, and we noticed that by generalizing only the rows corresponding to the singletons, we achieve nearly no singletons while affecting only 2% of the rows.
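
    The detection phase described in this abstract can be illustrated with a minimal sketch. It assumes rows are plain dictionaries and uses an exhaustive search over small attribute combinations; the function names `singleton_count` and `best_quasi_identifier` and the `max_size` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def singleton_count(rows, attrs):
    """Count rows whose combination of values on `attrs` occurs exactly once."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return sum(1 for c in counts.values() if c == 1)


def best_quasi_identifier(rows, candidates, max_size=3):
    """Return (attrs, singletons) for the combination exposing the most singletons.

    Combinations are visited from smallest to largest and only a strictly
    better count replaces the current best, so ties go to fewer attributes.
    This brute-force search is an illustrative assumption, not the paper's method.
    """
    best = ((), 0)
    for size in range(1, max_size + 1):
        for attrs in combinations(candidates, size):
            n = singleton_count(rows, attrs)
            if n > best[1]:
                best = (attrs, n)
    return best


# Toy data, invented for illustration only.
rows = [
    {"year_of_birth": 1980, "sex": "F", "zip": "84100"},
    {"year_of_birth": 1980, "sex": "M", "zip": "84100"},
    {"year_of_birth": 1980, "sex": "F", "zip": "84084"},
    {"year_of_birth": 1975, "sex": "M", "zip": "84084"},
    {"year_of_birth": 1980, "sex": "F", "zip": "84100"},
]
print(best_quasi_identifier(rows, ["year_of_birth", "sex", "zip"]))
```

    On real datasets the number of combinations grows quickly, so a practical tool would likely need pruning or a bound on the number of attributes considered.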

    Fractal Compression Approach for Efficient Interactive Terrain Rendering on the GPU

    This paper describes an efficient technique for the rendering of large terrain surfaces. The technique is based on a simple ring structure: a sequence of concentric rings at different resolutions, centered on the viewer's position. Each ring is represented by a set of patches at identical resolutions. Rings near the viewer have a finer resolution than rings further from the viewer. At runtime, the patches within the rings change resolution based on the viewer's position. The GPU decodes in real time height maps encoded by a fractal compressor, from which it samples the height component of the terrain. Since adjacent patches of different rings can disagree on the resolution of their common edge, the GPU stitches the meshes in order to avoid any cracks or degenerate triangles. The rendered meshes are therefore free of cracks that could cause visual artifacts. In addition, a tile manager is evaluated as a way to keep terrain datasets on disk storage, avoiding a costly load of the entire datasets into memory.

    Characterizing the behavioral evolution of Twitter users and the truth behind the 90-9-1 rule

    Online Social Networks (OSNs) represent a fertile field to collect real user data and to explore OSN user behavior. Recently, two topics have been drawing the attention of researchers: the evolution of online social roles and the question of participation inequality. In this work, we bring these two fields together to study and characterize the behavioral evolution of OSN users according to the quantity and the typology of their social interactions. We found that online participation on the microblogging platform can be categorized into four different activity levels. Furthermore, we empirically verified that the 90-9-1 rule of thumb about participation inequality is not an accurate representation of reality. Findings from our analysis reveal that lurkers are fewer than expected: they are not 9 out of 10 as suggested by Nielsen, but 3 out of 4. This is a significant result that can give new insights into how users relate to social media and how their use is evolving towards a more active interaction with the new generation of consumers.

    Detecting Data Accuracy Issues in Textual Geographical Data by a Clustering-based Approach

    Data are published to encourage data exploitation. However, data quality issues threaten data consumption and require data consumers to invest time and effort in data cleansing. Focusing on textual geographical data, we aim to detect inaccurate values, such as typos and truncated values, and to propose corrections through a clustering-based approach. Our method is mainly based on a dictionary of correct values, agglomerative clustering to group data into clusters, and Levenshtein distance and fuzzy string searching to compute word similarity. We test our approach on real open datasets published by the Campania region, heterogeneous in topic, size, and type of errors. The results show the benefit of using Levenshtein distance and fuzzy matching together with clustering methods to detect and correct quality issues in textual geographical data. The achieved results are useful for data producers and consumers, in both academia and industry, in any application domain.
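
    As a rough illustration of the dictionary-plus-Levenshtein part of this abstract, the sketch below computes the edit distance with the standard dynamic-programming recurrence and suggests the closest dictionary entry. It leaves out the agglomerative clustering step entirely; the function names, the `max_distance` threshold, and the sample place names are assumptions made for the example, not details from the paper.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest_correction(value, dictionary, max_distance=2):
    """Return the closest dictionary entry, or None if nothing is close enough.

    The threshold of 2 edits is an arbitrary illustrative choice.
    """
    if not dictionary:
        return None
    best = min(dictionary, key=lambda w: levenshtein(value.lower(), w.lower()))
    if levenshtein(value.lower(), best.lower()) <= max_distance:
        return best
    return None


# Hypothetical dictionary of correct values.
provinces = ["Napoli", "Salerno", "Avellino"]
print(suggest_correction("Salrno", provinces))
```

    A fixed distance threshold treats short and long names alike; a length-normalised similarity would be one alternative design.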

    Move cultural heritage knowledge graphs in everyone's pocket

    Recent years have witnessed a shift from the potential utility of digitisation to a crucial need to enjoy activities virtually. In fact, before 2019, data curators recognised the utility of performing data digitisation, while during the lockdown caused by COVID-19, investing in virtual and remote activities to make culture survive became crucial, as no one could enjoy Cultural Heritage in person. The Cultural Heritage community invested heavily in digitisation campaigns, mainly modelling data as Knowledge Graphs, and became one of the most successful application domains of Semantic Web technologies. Despite the vast investment in Cultural Heritage Knowledge Graphs, the syntactic complexity of RDF query languages, e.g., SPARQL, negatively affects and threatens data exploitation, risking leaving this enormous potential untapped. Thus, we aim to support the Cultural Heritage community (and everyone interested in Cultural Heritage) in querying Knowledge Graphs without requiring technical competencies in Semantic Web technologies. We propose an engaging exploitation tool accessible to all, without losing sight of developers' technological challenges. Engagement is achieved by letting the Cultural Heritage community leave the passive position of the visitor and actively create their own Virtual Assistant extensions to exploit proprietary or public Knowledge Graphs in question-answering. By accessible to all, we mean that the proposed software framework is freely available on GitHub and Zenodo with an open-source license. We do not lose sight of developers' technical challenges, which are carefully considered in the design and evaluation phases. This article first analyses the effort invested in publishing Cultural Heritage Knowledge Graphs, to quantify the data developers can rely on when designing and implementing data exploitation tools in this domain. Moreover, we point out challenges developers may face in exploiting them in automatic approaches. Second, it presents a domain-agnostic Knowledge Graph exploitation approach based on virtual assistants, as they naturally enable question-answering features where users formulate questions in natural language directly from their smartphones. Then, we discuss the design and implementation of this approach within an automatic, community-shared software framework (a.k.a. generator) of virtual assistant extensions, and its evaluation in terms of performance and of utility as perceived by end-users. Finally, following a taxonomy of the Cultural Heritage field, we present a use case for each category to show the applicability of the proposed approach in the Cultural Heritage domain. In overviewing our analysis and the proposed approach, we point out challenges that a developer may face in designing virtual assistant extensions to query Knowledge Graphs, and we show the effect of these challenges in practice.

    Syntactical heuristics for the open data quality assessment and their applications

    Open Government Data are valuable initiatives in favour of transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data could have quality issues, taking for granted that datasets can be used immediately without any further manipulation. In reality, in order to reuse data, for instance to create visualisations, they need to perform data cleaning, which requires time, resources, and proper skills. This reduces the chance of involving citizens. This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars in Tim Berners-Lee's scheme) for Governmental Open Data. The objective is to increase awareness and provide support in data cleaning operations, both to PAs, so that they produce better quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open source and modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inference (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
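
    The idea of a report built on data type inference can be sketched as follows. This is not DataChecker's API (that library is written in JavaScript and its interface is not described here); the patterns, type names, and report keys below are assumptions chosen only to show how a dominant column type can flag suspicious cells.

```python
import re
from collections import Counter

# Illustrative patterns only; a real profiler would cover many more types.
PATTERNS = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("float", re.compile(r"[+-]?\d+[.,]\d+")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def infer_type(value):
    """Infer a coarse type label for a single cell given as a string."""
    text = value.strip()
    if not text:
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "text"


def column_report(values):
    """Summarise a column: dominant type, type counts, and suspicious row indexes."""
    inferred = [infer_type(v) for v in values]
    counts = Counter(inferred)
    non_empty = {t: n for t, n in counts.items() if t != "empty"}
    dominant = max(non_empty, key=non_empty.get) if non_empty else "empty"
    suspicious = [
        i for i, t in enumerate(inferred) if t not in (dominant, "empty")
    ]
    return {
        "dominant_type": dominant,
        "type_counts": dict(counts),
        "suspicious_rows": suspicious,
    }


print(column_report(["12", "7", "n/a", "", " 30"]))
```

    Cells whose inferred type differs from the dominant one are natural candidates for a cleaning suggestion.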

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
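
    The difference between the two counting schemes compared in this abstract can be shown with a small sketch. It assumes each citing paper is given as a list of cited works, each cited work as an ordered author list, and that a pair of authors is counted at most once per citing paper; it also counts co-authors of the same cited work as co-cited, which is a simplifying assumption rather than a rule taken from the study. The author names are placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs cited together, using the first `max_authors` of each work.

    max_authors=1 gives first-author counting; max_authors=5 mirrors the
    first-five-authors variant mentioned in the abstract.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Sorting makes each pair appear in a single canonical order.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["Zhao", "White"], ["Small"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_only[("Small", "White")], first_five[("Small", "White")])
```

    In this toy input, White is a second author in one cited work, so the link between Small and White is only fully visible when more than the first author is counted.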

    Toward a domain-specific language for scientific workflow-based applications on multicloud system

    The cloud computing paradigm has emerged as the backbone of modern price-aware scalable computing systems. Many cloud service models are competing to become the leading doorway to the computational power of cloud providers. Recently, a novel service model, called function-as-a-service (FaaS), has been proposed, which enables users to exploit the computational scalability of the cloud while leaving aside the configuration and management of huge computing infrastructures. This article presents Fly, a domain-specific language that aims at reconciling the cloud and high-performance computing paradigms by adopting a multicloud strategy. Fly provides a powerful, effective, and pricing-efficient tool for developing scalable workflow-based scientific applications that transparently exploit different FaaS cloud providers at the same time as computational backends. We present several improvements to the Fly language, as well as a new enhanced version of its source-to-source compiler, which currently supports Symmetric Multiprocessing, Amazon AWS, and Microsoft Azure backends and the translation of functions into the Java, JavaScript, and Python programming languages. Furthermore, we discuss a performance evaluation of Fly on a popular benchmark for distributed computing frameworks, along with a collection of case studies with an analysis of their performance results and costs.