
    Mining Heterogeneous Urban Data at Multiple Granularity Layers

    The recent development of urban areas and of the new advanced services supported by digital technologies has generated major challenges for people and city administrators, such as air pollution, high energy consumption, traffic congestion, and the management of public events. Moreover, understanding citizens' perception of the provided services and of other relevant topics can help devise targeted management actions. With the wide diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has grown rapidly. For instance, different sensor networks deployed in the urban area allow collecting a variety of data useful to characterize several aspects of the urban environment. The huge amount of data produced by different types of devices and applications carries rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful to tackle the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of the data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data also depends on how they are integrated, hence alternative data representations and efficient processing technologies are required. The PhD research activity presented in this thesis was aimed at tackling these issues. The thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban-related processes. The problem is addressed at both the infrastructural and the algorithmic layer. At the infrastructural layer, the thesis proposes enhancements to the current leading techniques for the storage and processing of Big Data. The integration with novel computing platforms is also considered to support the parallelization of tasks, tackling the issue of automatic scaling of resources. At the algorithmic layer, the research activity aimed at innovating current data mining algorithms by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data in order to discover hidden but important information that supports the optimization of the related processes. This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm and exploit in-memory computing to reach near-linear computational scalability. By exploring many data resolutions in a relatively short time, several additional patterns can be discovered, further enriching the description of urban processes. The framework is applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of building energy efficiency from different energy-related data, and the characterization of people's perception of and interest in different topics from user-generated content on social networks. For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of the algorithms, which were extensively validated through experimental tests.
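
The aggregation of readings at multiple temporal and spatial granularities described in this abstract can be illustrated with a small map/reduce-style sketch. It is not the thesis framework: it runs on a single machine in plain Python, and the sample readings, the `TIME_FORMATS` table, the square-grid spatial binning, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (ISO timestamp, latitude, longitude, value).
READINGS = [
    ("2016-03-01T08:05:00", 45.0703, 7.6869, 10.0),
    ("2016-03-01T08:40:00", 45.0711, 7.6874, 14.0),
    ("2016-03-01T09:10:00", 45.0709, 7.6861, 18.0),
    ("2016-03-02T08:15:00", 45.1203, 7.7412, 30.0),
]

# Assumed temporal granularities, expressed as strftime patterns.
TIME_FORMATS = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}


def map_reading(reading, time_level, cell_size):
    """Map step: emit ((time bucket, grid cell), (value, 1))."""
    timestamp, lat, lon, value = reading
    bucket = datetime.fromisoformat(timestamp).strftime(TIME_FORMATS[time_level])
    # Square grid in degrees; a real system would likely use a proper projection.
    cell = (int(lat // cell_size), int(lon // cell_size))
    return (bucket, cell), (value, 1)


def reduce_by_key(pairs):
    """Reduce step: sum values and counts per key, then return the means."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (value, count) in pairs:
        totals[key][0] += value
        totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


def aggregate(readings, time_level, cell_size):
    """Aggregate readings at one (temporal, spatial) granularity."""
    return reduce_by_key(map_reading(r, time_level, cell_size) for r in readings)


# The same data explored at two resolutions.
hourly_fine = aggregate(READINGS, "hour", 0.01)
daily_coarse = aggregate(READINGS, "day", 0.1)
```

Because the map step emits partial `(sum, count)` pairs rather than means, the reduce step stays associative, which is what allows this kind of aggregation to be distributed across workers.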

    Characterizing unpredictable patterns in Wireless Sensor Network data

    Wireless Sensor Network (WSN) monitoring plays a primary role in many industrial and research processes. Huge amounts of WSN sensor readings are now available and can be analyzed to discover useful knowledge. This paper focuses on analyzing historical WSN sensor readings to identify the combinations of sensors whose readings show an unexpected trend. Although significant variations in single sensor readings may be easily detected, discovering correlations between multiple sensor readings is challenging without advanced data analytics tools. To tackle this issue, we present an itemset-based data mining approach to analyzing WSN data. It identifies the combinations of sensors (of arbitrary size) whose readings are unexpectedly low in a given time period. Since the readings acquired by multiple sensors may decrease in an alternate fashion, the discovered patterns provide new information compared to single sensor analysis. To make the mined patterns manageable by domain experts for manual inspection, the mining algorithm is driven by spatial constraints defined on the WSN topology. The experimental results, achieved on real WSN data, demonstrate the effectiveness of the proposed approach in detecting heating system malfunctions.
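
As a rough illustration of the idea in this abstract, the sketch below finds sets of sensors that are low in the same time slot and keeps only those that form a connected group in the network topology. It is a brute-force enumeration under stated assumptions, not the paper's algorithm: the readings, the fixed `LOW_THRESHOLD`, the `NEIGHBOURS` edge list, and the connectivity test used as the spatial constraint are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical readings: one dict per time slot, sensor id -> value.
READINGS = [
    {"s1": 15.0, "s2": 14.5, "s3": 21.0, "s4": 20.5},
    {"s1": 14.0, "s2": 15.5, "s3": 20.0, "s4": 15.0},
    {"s1": 21.0, "s2": 20.0, "s3": 20.5, "s4": 21.5},
    {"s1": 15.5, "s2": 14.0, "s3": 15.0, "s4": 21.0},
]
# Hypothetical topology: undirected links between neighbouring sensors.
NEIGHBOURS = {("s1", "s2"), ("s2", "s3"), ("s3", "s4")}
LOW_THRESHOLD = 16.0  # assumed cut-off for an "unexpectedly low" reading


def is_connected(sensors, neighbours):
    """Spatial constraint: the sensors must form a connected subgraph."""
    sensors = set(sensors)
    start = next(iter(sensors))
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for other in sensors - seen:
            if (current, other) in neighbours or (other, current) in neighbours:
                seen.add(other)
                frontier.append(other)
    return seen == sensors


def mine_low_itemsets(readings, neighbours, threshold, min_support, max_size=3):
    """Count sensor sets that are jointly low in at least min_support slots."""
    counts = Counter()
    for slot in readings:
        low = sorted(s for s, value in slot.items() if value < threshold)
        for size in range(2, max_size + 1):
            for itemset in combinations(low, size):
                counts[itemset] += 1
    return {
        itemset: support
        for itemset, support in counts.items()
        if support >= min_support and is_connected(itemset, neighbours)
    }


patterns = mine_low_itemsets(READINGS, NEIGHBOURS, LOW_THRESHOLD, min_support=2)
```

Enumerating every combination is exponential in the number of low sensors per slot; a level-wise miner such as Apriori would prune candidates by support, and the spatial constraint could be pushed into candidate generation instead of being applied as a final filter.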

    Supporting the analysis of urban data through NOSQL technologies

    In the last few years, the capability to both generate and collect data of public interest within the urban area has increased at an unprecedented rate, to such an extent that data rapidly scales towards big urban data. The abundance of information collected through ad-hoc sensor networks in the smart city context provides an unprecedented opportunity to tackle interesting urban challenges and to add intelligence to the urban environment. However, different data sources and types potentially use different spatial and temporal references. Hence, the complexity of dealing with such heterogeneous data has significantly increased. This paper proposes a distributed business intelligence engine, named BI2CITY, able to efficiently manage the process of collecting, integrating and analyzing a large volume of heterogeneous data generated by various sources in the smart city context. BI2CITY exploits a Big Data approach to support (i) data storage, (ii) spatio-temporal data aggregation, and (iii) different targeted analyses, such as correlating urban data and forecasting the expected values of some interesting data (e.g., air pollution). Spatio-temporal data aggregation and analyses are performed on the fly using MapReduce-based algorithms. Experimental results on real data collected in a major Italian city demonstrate the effectiveness of the proposed distributed system in performing interesting and efficient analyses.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
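
The difference between the two counting schemes compared in this abstract can be sketched as follows. The data and author names are hypothetical, and the counting rule is a simplification chosen for illustration: a pair of authors is counted once per citing paper if both appear among the counted authors of that paper's cited works, which means co-authors of the same cited work also count as co-cited. The study's exact rules may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing papers. Each paper is a list of cited works,
# and each cited work is its author list in byline order.
CITING_PAPERS = [
    [["Ames", "Bell"], ["Cruz"], ["Diaz", "Cruz"]],
    [["Ames", "Bell"], ["Diaz", "Cruz"]],
]


def cocitation_counts(citing_papers, max_authors):
    """Count author pairs, using the first max_authors of each cited work."""
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for byline in cited_works:
            authors.update(byline[:max_authors])
        # Each pair is counted at most once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


first_author_only = cocitation_counts(CITING_PAPERS, max_authors=1)
first_five_authors = cocitation_counts(CITING_PAPERS, max_authors=5)
```

In this toy data, "Bell" never appears as a first author, so first-author counting leaves that author out of every pair, while counting the first five authors links "Bell" to all the others.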

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
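
The abstract does not say why the Pearson correlation is considered unsatisfactory, so the sketch below only demonstrates one mathematical property that separates the two measures: appending zero entries to both profiles (authors with whom neither author is co-cited) leaves the cosine unchanged but can change the Pearson correlation substantially. Whether this is the paper's own argument is an assumption, and the profiles are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical cocitation profiles of two authors with four other authors.
profile_a = [4, 2, 1, 3]
profile_b = [2, 4, 3, 1]

# The same profiles after adding six authors co-cited with neither of them.
padded_a = profile_a + [0] * 6
padded_b = profile_b + [0] * 6

print(cosine(profile_a, profile_b), cosine(padded_a, padded_b))
print(pearson(profile_a, profile_b), pearson(padded_a, padded_b))
```

With these numbers the cosine is about 0.73 in both cases, while the Pearson correlation moves from -0.6 to 0.6, so the choice of which authors are included in the analysis can flip its sign.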