    Adaptive Quick Reduct for Feature Drift Detection

    Data streams are ubiquitous and closely tied to the proliferation of low-cost mobile devices, sensors, wireless networks and the Internet of Things. While it is well known that complex phenomena are not stationary and exhibit concept drift when observed over a sufficiently long time, relatively few studies have addressed the related problem of feature drift. In this paper, a variation of the QuickReduct algorithm suited to processing data streams is proposed and tested. It builds an evolving reduct that dynamically selects the relevant features in the stream, removing features as they become redundant and adding new ones as soon as they become relevant. Tests on five publicly available datasets with an artificially injected drift have confirmed the effectiveness of the proposed method.
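For readers unfamiliar with QuickReduct, here is a minimal sketch of the classic batch algorithm from rough set theory. It greedily adds the attribute that most increases the dependency degree γ (the fraction of objects whose decision is fully determined by the chosen attributes) until γ matches that of the full attribute set. The streaming wrapper `SlidingWindowReduct` is a hypothetical, naive baseline that simply recomputes the reduct over a sliding window. It is not the adaptive method proposed in the paper, whose update rules the abstract does not describe.

```python
from collections import defaultdict, deque


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: |POS_attrs(D)| / |U|."""
    if not rows:
        return 0.0
    # Partition objects into equivalence classes by their values on `attrs`.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    # A class is in the positive region iff all its objects share one decision.
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels, n_attrs):
    """Classic greedy QuickReduct over attributes 0..n_attrs-1."""
    all_attrs = list(range(n_attrs))
    target = dependency(rows, labels, all_attrs)
    reduct, current = [], dependency(rows, labels, [])
    while current < target:
        best, best_gamma = None, current
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(rows, labels, reduct + [a])
            if gamma > best_gamma:
                best, best_gamma = a, gamma
        if best is None:  # no single attribute helps (e.g. XOR-like data): stop
            break
        reduct.append(best)
        current = best_gamma
    return sorted(reduct)


class SlidingWindowReduct:
    """Hypothetical naive streaming baseline: recompute the reduct on the last `window` samples."""

    def __init__(self, n_attrs, window):
        self.n_attrs = n_attrs
        self.buffer = deque(maxlen=window)  # oldest samples are evicted automatically

    def update(self, row, label):
        self.buffer.append((row, label))
        rows = [r for r, _ in self.buffer]
        labels = [y for _, y in self.buffer]
        return quick_reduct(rows, labels, self.n_attrs)
```

Suppose the stream's labels first follow attribute 0 and then, after a drift, attribute 1. The returned reduct switches from `[0]` to `[1]` once the window holds only post-drift samples. Recomputing from scratch is the simplest possible design. An adaptive method would presumably update the reduct incrementally instead.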

    Graded Possibilistic Meta Clustering

    Meta clustering starts from different clusterings of the same data and groups them, reducing the complexity of choosing the best partitioning and the number of alternatives to compare. Starting from a collection of single-feature clusterings, this paper proposes a graded possibilistic medoid meta clustering algorithm. It exploits the soft transition from probabilistic to possibilistic memberships to produce more compact and better-separated clusters than other medoid-based algorithms. The algorithm has been evaluated against three medoid-based competitors on six publicly available data sets, yielding promising results.
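The soft probabilistic-to-possibilistic transition can be illustrated with a small medoid clustering loop over a precomputed distance matrix. This is a sketch under stated assumptions. The membership normalizer κ = (Σ v)^α is an assumed interpolation: α = 1 gives probabilistic memberships that sum to 1, and α = 0 gives possibilistic ones. The medoid update (the object minimizing the membership-weighted distance sum) and the names `gp_medoids`, `beta` and `init` are also illustrative choices, not the paper's exact formulation. In meta clustering, `D` would hold distances between clusterings rather than between points.

```python
import math


def graded_memberships(dists, beta, alpha):
    """Memberships of one object w.r.t. each medoid.

    v_j = exp(-d_j / beta); u_j = v_j / kappa with kappa = (sum_j v_j) ** alpha.
    alpha = 1 -> probabilistic (memberships sum to 1); alpha = 0 -> possibilistic
    (u_j = v_j). This kappa is an assumed interpolation, not the paper's formula.
    """
    v = [math.exp(-d / beta) for d in dists]
    kappa = sum(v) ** alpha
    return [vj / kappa for vj in v]


def gp_medoids(D, init, beta=1.0, alpha=1.0, max_iter=50):
    """Medoid clustering with graded memberships over a distance matrix D."""
    n, medoids = len(D), list(init)
    for _ in range(max_iter):
        U = [graded_memberships([D[i][m] for m in medoids], beta, alpha) for i in range(n)]
        new = []
        for j in range(len(medoids)):
            # New medoid j: the object minimizing the membership-weighted distance sum.
            new.append(min(range(n), key=lambda k: sum(U[i][j] * D[i][k] for i in range(n))))
        if new == medoids:  # converged: U matches the final medoids
            break
        medoids = new
    return medoids, U
```

For the 1-D points `[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]` with `init=[0, 3]`, the loop settles on the central objects of each group, indices `[1, 4]`. Lowering `alpha` toward 0 lets distant objects keep low memberships everywhere instead of being forced to share a total of 1.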

    Rough Graded Possibilistic Meta

    Cluster analysis and outlier detection are strongly coupled tasks in data mining: a few points not belonging to any cluster can easily corrupt an otherwise well-defined clustering structure. The same problem arises in meta-clustering, where different clusterings of the same data are themselves clustered to reduce the complexity of choosing the best partitioning and the number of alternatives to compare. In this paper, the outlier rejection problem is tackled with a rough graded possibilistic medoid meta-clustering algorithm. The algorithm exploits its ability to perform a soft transition from probabilistic to possibilistic memberships and its natural rejection of anomalous observations. Outlier detection is therefore threshold-based: a partition with low membership in all meta-clusters is identified as an observation to be filtered out of the clustering process. The effectiveness of the proposed approach has been assessed on synthetic data by comparing the performance of the meta-clustering algorithm with and without detection of outlier clusterings, yielding promising results.
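The threshold rule can be sketched as follows. Possibilistic memberships matter here because probabilistic memberships always sum to 1. An object far from every medoid would then still receive about 1/c in each of the c clusters, whereas possibilistic memberships drop toward 0 everywhere. The sketch assumes purely possibilistic memberships, the α = 0 end of the graded model, and omits the rough lower/upper approximations. `beta` and `threshold` are hypothetical parameters, not values from the paper.

```python
import math


def flag_outliers(dist_to_medoids, beta, threshold):
    """Flag objects whose membership is below `threshold` in every cluster.

    Each row of `dist_to_medoids` holds one object's distances to all medoids.
    Memberships are possibilistic, v_j = exp(-d_j / beta), so they need not sum
    to 1 and an object far from all medoids gets low values everywhere.
    """
    flags = []
    for dists in dist_to_medoids:
        memberships = [math.exp(-d / beta) for d in dists]
        flags.append(max(memberships) < threshold)
    return flags
```

In meta-clustering, each row would hold one partition's distances to the current meta-cluster medoids. Flagged partitions would be removed before the next clustering pass.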

    Fuzzy Cognitive Maps Extraction from Enriched Tweets

    Fuzzy Cognitive Maps (FCMs) graphically represent the main concepts of a given domain and their relationships as a directed, weighted graph. There is a growing need for intelligent systems that explain the decisions they make (so-called eXplainable Artificial Intelligence, XAI). Thanks to their intuitive yet formal nature, FCMs are invaluable tools for modeling complex real-world scenarios. However, they are traditionally built by analyzing direct interviews with a number of domain experts, a largely manual, expensive, and cumbersome effort. The aim of this work is to design, develop and test a method for automatically generating FCMs from raw data in the form of Twitter conversations. To improve entity recognition and to cope with brevity, ambiguity and jargon, tweets are first enriched with both domain-specific and general corpora, then analyzed and transformed into meaningful maps. Since the data come from a population of common users rather than domain experts, the resulting FCMs are highly variable. They should be read as a snapshot of these users' beliefs on a specific topic rather than as an objective representation of what experts think about it. Based on clerical review, the reported test cases confirm the viability and effectiveness of the proposed method.
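The abstract describes an FCM only as a directed weighted graph. The sketch below shows how such a map is commonly evaluated once extracted, using a widely used update rule (sometimes called the modified Kosko rule) with a sigmoid squashing function. It illustrates FCM semantics only and is not the paper's tweet-to-map extraction pipeline. The update rule, the steepness `lam` and any concept names are assumptions.

```python
import math


def sigmoid(x, lam=1.0):
    """Squashing function keeping concept activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, weights, lam=1.0):
    """One inference step: A_i <- f(A_i + sum_{j != i} A_j * w[j][i]).

    weights[j][i] is the signed causal influence of concept j on concept i.
    """
    n = len(state)
    return [
        sigmoid(state[i] + sum(state[j] * weights[j][i] for j in range(n) if j != i), lam)
        for i in range(n)
    ]


def fcm_run(state, weights, lam=1.0, max_steps=100, tol=1e-6):
    """Iterate until activations stop changing (or max_steps is reached)."""
    for _ in range(max_steps):
        new = fcm_step(state, weights, lam)
        if max(abs(a - b) for a, b in zip(new, state)) < tol:
            return new
        state = new
    return state
```

Consider a hypothetical two-concept map, "rain" influencing "traffic". A positive edge weight drives the "traffic" activation higher than a negative one. This kind of what-if reasoning makes FCMs useful as explanations.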

    A Fuzzy Logic-Based Weighting Model for GNSS Measurements from a Smartphone

    GNSS navigation is critical in unfavourable scenarios, where the solution can be degraded by errors such as multipath reflections and weak geometry caused by obstacles surrounding the receiver. Nonetheless, the influence of these errors can be reduced by defining an adequate quality measure for each signal and, consequently, using weights inversely related to the quality of the received signals. In this paper, a quality index is obtained from the fuzzy integration of various features of the received signals and used to weight each measurement in a Weighted Least Squares (WLS) estimation process. It is validated on measurements from a high-sensitivity receiver embedded in a smartphone. The main objective is twofold: to validate a fuzzy controller previously designed by the authors, using raw smartphone data to compute the navigation solution, and to extend its application to the multi-GNSS constellation case. The performance of the tested weighting strategy is evaluated in the position domain and compared with another weighting method. Real GNSS data were collected with a smartphone in a typical urban canyon environment and processed in Single Point Positioning mode. Results show a clear improvement from using fuzzy logic to assign proper weights to GNSS observables, reproducing a stochastic model closer to reality.
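The WLS step that consumes the per-measurement weights can be sketched as x = (HᵀWH)⁻¹HᵀWy with W = diag(w). This is a generic sketch. In Single Point Positioning, `H` would be the linearized geometry matrix (one row per satellite, with columns for position and receiver clock) and `y` the pseudorange residuals. The weights `w` would come from the fuzzy quality index, which is not reproduced here, so the numbers in the usage note are arbitrary.

```python
import numpy as np


def wls(H, y, w):
    """Weighted least squares: x = (H^T W H)^{-1} H^T W y with W = diag(w).

    H: (n, m) design matrix, y: (n,) observations, w: (n,) positive weights.
    """
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    W = np.diag(np.asarray(w, dtype=float))
    normal = H.T @ W @ H  # normal matrix; must be non-singular
    return np.linalg.solve(normal, H.T @ W @ y)
```

Consider estimating a constant from three observations `[10, 10, 40]`, where the last one is corrupted, for example by multipath. Equal weights give 20. Down-weighting the corrupted observation to 0.01 gives about 10.15, which is how a low quality score limits the damage done by a degraded signal.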

    The Journal Pattern for Streaming Data

    In distributed streaming data processing, most frameworks implement minimal variations of the Publish-Subscribe pattern, in which messages pass directly from each Publisher to its group of Subscribers. This work introduces a novel pattern, named Journal, that uses a so-called Editor to filter or modify the data stream in a principled manner. The Editor can be integrated into the Publish-Subscribe pattern with two different schemata. It has been used to implement multiple subsampling strategies, so as to reduce the volume of forwarded data, create new communication channels and match the ingestion capacity of the consumers. A test using Apache Kafka with a stream of simulated data has confirmed the viability of integrating the Editor into Pub-Sub. We show that, with the Journal pattern, the risk of saturating a channel can be significantly lowered and the clients' processing latency notably reduced. We stress that the Journal pattern is very general and can be extended to many other purposes.
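A minimal in-memory sketch of one possible Editor integration follows. The Editor subscribes to a source topic, subsamples it, and republishes to a new channel that slower consumers can follow. `Broker`, `SubsamplingEditor` and the topic names are hypothetical. They do not reproduce Apache Kafka's API or the paper's two schemata, and keeping every k-th message is only one of the subsampling strategies the abstract alludes to.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory publish-subscribe broker (illustrative, not Kafka)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


class SubsamplingEditor:
    """Hypothetical Editor: forwards every k-th message from `source` to `target`."""

    def __init__(self, broker, source, target, k):
        self.broker, self.target, self.k = broker, target, k
        self.count = 0
        broker.subscribe(source, self.on_message)  # Editor acts as a Subscriber...

    def on_message(self, message):
        if self.count % self.k == 0:
            self.broker.publish(self.target, message)  # ...and as a Publisher
        self.count += 1
```

A fast consumer can stay on the raw topic while a slow one subscribes to the subsampled channel. Publishing 0 through 9 with `k=3` delivers `[0, 3, 6, 9]` to the slow consumer, cutting its ingestion load to about a third.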

    Historical trends of rain and air temperature in the Dominican Republic

    The present work aims to characterize trends in air temperature and precipitation in the Dominican Republic from the late 1930s to 2007, establishing whether climate change patterns can be identified in the distribution of the country's climate types. The time series under analysis present many quality issues and challenges, essentially due to an abundance of missing data and inhomogeneous measurements. A number of statistical corrections have therefore been applied: the series were first filtered, then homogenized with respect to purpose-built reference series, and finally completed through multiple imputation. Trend estimation was then performed at the annual and monthly scales. The analysis of the homogenized and imputed series shows that significant trends have occurred since the 1930s in both rain and air temperature. A clear pattern in the distribution of rain trends emerges across the country over 1939–2007. It reflects the influence of the country's orography on the atmospheric dynamics that dominate the Caribbean region: significant negative annual trends are detectable in leeward areas, behind the main mountain chains, while positive trends are generally evident in windward regions exposed to the trade winds. All the analysed series show an increase in air temperature. In Santo Domingo, the minimum air temperature has increased by 3.0 ± 0.5°C since 1936, while the maximum air temperature has increased by 1.8 ± 0.4°C over the same period. Furthermore, an increase in rain erosivity can be detected on the south coast of the country, in some areas of the Cordillera Central and in the Northeast. Another important result is the increase in potential evapotranspiration, whereas no significant uniform trends can be identified for extreme events.
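The abstract does not name the trend estimator used. The sketch below uses a swapped-in technique that is common for climatological series: the non-parametric Mann-Kendall test paired with Sen's slope. It assumes a complete, already homogenized series, with the multiple imputation step having supplied any filled values. The z score uses the normal approximation without a tie correction, which is a simplification.

```python
import math
from itertools import combinations
from statistics import median


def sens_slope(t, y):
    """Theil-Sen slope: median of all pairwise slopes (y_j - y_i) / (t_j - t_i)."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    return median(slopes)


def mann_kendall(y):
    """Mann-Kendall S statistic and normal-approximation z score (no tie correction)."""
    n = len(y)
    # S counts concordant minus discordant pairs; S > 0 suggests an increasing trend.
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

Both statistics are rank- or median-based, so a single erroneous reading barely moves them. For `y = [1, 3, 50, 7, 9]` at `t = 0..4`, Sen's slope is still 2.0. With `t` in years and `y` in °C, the slope is expressed in °C per year.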

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by the non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author counting. Reasons for these effects are discussed.
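The difference between the two counting schemes can be made concrete with a small sketch. Each citing paper is given as a list of references, and each reference is the cited work's author list in byline order. `max_authors=1` reproduces first-author counting, while `max_authors=5` counts the first five authors. The counting unit is an assumption: each citing paper adds at most 1 to an author pair. The function name is hypothetical. Co-authors of the same cited work are also counted as co-cited here, and the study may treat that case differently.

```python
from collections import Counter
from itertools import combinations


def author_cocitations(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is a list of author
    names in byline order. Only the first `max_authors` authors of each cited
    work are considered. Returns a Counter keyed by alphabetically sorted pairs.
    """
    counts = Counter()
    for refs in citing_papers:
        authors = set()
        for ref in refs:
            authors.update(ref[:max_authors])
        # Each citing paper co-cites every pair of distinct authors it references.
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts
```

Consider two citing papers that reference Smith & Jones plus Lee, and Smith plus Jones & Lee. Under first-author counting, Jones and Lee never appear together. Counting five authors links all three pairs twice, which illustrates how the non-traditional scheme can produce denser, more coherent author groups.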