Data Analysis and Modelling of Users' Behavior on the Web
The research developed during my PhD was driven by the need to understand how people interact with the web. This information gives ISPs and network managers better visibility and understanding of how users and web services change over time. Using traces and logs of users' traffic, my work focuses on two complementary aspects: (i) data analytics, and (ii) user modelling. In this work, I show how to reconstruct users' online activity from passive measurements and to model their behaviour. I introduce machine learning approaches to identify the intentionally visited web pages and websites. I highlight the evolution of device usage, the structure of navigation, and the interactions with social networks and search engines. I build users' profiles and then show how to re-identify users at a later time thanks to their behavioural fingerprints. This is also instrumental for security applications. I next study the interaction with online ads, capturing the impact of the temporal dynamics of shown advertisements and improving revenues. I make all the anonymized datasets and code available to the community, to guarantee the reproducibility of results and foster further analyses.
Recommendation Systems in Libraries: an Application with Heterogeneous Data Sources
The Reading[&]Machine project exploits the support of digitalization to increase the attractiveness of libraries and improve
the users’ experience. The project implements an application that helps the users in their decision-making process, providing
recommendation system (RecSys)-generated lists of books the users might be interested in, and showing them through an
interactive Virtual Reality (VR)-based Graphical User Interface (GUI). In this paper, we focus on the design and testing of the
recommendation system, employing data about all users’ loans over the past 9 years from the network of libraries located in
Turin, Italy. In addition, we use data collected by the Anobii online social community of readers, who share their feedback
and additional information about books they read. Armed with this heterogeneous data, we build and evaluate Content Based
(CB) and Collaborative Filtering (CF) approaches. Our results show that the CF approach outperforms the CB approach, improving
by up to 47% the relevant recommendations provided to a reader. However, the performance of the CB approach is heavily
dependent on the number of books the reader has already read, and it can work even better than CF for users with a large
history. Finally, our evaluations highlight that the performance of both approaches is significantly improved if the system
integrates and leverages the information from the Anobii dataset, which allows us to include more user readings (for CF) and
richer book metadata (for CB).
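
The abstract does not say which Collaborative Filtering variant is used. As a minimal sketch only, assuming a user-based neighbourhood method over loan histories with Jaccard similarity, the idea can be illustrated as follows. The function names, the `loans` structure, and the parameters `k` and `n` are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of book ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(loans, user, k=2, n=3):
    """Recommend up to n unseen books to `user`.

    loans: dict mapping a user id to the set of book ids the user borrowed
           (hypothetical structure, assumed for illustration).
    k:     number of most similar users to consult.
    """
    history = loans[user]
    # Rank every other user by similarity of loan history.
    neighbours = sorted(
        ((jaccard(history, books), other)
         for other, books in loans.items() if other != user),
        reverse=True,
    )[:k]
    # Score each unseen book by the summed similarity of neighbours who borrowed it.
    scores = {}
    for sim, other in neighbours:
        for book in loans[other] - history:
            scores[book] = scores.get(book, 0.0) + sim
    # Highest score first; ties broken by book id for determinism.
    return sorted(scores, key=lambda b: (-scores[b], b))[:n]
```

Under this assumed design, adding a second data source would simply enlarge each user's set in `loans`, which is one way to read the remark that more user readings help CF.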
Mining Patterns in Mobile Network Logs
Alarm logs are a valuable source of information and play a crucial role in network management. Network devices such as backbone routers or 3G/4G base stations generate verbose and detailed logs that network managers process to detect problems and identify their root causes. Manual analysis of such logs is extremely time-consuming because of the sheer amount of data. Therefore, finding suitable automatic methods to process logs is an important problem in the network analysis area. In this paper, we target the automatic extraction of situations, i.e., sequences of events occurring close in time and space which identify common and recurring patterns. We adopt an unsupervised machine learning approach to automatically mine logs and provide information and correlations in network failures. We address a real use case, processing more than 2 million alarms generated over 2 months at the TIM Network Operations Center in Northern Italy. Most of the features are categorical and call for specific methodologies to process them. We choose rule mining of frequent items. We focus on event logs and apply rule mining methods to extract temporal-spatial correlations and co-occurrences, i.e., situations. To ease the analyst's work, we highlight the most important rules and offer visualization techniques in both spatial and temporal dimensions. Results have been verified to be helpful for recognizing common situations and identifying possible future anomalies.
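
As an illustration of rule mining over categorical alarm data, the sketch below counts co-occurring alarm types and keeps pairwise rules that pass support and confidence thresholds. It assumes that alarms have already been grouped into transactions (for example, alarms raised at the same site within a short time window). The grouping, the alarm names, and the thresholds are hypothetical, and the paper's actual algorithm may differ.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) between pairs of alarm types.

    transactions: iterable of collections of alarm types that occurred
                  together (assumed pre-grouped by time window and site).
    """
    transactions = list(transactions)
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

A rule such as `("POWER_FAIL", "LINK_DOWN", 0.5, 1.0)` would read: in half of all windows both alarms appear, and every window with a power failure also contains a link-down alarm.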
Machine learning supported next-maintenance prediction for industrial vehicles
Industrial and construction vehicles require tight periodic maintenance operations. Their schedule depends on vehicle characteristics and usage. The latter can be accurately monitored through various on-board devices, enabling the application of Machine Learning techniques to analyze vehicle usage patterns and design predictive analytics. This paper presents a data-driven application to automatically schedule the periodic maintenance operations of industrial vehicles. It aims to predict, for each vehicle and date, the actual remaining days until the next maintenance is due. Our Machine Learning solution is designed to address the following challenges: (i) the non-stationarity of the per-vehicle utilization time series, which limits the effectiveness of classic scheduling policies, and (ii) the potential lack of historical data for those vehicles that have recently been added to the fleet, which hinders the learning of accurate predictors from past data. Preliminary results collected in a real industrial scenario demonstrate the effectiveness of the proposed solution on heterogeneous vehicles. The system we propose here is currently under deployment, enabling further testing and tuning.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
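
The difference between the two counting schemes can be sketched in a few lines. This is an illustrative reading of the abstract, not the study's procedure: the function name, the input layout, and the choice to count co-authors of the same cited work as co-cited are assumptions.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often pairs of authors are cited together.

    citing_papers: list of reference lists; each reference is the ordered
                   list of author names of one cited work.
    max_authors:   how many leading authors of each cited work are counted
                   (1 = traditional first-author counting, 5 = first five).
    """
    counts = Counter()
    for references in citing_papers:
        # Distinct authors this paper cites under the chosen counting rule.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts
```

Calling the function with `max_authors=1` and then `max_authors=5` on the same reference lists shows how non-first authors, invisible under the traditional scheme, enter the co-citation matrix.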
Heterogeneous industrial vehicle usage predictions: A real case
Predicting future vehicle usage based on the analysis of CAN bus data is a popular data mining application. Many of the usage indicators, like the utilization hours, are non-stationary time series. To predict their values, recent approaches based on Machine Learning combine multiple data features describing engine status, travels, and roads. While most of the proposed solutions address car and truck usage prediction, a smaller body of work has been devoted to industrial and construction vehicles, which are usually characterized by more complex and heterogeneous usage patterns. This paper describes a real case study performed on a 4-year CAN bus dataset collecting usage data about 2 250 construction vehicles of various types and models. We apply a statistics-based approach to select the most discriminating data features. Separately for each vehicle, we train regression algorithms on historical data enriched with contextual information. The achieved results demonstrate the effectiveness of the proposed solution.
