Search CORE

1,721,026 research outputs found

Data Analysis and Modelling of Users’ Behaviour on the Web

Author: VASSIO LUCA
Publication date: 15/03/2018

New technologies and services strongly transform how we approach the world. The Internet and its pervasive use was certainly the most dramatic leap of the last 30 years.
My research was driven by the need to understand how people interact with the web, capturing its characteristics and changes, and modelling people's habits and interactions. Traces and logs of users' behaviour collected on the Internet (i.e., passive measurements) offer invaluable information for reaching this goal. Thanks to these passive traces, my work studies the behaviour of users on the Internet, focusing on two complementary aspects: (i) data analytics, and (ii) user modelling. There are many key challenges to face: passive measurements are voluminous (big data) and require scalable software and hardware. They also demand innovative methodologies and meaningful metrics to obtain trustworthy, filtered, clean and useful information. Data analytics is performed by means of a variety of statistical, machine learning and data mining approaches. Moreover, it is a prerequisite for creating analytical models of the studied phenomena, which should adhere to reality as closely as possible. Lastly, understanding the applicability of the derived models is a fundamental step for optimizing performance and understanding possible scenarios. In more detail, during my PhD I analyzed 3 years of data from about 30,000 households and reconstructed the users' online activity. Thanks to this, I was able to highlight the evolution of device usage, the intrinsic structure of the navigation and the interactions with social networks and search engines. I introduced a new machine learning approach to identify the intentionally visited web pages and websites. Then, I built specific user profiles, fingerprinting their visited domains, and showed how to re-identify users at a future time. I modelled the sequence of visited web services, representing them in a succinct and interpretable manner. I showed that I can automatically extract groups of similar or likely connected websites, and monitor the interests and browsing patterns of single users or communities.
I also modelled user interaction with online recommendation systems, introducing a user behavioural model that captures the impact of the temporal dynamics of the advertisements shown. Lastly, I demonstrated how to improve the revenue of an advertisement platform by optimizing the times at which ads are shown to users. My findings have several direct implications for the different Internet actors and for the research community. Following the scientific approach, I made the anonymized datasets available to the community, in order to guarantee the reproducibility of my results. Moreover, I addressed the problem of online privacy in today's changing world, with the objective of finding a trade-off between the desire to obtain knowledge for shaping new technologies and the need not to violate the privacy of individuals. Finally, the current digital transformation implies that everyone and everything produces data that can be exploited to create new disruptive capabilities. Data analytics allows us to realize incredible transformations not only in the web, but also in our cities, in energy production, and in manufacturing. Exploiting the knowledge of users' behaviour obtained from these data, and modelling and optimizing system performance as I did in my work, will be a key factor in designing the smart cities of the near future.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A hybrid swarm-based algorithm for single-objective optimization problems involving high-cost analyses

Author: Vassio Luca
Ampellio Enrico
Publication date: 01/01/2016

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Human Behaviour on the Web: Evolution, Interactions and Exploitation

Author: Vassio Luca
Publication date: 01/01/2019

The Web has a fundamental impact on our lives, and its usage is quite dynamic and heterogeneous. Moreover, the Web, and in particular Online Social Networks, allows people to communicate directly with the public, bypassing the filters of traditional media. Among others, politicians and companies are exploiting these technologies to widen their influence. In the talk I will show techniques to capture this evolution in usage and to analyze people's interactions on the Internet. This information allows us to understand how users and web services change over time, and how someone can take advantage of these behaviours.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A multi-faceted characterization of free-floating car sharing service usage

Author: Vassio Luca
Cagliero Luca
Giordano Danilo
Publication date: 01/01/2021

During the last decade, car sharing systems appeared in many cities and gained popularity. The research community has analyzed their current utilization trends in different contexts, their growth perspectives, and their gradual shift towards more sustainable technologies. Through the large and heterogeneous amount of car sharing usage data that is now available, researchers have been able to gain new insights into these services. In this paper, we provide an extensive characterization of the Free-Floating Car Sharing (FFCS) service usage in 23 cities in Europe and North America over a 14-month period. From our data about FFCS services, we detail fleet size, operating area, and characteristics of the car bookings and rentals. We also identify temporal patterns that are peculiar to specific cities and countries. We further highlight urban zones with high attractiveness or with a high rental generation rate. Finally, we compare the systems relying on internal combustion engine cars with those based on electric vehicles in terms of various indicators, including the influence on car refueling. The results show that car utilization patterns are rather variable across cities with the highest per-car utilization rate in Madrid. The majority of the cities show negative or stable usage trends due to either the reduced appeal of the service or the presence of inefficiencies in the service provision. These data-driven insights may help system managers assess the provided services’ profitability and sustainability from multiple perspectives.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A hybrid ABC for expensive optimizations: CEC 2016 competition benchmark

Author: Vassio Luca
Ampellio Enrico
Publication date: 01/01/2016

An evolution of the Artificial Bee Colony (ABC) optimization algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented, with the aim of achieving the best improvement with a low number of analyses. AsBeC is designed to provide fast convergence, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and original hybridizations with interpolation strategies. These techniques are tested on the expensive benchmark of the Special Session on Real-Parameter Single Objective Optimization at CEC 2016. In this specific case, the hybridization with a quadratic trust region approach assumes major importance. Moreover, the AsBeC results are compared to those of the algorithms tested on the same benchmark at CEC 2015, showing remarkable competitiveness and robustness.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

On Cost-Effectiveness of Language Models for Time Series Anomaly Detection

Author: Vassio Luca
Yassine Ali
Cagliero Luca
Publication date: 01/01/2026

Detecting anomalies in time series data is crucial across several domains, including healthcare, finance, and automotive. Large Language Models (LLMs) have recently shown promising results by leveraging robust model pretraining. However, fine-tuning LLMs with several billion parameters requires a large number of training samples and significant training costs. Conversely, LLMs under a zero-shot learning setting require lower overall computational costs, but can fall short in handling complex anomalies. In this paper, we explore the use of lightweight language models for Time Series Anomaly Detection, either zero-shot or via fine-tuning them. Specifically, we leverage lightweight models that were originally designed for time series forecasting, benchmarking them for anomaly detection against both open-source and proprietary LLMs across different datasets. Our experiments demonstrate that lightweight models (70 Billions)

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Debate on online social networks at the time of COVID-19: An Italian case study

Author: Trevisan Martino
Vassio Luca
Giordano Danilo
Publication date: 01/01/2021

The COVID-19 pandemic is not only having a heavy impact on healthcare but also changing people’s habits and the society we live in. Countries such as Italy have enforced a total lockdown lasting several months, with most of the population forced to remain at home. During this time, online social networks, more than ever, have represented an alternative solution for social life, allowing users to interact and debate with each other. Hence, it is of paramount importance to understand the changing use of social networks brought about by the pandemic. In this paper, we analyze how the interaction patterns around popular influencers in Italy changed during the first six months of 2020, within the Instagram and Facebook social networks. We collected a large dataset for this group of public figures, including more than 54 million comments on over 140 thousand posts for these months. We analyze and compare engagement on the posts of these influencers and provide quantitative figures for aggregated user activity. We further show the changes in usage patterns before and during the lockdown, which reveal a growth in activity and sizable daily and weekly variations. We also analyze user sentiment through the psycholinguistic properties of comments, and the results testify to the rapid boom and disappearance of topics related to the pandemic. To support further analyses, we release the anonymized dataset.

Archivio istituzionale della ricerca - Università di Trieste

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Disentangling the Information Flood on OSNs: Finding Notable Posts and Topics

Author: Trevisan Martino
Vassio Luca
Caso Paola
Publication date: 01/01/2022

Online Social Networks (OSNs) are an integral part of modern life for sharing thoughts, stories, and news. An ecosystem of influencers generates a flood of content in the form of posts, some of which have an unusually high level of engagement with the influencer’s fan base. These posts relate to blossoming topics of discussion that generate particular interest among users: the COVID-19 pandemic is a prominent example. Studying these phenomena provides an understanding of the OSN landscape and requires appropriate methods. This paper presents a methodology to discover notable posts and group them according to their related topic. By combining anomaly detection, graph modelling and community detection techniques, we pinpoint salient events automatically, with the ability to tune how many of them are reported. We showcase our approach using a large Instagram dataset and extract some notable weekly topics that gained momentum from 1.4 million posts. We then illustrate some use cases ranging from the COVID-19 outbreak to sporting events.

Archivio istituzionale della ricerca - Università di Trieste

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification

Author: Vassio Luca
Zhao Yuqi
Mellia Marco
Boffa Matteo
Dettori Giovanni
Publication date: 01/01/2025

Recently we have witnessed an explosion of proposals that, inspired by Language Models like BERT, exploit Representation Learning models to create traffic representations. All of them promise astonishing performance in encrypted traffic classification (up to 98% accuracy). In this paper, with a networking expert mindset, we critically reassess their performance. Through extensive analysis, we demonstrate that the reported successes are heavily influenced by data preparation problems, which allow these models to find easy shortcuts (spurious correlations between features and labels) during fine-tuning that unrealistically boost their performance. When such shortcuts are not present, as in real scenarios, these models perform poorly. We also introduce Pcap-Encoder, an LM-based representation learning model that we specifically design to extract features from protocol headers. Pcap-Encoder appears to be the only model that provides an instrumental representation for traffic classification. Yet, its complexity calls into question its applicability in practical settings. Our findings reveal flaws in dataset preparation and model training, calling for a better and more conscious test design. We propose a correct evaluation methodology and stress the need for rigorous benchmarking.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Modeling communication asymmetry and content personalization in online social networks

Author: Vassio Luca
Galante Franco
Garetto Michele
Leonardi Emilio
Publication date: 01/01/2023

The increasing popularity of online social networks (OSNs) attracted growing interest in modeling social interactions. On online social platforms, a few individuals, commonly referred to as influencers, produce the majority of content consumed by users and hegemonize the landscape of the social debate. However, classical opinion models do not capture this communication asymmetry. We develop an opinion model inspired by observations on social media platforms with two main objectives: first, to describe this inherent communication asymmetry in OSNs, and second, to model the effects of content personalization. We derive a Fokker-Planck equation for the temporal evolution of users' opinion distribution and analytically characterize the stationary system behavior. Analytical results, confirmed by Monte-Carlo simulations, show how strict forms of content personalization tend to radicalize user opinion, leading to the emergence of echo chambers, and favor structurally advantaged influencers. As an example application, we apply our model to Facebook data during the Italian government crisis in the summer of 2019. Our work provides a flexible framework to evaluate the impact of content personalization on the opinion formation process, focusing on the interaction between influential individuals and regular users. This framework is interesting in the context of marketing and advertising, misinformation spreading, politics and activism.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)