Analysis, characterization and classification of Internet traffic

Author: FINAMORE ALESSANDRO
Publication date: 01/01/2012

The Internet is a global interconnection of networks and is nowadays one of the most important telecommunication technologies. Born as a U.S. military project, it has evolved into a worldwide communication system used by people every day. Its success rests on its ``freedom'', since no single organization or administrative entity governs or maintains it. This freedom also explains the huge heterogeneity of the Internet services available today, ranging from work activities (e.g., VoIP, e-mail) to entertainment (e.g., video games, streaming, peer-to-peer) and commerce (e.g., Amazon, eBay), to name just a few. The Internet is a fertile system in constant evolution. Every year new services and software platforms are launched, affecting not only the users' activities (e.g., social networks) but also the internal architecture of the networks (e.g., Content Delivery Networks vs peer-to-peer) and the devices used to access the services (e.g., PCs vs smartphones and Internet tablets). The richness of the Internet scenario comes at the cost of its internal complexity. Eric Schmidt, the CEO of Google, said: \emph{``The Internet is the first thing that humanity has built that humanity doesn't understand, the largest experiment in anarchy that we have ever had.''}\footnote{\url{http://www.brainyquote.com/quotes/authors/e/eric_schmidt.html}}. Originally, the Internet was designed to operate a few standardized services. Nobody could have i) foreseen the success of this medium or ii) designed the network to cope with today's plethora of services. While this diversity gives the Internet a certain level of resiliency and has driven innovation, it also makes understanding its internal mechanisms a daunting task, made worse by the fast and constant deployment of new services and applications. 
However, behind what could seem a chaotic scenario, the Internet is composed of well-defined markets in which big players participate with precise interests:
\begin{description}
\item \textbf{Users}, representing the majority of the people who access the network. They are interested in \emph{Quality of Experience} (QoE), i.e., getting good performance when accessing the network, for example avoiding long initial buffering delays when streaming a video. They are also interested in \emph{Network Neutrality}, which preserves their freedom to use the Internet regardless of the service they are accessing;
\item \textbf{Internet Service Providers (ISPs)}, the organizations that provide Internet access to customers. They are interested in increasing their revenues through i) \emph{network engineering}, to optimize the offered services, and ii) studying the users' activity, to find new \emph{billing policies};
\item \textbf{Content providers}, the organizations that sell a specific Internet service, e.g., video streaming or file hosting. Like ISPs, they are interested in finding new ways to generate revenue. At the same time, they also have to cope with illegal activities such as \emph{content piracy}, a common problem since the early days of peer-to-peer systems;
\item \textbf{Government regulation agencies}, the organizations that regulate some aspects of Internet activities. For example, they study \emph{Service Level Agreements} (SLAs) between users and ISPs, comparing the quality of the Internet access offered to users with the specifications in the signed contract.
\end{description}
Other activities, such as \emph{security}, matter to more than one player. Consider for example \emph{malware} and \emph{Denial of Service} (DoS) attacks: these can violate users' privacy, damage the network, and break the law. 
Overall, then, there are several motivations for studying the Internet. Since the early days, the scientific community has made giant steps toward understanding it. Broadly, two requirements have to be satisfied. First, we need \emph{tools and methodologies} to inspect and characterize the traffic at different granularities, i.e., per packet, per flow, per port, per user, etc. In particular, \emph{traffic classification} is one of the most important activities performed by network operators. It makes it possible to identify which application generated a given communication, and to study not only the whole network traffic aggregate but also how different applications contribute to the total traffic. Second, leveraging these tools and methodologies, we can drill down into \emph{user and network characterization}. For example, by monitoring the traffic over long periods, we can study application popularity trends and identify the rise of new technologies. We can perform \emph{anomaly detection}, i.e., study unexpected network conditions that might be related to either security issues or malfunctioning hardware. We can optimize routing policies, study inter-ISP traffic, investigate the energy consumption of network elements, or work on caching schemes for social network content, to name just a few of the many research studies recently conducted in the literature. In this thesis, we present our contributions to studying the Internet, discussing the tools and methodologies developed to characterize network traffic. The thesis is divided into two parts. In the first part we focus on traffic classification methodologies, starting from the problem definition and the solutions available in the literature, as reported in Chapter~\ref{chapter:traff_class}. 
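The per-flow granularity mentioned above can be illustrated with a minimal Python sketch, assuming packets have already been parsed into header fields. The `Packet` record and the `flow_key` and `aggregate_flows` helpers are hypothetical names, not part of Tstat or of any tool discussed in the thesis; the sketch maps both directions of a conversation onto the same bidirectional 5-tuple key and keeps packet and byte counters per flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical pre-parsed packet header (not a Tstat data structure)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str  # e.g. "TCP" or "UDP"
    size: int   # bytes on the wire


FlowKey = Tuple[str, Tuple[str, int], Tuple[str, int]]


def flow_key(p: Packet) -> FlowKey:
    """Bidirectional 5-tuple: both directions of a conversation share one key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (p.proto, lo, hi)


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, Dict[str, int]]:
    """Per-flow packet and byte counters."""
    flows: Dict[FlowKey, Dict[str, int]] = defaultdict(
        lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        counters = flows[flow_key(p)]
        counters["packets"] += 1
        counters["bytes"] += p.size
    return dict(flows)
```

Coarser granularities follow by changing the key only, e.g., grouping by server port for a per-port view or by client IP address for a per-user view.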
In the rest of the first part we focus on KISS, a novel traffic classification technique we propose, based on \emph{Stochastic Packet Inspection} (SPI). In particular, Chapter~\ref{chapter:kiss} describes the framework used by the classifier, which is then validated in Chapters~\ref{sec:kiss_udp} and~\ref{sec:kiss_tcp} for UDP and TCP traffic, respectively. Chapter~\ref{chapter:compare} compares KISS with other state-of-the-art traffic classifiers, while Chapter~\ref{sec:clustering} extends the KISS framework with clustering techniques. Overall, KISS achieves a high level of traffic classification accuracy, comparable to or even better than that of other traffic classifiers. Its flexible structure can identify a rich set of applications with limited resource requirements. In the second part of the thesis we study YouTube, the popular video streaming system. Leveraging Tstat, a passive traffic analyzer, we developed a methodology to identify YouTube video downloads and conducted an in-depth analysis of many aspects of YouTube. Chapter~\ref{sec:yt-overview} starts with an overview of the system and its components, showing the internal mechanisms adopted. Chapter~\ref{sec:yt-methodology} reviews the methodologies available in the literature to study YouTube and presents our methodology, based on monitoring real users' activities across different locations, access technologies and devices. The remaining chapters present the results of our analysis, grouped into four areas of interest: video content properties (Chapter~\ref{sec:yt-content}), internal load balancing and caching policies (Chapter~\ref{sec:yt-cdn}), users' habits and behaviours (Chapter~\ref{sec:user}), and download performance (Chapter~\ref{sec:yt-performance})." 
Results show that YouTube is a complex system in which several components interact according to precise policies that control the communications. Despite its great success, the system is far from perfect and there is room for further optimization. For example, mobile devices suffer more impairments during the download than PCs. Users stick to the default video resolution and are not interested in changing the quality during playback. Instead, abruptly aborting the download is common. This behaviour is particularly critical because, coupled with the aggressive buffering policies used to ensure playback continuity, it wastes a non-negligible amount of traffic, i.e., users download a portion of the video that is never played.
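Why an early abort wastes traffic under aggressive buffering can be made concrete with a toy calculation. The model below is a hypothetical simplification, not the YouTube player's actual policy: the player first fetches an initial burst of video, then keeps downloading at a fixed multiple (assumed to be at least 1) of the playback rate until the video is complete or the user aborts. Stalls are ignored, and `wasted_fraction` and its parameters are illustrative names.

```python
def wasted_fraction(duration_s: float, watched_s: float,
                    initial_burst_s: float, rate_factor: float) -> float:
    """Share of downloaded video (in seconds of content) that is never played.

    Hypothetical model: the player first fetches `initial_burst_s` seconds of
    video, then downloads at `rate_factor` (>= 1) times the playback rate until
    the whole video is fetched or the user aborts after `watched_s` seconds.
    Stalls are ignored.
    """
    watched = min(watched_s, duration_s)
    downloaded = min(duration_s, initial_burst_s + rate_factor * watched)
    if downloaded <= 0:
        return 0.0
    return (downloaded - watched) / downloaded
```

With purely illustrative values (a 300 s video, a 40 s initial burst, a rate factor of 1.25), a user who aborts after 30 s has downloaded 77.5 s of content, of which 47.5 s, about 61%, is never played; a user who watches to the end wastes nothing.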

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A Measurement-Centered Approach to Latency Reduction

Author: MELLIA Marco
FINAMORE ALESSANDRO
Trammel B.
Publication date: 01/01/2013

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

KISS: Stochastic Packet Inspection Classifier for UDP Traffic

Author: Mellia Marco
Meo Michela
Finamore Alessandro
ROSSI D.
Publication date: 01/01/2010

This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of Peer-to-Peer (P2P) streaming applications, we propose a novel classification framework that leverages a statistical characterization of the payload. Statistical signatures are derived by means of a Chi-Square-like test, which extracts the protocol "format" but ignores the protocol "semantic" and "synchronization" rules. The signatures feed a decision process based either on the geometric distance among samples or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP, and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications.
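The Chi-Square-like signature and the distance-based decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the KISS implementation: the first 12 payload bytes (zero-padded if shorter) are split into 4-bit chunks, the value counts of each chunk over a window of N packets are compared with a uniform distribution, and the resulting vector is labelled by the nearest class centroid in Euclidean distance, a simple stand-in for the geometric-distance decision (the SVM variant is not shown). `PAYLOAD_BYTES`, `chi_square_signature` and `nearest_centroid` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

CHUNK_BITS = 4                # assumed chunk size: one nibble
NUM_VALUES = 1 << CHUNK_BITS  # 16 possible values per chunk
PAYLOAD_BYTES = 12            # assumed number of leading payload bytes inspected


def nibbles(payload: bytes) -> List[int]:
    """Split the first PAYLOAD_BYTES bytes into 4-bit chunks, high nibble first."""
    head = payload[:PAYLOAD_BYTES].ljust(PAYLOAD_BYTES, b"\x00")  # zero-pad short payloads
    out: List[int] = []
    for byte in head:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return out


def chi_square_signature(window: Sequence[bytes]) -> List[float]:
    """One chi-square-like value per chunk position over a window of N packets."""
    n = len(window)
    if n == 0:
        raise ValueError("empty window")
    expected = n / NUM_VALUES  # E_i under a uniform distribution
    counts = [[0] * NUM_VALUES for _ in range(2 * PAYLOAD_BYTES)]
    for payload in window:
        for c, v in enumerate(nibbles(payload)):
            counts[c][v] += 1  # O_ci: occurrences of value i at chunk c
    # X_c = sum_i (O_ci - E_i)^2 / E_i; near 0 for random chunks, large for constant ones
    return [sum((o - expected) ** 2 / expected for o in row) for row in counts]


def nearest_centroid(sig: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Label of the class centroid closest to `sig` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(sig, centroids[label]))
```

The design mirrors the abstract's point that only the protocol "format" is captured: constant header fields (e.g., version flags) push X_c up, counters and random identifiers push it down, and no protocol semantics or packet order is needed, which is why such signatures tolerate sampling and reordering.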

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


“It’s a match!” A benchmark of task affinity scores for joint learning

Author: Azorin Raphaël; Gallo, Massimo; Finamore, Alessandro; Rossi, Dario; Michiardi, Pietro
Publication date: 2023

EURECOM Repository

RFMI: Estimating mutual information on rectified flow for text-to-image alignment

Author: Wang Chao; Franzese, Giulio; Finamore, Alessandro; Michiardi, Pietro
Publication date: 2025

EURECOM Repository

Information theoretic text-to-image alignment

Author: Wang Chao; Franzese, Giulio; Finamore, Alessandro; Gallo, Massimo; Michiardi, Pietro
Publication date: 2025

EURECOM Repository

Many or few samples? Comparing transfer, contrastive and meta-learning in encrypted traffic classification

Author: Guarino Idio; Wang, Chao; Finamore, Alessandro; Pescape, Antonio; Rossi, Dario
Publication date: 2023

EURECOM Repository

In this paper we present a fully unsupervised algorithm to identify classes of traffic within an aggregate. The algorithm builds on the K-means clustering algorithm, augmented with a mechanism that automatically determines the number of traffic clusters. The signatures used for clustering are statistical representations of the application-layer protocols. The proposed technique is extensively tested on UDP traffic traces collected from operational networks. Performance tests show that it can cluster the traffic into a few tens of pure clusters, achieving an accuracy above 95%. Results are promising and suggest that the proposed approach might be used effectively for automatic traffic monitoring, e.g., to identify the birth of new applications and protocols, or the presence of anomalous or unexpected traffic.
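The abstract does not detail how the number of clusters is chosen, so the sketch below substitutes a simple, plainly different rule: k is increased while one extra cluster still reduces the within-cluster sum of squared errors (SSE) by at least a relative `min_gain`. It is a minimal pure-Python illustration of K-means (Lloyd's algorithm), assuming the per-flow signature vectors are already computed; `kmeans`, `auto_kmeans` and `min_gain` are hypothetical names, and the deterministic farthest-point initialisation is an assumption made for reproducibility.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _closest(p: Point, centers: List[List[float]]) -> int:
    """Index of the nearest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points: Sequence[Point], k: int,
           iters: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [_closest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = [_closest(p, centers) for p in points]
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse


def auto_kmeans(points: Sequence[Point], k_max: int = 10, min_gain: float = 0.5):
    """Grow k while one more cluster cuts the SSE by at least `min_gain` (relative)."""
    labels, centers, sse = kmeans(points, 1)
    k = 1
    for k_next in range(2, min(k_max, len(points)) + 1):
        cand_labels, cand_centers, cand_sse = kmeans(points, k_next)
        if sse == 0 or (sse - cand_sse) / sse < min_gain:
            break
        labels, centers, sse, k = cand_labels, cand_centers, cand_sse, k_next
    return k, labels, centers
```

On three well-separated groups of points, for instance, the rule stops at k = 3, because splitting a compact group further yields only a small relative SSE reduction. The threshold trades purity against the number of clusters, which echoes the abstract's "few tens of pure clusters".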

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
