Analysis, characterization and classification of Internet traffic
The Internet is a global interconnection of networks and is today
one of the most important telecommunication technologies.
Born as a U.S. military project, it has evolved into a worldwide communication
system used by people every day. This success is based
on its ``freedom'', since no single organization or administrative
entity governs or maintains it. This freedom also explains the huge
heterogeneity of Internet services available today, ranging from
work activities (e.g., VoIP, e-mail) to entertainment
(e.g., video games, streaming, peer-to-peer) and commerce (e.g., Amazon, eBay),
just to name a few.
The Internet is a fertile and constantly evolving system.
Every year new services and software platforms are launched, affecting
not only the users' activities (e.g., social networks) but also the internal architecture of the networks
(e.g., Content Delivery Networks vs. peer-to-peer) or the devices used to access the services
(e.g., PCs vs. smartphones and Internet tablets).
The richness of the Internet scenario comes at the cost of its internal complexity.
Eric Schmidt, the CEO of Google, said:
\emph{``The Internet is the first
thing that humanity has built that humanity doesn't
understand, the largest experiment in anarchy that we have ever
had.''}\footnote{\url{http://www.brainyquote.com/quotes/authors/e/eric_schmidt.html}}
Originally, the Internet was designed to operate with a few standardized services.
No one could have i) foreseen the success of this medium or ii) designed
the network to cope with the plethora of today's services.
If on the one hand this diversity provides the Internet with a certain
level of resiliency and has driven innovation, on the other hand
understanding its internal mechanisms is a daunting task, made worse by
the fast and constant deployment of new services and applications.
However, behind what might seem a chaotic scenario, the Internet is composed of
well-defined markets in which big players participate with precise
interests:
\begin{description}
\item \textbf{Users}, representing the majority of the people who access
the network. They are interested in \emph{Quality of Experience} (QoE), i.e.,
having good performance when accessing the network, avoiding for example
long delays due to the initial buffering when streaming a video.
They are also interested in \emph{Network Neutrality}, i.e., preserving their freedom
to use the Internet regardless of which service they are accessing;
\item \textbf{Internet Service Providers (ISPs)}, i.e.,
organizations which provide Internet access to customers.
They are interested in increasing revenues through i) \emph{network engineering},
so as to optimize the offered services, and ii) studying users' activity,
so as to find new \emph{billing policies};
\item \textbf{Content providers}, i.e., organizations
which sell a specific Internet service, e.g., video streaming or file hosting.
Like ISPs, they are interested in finding new ways to generate revenue.
At the same time, they also have to cope with illegal activities
such as \emph{content piracy}, a common problem since the early days
of peer-to-peer systems;
\item \textbf{Government regulation agencies}, i.e.,
organizations which regulate some aspects of Internet activities.
For example, they study \emph{Service Level Agreements} (SLAs)
between users and ISPs, comparing the quality of
the Internet access offered to users against the specifications
written in the signed contract.
\end{description}
Other concerns, such as \emph{security}, are important to more than one
player. Consider for example \emph{malware} and \emph{Denial of Service} (DoS) attacks.
These can violate users' privacy, damage the network, and break laws.
Overall, there are several reasons to study the Internet.
Since the early days, the scientific community has made giant steps
toward understanding the Internet. Broadly speaking,
two requirements have to be satisfied.
First, we need \emph{tools and methodologies} to inspect and characterize the traffic
at different granularities, e.g., per-packet,
per-flow, per-port, or per-user.
In particular, \emph{traffic classification} is one of the most important activities
performed by network operators. It makes it possible to identify which application has generated
a given communication and to study not only the whole network traffic aggregate
but also how different applications
contribute to the total traffic.
Leveraging these tools and methodologies, we can then
perform \emph{user and network characterization}. For example,
by monitoring the traffic over long periods, we can study
application popularity trends and identify the rise of new technologies.
We can perform \emph{anomaly detection}, i.e., study unexpected network
conditions that might be related to either security issues or malfunctioning
hardware. We can optimize routing policies, study inter-ISP traffic,
investigate the energy consumption of the network elements, or
work on caching schemes related to social network content,
just to name a few of the many research
studies recently reported in the literature.
In this thesis, we present our contributions to the study of the Internet,
discussing the tools and methodologies developed to characterize network traffic.
The thesis is divided into two parts.
In the first part we focus on traffic classification methodologies
starting from the problem definition and the available solutions in the literature
as reported in Chapter~\ref{chapter:traff_class}.
In the remainder of the first part
we focus on KISS, a novel traffic classification
technique we propose, based on \emph{Stochastic Packet Inspection} (SPI) analysis.
In particular, in Chapter~\ref{chapter:kiss} we describe the framework used by the classifier,
which is then validated in Chapters~\ref{sec:kiss_udp} and~\ref{sec:kiss_tcp} for
UDP and TCP traffic, respectively.
Chapter~\ref{chapter:compare} compares KISS with other
state-of-the-art traffic classifiers, while in Chapter~\ref{sec:clustering}
we extend the KISS framework with some clustering techniques.
Overall, KISS achieves a high level of accuracy in traffic classification,
comparable to or even better than that of other traffic classifiers.
It has a flexible structure which is able to identify a rich set
of applications with limited resource requirements.
In the second part of the thesis we study YouTube,
the famous video streaming system. Leveraging Tstat,
a passive traffic analyzer, we developed a methodology
to identify YouTube video downloads and conducted
an in-depth analysis of many aspects of YouTube.
In Chapter~\ref{sec:yt-overview} we start by presenting an
overview of the system and its components, showing
the internal mechanisms adopted.
Chapter~\ref{sec:yt-methodology} reports an analysis
of the methodologies available in the literature to
study YouTube and presents our methodology, based
on monitoring real users' activities across
different locations, access technologies and devices.
In the remaining chapters we present the results of our analysis
grouped in four different areas of interest: video content
properties (Chapter~\ref{sec:yt-content}),
internal load balancing and caching policies (Chapter~\ref{sec:yt-cdn}),
users' habits and behaviours (Chapter~\ref{sec:user}), and
download performance (Chapter~\ref{sec:yt-performance}).
Results show that YouTube is a complex system where several
components interact with precise policies used to control
the communications. Despite its great success, the system
is far from perfect and there is room for further optimization.
For example, mobile devices suffer more impairments during the download
than PCs. Users stick to the default video resolution and are not interested
in changing the quality during playback. Instead, it is
common for users to abruptly abort the download. This behaviour
is particularly critical because, coupled
with the aggressive buffering policies used to ensure continuity
in the playback, it wastes a non-negligible amount of traffic, i.e.,
the users download a portion of the video which is never played.
KISS: Stochastic Packet Inspection Classifier for UDP Traffic
This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of Peer-to-Peer (P2P) streaming applications, we propose a novel classification framework that leverages statistical characterization of payload. Statistical signatures are derived by means of a Chi-Square-like test, which extracts the protocol "format," but ignores the protocol "semantic" and "synchronization" rules. The signatures feed a decision process based either on the geometric distance among samples, or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP, and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications.
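To make the idea of a Chi-Square-like payload signature more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the first `n_bytes` of each payload are split into groups of `bits_per_group` bits and that the values observed in each group are compared against a uniform distribution. The function name, the default parameters and the uniform reference are illustrative assumptions that the abstract does not state.

```python
from collections import Counter


def chi_square_signature(payloads, n_bytes=12, bits_per_group=4):
    """Return one Chi-Square-like statistic per group of payload bits.

    Hypothetical sketch: group size, number of bytes and the uniform
    reference distribution are assumptions, not taken from the abstract.
    """
    if 8 % bits_per_group != 0:
        raise ValueError("bits_per_group must divide 8")
    groups_per_byte = 8 // bits_per_group
    n_values = 1 << bits_per_group      # possible values of one group
    mask = n_values - 1

    # Only packets long enough to cover the inspected prefix are used.
    usable = [p for p in payloads if len(p) >= n_bytes]
    if not usable:
        raise ValueError("no payload is at least n_bytes long")
    expected = len(usable) / n_values   # uniform expectation per value

    # Count, for every group position, how often each value is observed.
    counts = [Counter() for _ in range(n_bytes * groups_per_byte)]
    for payload in usable:
        for byte_index in range(n_bytes):
            byte = payload[byte_index]
            for k in range(groups_per_byte):
                shift = 8 - bits_per_group * (k + 1)
                value = (byte >> shift) & mask
                counts[byte_index * groups_per_byte + k][value] += 1

    # Pearson's statistic against the uniform expectation, per group.
    return [
        sum((c[v] - expected) ** 2 / expected for v in range(n_values))
        for c in counts
    ]
```

Under these assumptions, a group that always carries the same value (a constant protocol field) yields a large statistic, while a group whose values are evenly spread (for example, random or encrypted data) yields a value close to zero. The resulting vector depends only on how often values occur at each position, not on the order of the packets.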
Many or few samples? Comparing transfer, contrastive and meta-learning in encrypted traffic classification
Mining Unclassified Traffic Using Automatic Clustering Techniques
In this paper we present a fully unsupervised algorithm to identify classes of traffic inside an aggregate. The algorithm leverages the K-means clustering algorithm, augmented with a mechanism to automatically determine the number of traffic clusters. The signatures used for clustering are statistical representations of the application layer protocols. The proposed technique is extensively tested considering UDP traffic traces collected from operational networks. Performance tests show that it can cluster the traffic into a few tens of pure clusters, achieving an accuracy above 95%. Results are promising and suggest that the proposed approach might effectively be used for automatic traffic monitoring, e.g., to identify the birth of new applications and protocols, or the presence of anomalous or unexpected traffic.
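The abstract does not say how the number of clusters is determined, so the sketch below uses a stand-in heuristic rather than the authors' mechanism. It runs plain K-means for increasing k and stops when one more cluster no longer reduces the within-cluster squared distance by at least a fraction `min_gain`. The function names, the farthest-point initialization and the threshold are all illustrative assumptions.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))
            for p in points]


def kmeans(points, k, iterations=50):
    """Plain K-means with a deterministic farthest-point initialization."""
    points = [tuple(p) for p in points]
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(_dist2(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = _assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's seed
        if updated == centroids:
            break
        centroids = updated
    labels = _assign(points, centroids)
    inertia = sum(_dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def choose_k(points, max_k=10, min_gain=0.3):
    """Grow k until one more cluster reduces inertia by less than min_gain.

    Hypothetical stopping rule, used here only for illustration.
    """
    best = kmeans(points, 1)
    for k in range(2, min(max_k, len(points)) + 1):
        candidate = kmeans(points, k)
        if best[2] == 0 or (best[2] - candidate[2]) / best[2] < min_gain:
            break
        best = candidate
    return best
```

In this setting each point would be the statistical signature of a flow. Calling `choose_k(signatures)` returns the labels, centroids and total within-cluster squared distance for the selected number of clusters. Other selection criteria could be substituted for the stopping rule without changing the rest of the sketch.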
