Scalable Fine-Grained Behavioral Clustering of HTTP-Based Malware
A large number of today’s botnets leverage the HTTP protocol to communicate with their botmasters or perpetrate malicious activities. In this paper, we present a new scalable system for network-level behavioral clustering of HTTP-based malware that aims to efficiently group newly collected malware samples into malware family clusters. The end goal is to obtain malware clusters that can aid the automatic generation of high-quality network signatures, which can in turn be used to detect botnet command-and-control (C&C) and other malware-generated communications at the network perimeter. We achieve scalability in our clustering system by simplifying the multi-step clustering process proposed in [30] and by leveraging incremental clustering algorithms that run efficiently on very large datasets. At the same time, we show that scalability is achieved while retaining a good trade-off between detection rate and false positives for the signatures derived from the obtained malware clusters. We implemented a proof-of-concept version of our new scalable malware clustering system and performed experiments with about 65,000 distinct malware samples. Results from our evaluation confirm the effectiveness of the proposed system and show that, compared to [30], our approach can reduce processing times from several hours to a few minutes and scales well to large datasets containing tens of thousands of distinct malware samples.
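The abstract does not detail which incremental clustering algorithm is used, so the following is only a minimal sketch of the general idea. It assumes a simple single-pass leader algorithm (a swapped-in technique, not necessarily the one used in the paper) and a hypothetical feature extractor that reduces each HTTP request to a set of tokens compared with the Jaccard distance. The names `request_tokens` and `LeaderClustering`, the token scheme, and the threshold value are illustrative assumptions.

```python
from dataclasses import dataclass, field


def request_tokens(method: str, url: str) -> frozenset:
    """Hypothetical feature extractor: map one HTTP request to a set of tokens
    (method, path segments, query-parameter names). Parameter values are dropped
    so that requests differing only in IDs or nonces look alike."""
    path, _, query = url.partition("?")
    tokens = {f"method:{method.upper()}"}
    tokens.update(f"path:{seg}" for seg in path.split("/") if seg)
    tokens.update(f"param:{pair.split('=', 1)[0]}" for pair in query.split("&") if pair)
    return frozenset(tokens)


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


@dataclass
class LeaderClustering:
    """Single-pass (incremental) leader clustering: each new sample joins the
    first cluster whose leader lies within `threshold`; otherwise it founds a
    new cluster. No pairwise distance matrix over all samples is built."""
    threshold: float
    leaders: list = field(default_factory=list)
    members: list = field(default_factory=list)

    def add(self, sample_id: str, features: frozenset) -> int:
        for idx, leader in enumerate(self.leaders):
            if jaccard_distance(leader, features) <= self.threshold:
                self.members[idx].append(sample_id)
                return idx
        self.leaders.append(features)
        self.members.append([sample_id])
        return len(self.leaders) - 1


# Usage with made-up requests from three hypothetical malware samples.
clusterer = LeaderClustering(threshold=0.5)
for sample_id, method, url in [
    ("s1", "GET", "/gate.php?id=1&cmd=ping"),
    ("s2", "GET", "/gate.php?id=7&cmd=load"),
    ("s3", "POST", "/upload/img.jpg"),
]:
    clusterer.add(sample_id, request_tokens(method, url))
print(clusterer.members)  # [['s1', 's2'], ['s3']]
```

Because each sample is compared only with the current cluster leaders, the cost grows with the number of clusters rather than with all sample pairs; the price is that the result depends on the order in which samples arrive, which a production system would need to account for.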
Pharmaguard WebApp: An application for the detection of illegal online pharmacies
We present a demo of PharmaGuard, a novel system for the automatic discovery of illegal online pharmacies. With its easy-to-use graphical user interface, its web-application architecture, and its use of automatic knowledge discovery, PharmaGuard can assist law enforcement agencies in identifying, blacklisting, and shutting down illegal pharmacies.
Machine Learning in Security Applications
One of the most important assets to be protected is information, as every aspect of the life of a society depends deeply on the information available to it. Nowadays, information is stored, processed, and communicated by computers, which makes them the most critical tool in modern society. A number of protection mechanisms are currently available, such as antivirus software and biometric access control systems. To remain effective, these mechanisms require frequent updates because attack patterns evolve rapidly. In fact, attacks are often devised and spread by computer programs that can produce new, effective attacks in a short time frame. For this reason, machine learning techniques, thanks to their generalization capability, have become one of the favored approaches for deploying protection and attack detection mechanisms. In this paper, we discuss the approaches that should be followed when devising machine learning techniques for security applications. In particular, we focus on testing methodologies, performance measures, and techniques aimed at reducing the intrinsic variability of performance that machine learning applications often exhibit in real-world scenarios.
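As an illustration of the kind of testing methodology and performance measures discussed above, the sketch below estimates both the average and the variability of a detector's detection rate and false positive rate over repeated stratified random splits. The helper names, the 30% test fraction, the number of runs, and the toy threshold detector are assumptions made for illustration, not procedures prescribed by the paper.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (detection rate, false positive rate); label 1 = attack, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


def stratified_split(y: Sequence[int], test_fraction: float, rng: random.Random):
    """Split indices so that each class contributes about test_fraction of its samples to the test set."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def repeated_holdout(X, y, train_and_predict: Callable, runs: int = 10,
                     test_fraction: float = 0.3, seed: int = 0):
    """Repeat training/testing on `runs` random stratified splits and return
    (mean, std) of the detection rate and of the false positive rate."""
    if runs < 2:
        raise ValueError("need at least two runs to estimate variability")
    rng = random.Random(seed)
    drs, fprs = [], []
    for _ in range(runs):
        train, test = stratified_split(y, test_fraction, rng)
        preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        dr, fpr = rates([y[i] for i in test], preds)
        drs.append(dr)
        fprs.append(fpr)
    return ((statistics.mean(drs), statistics.stdev(drs)),
            (statistics.mean(fprs), statistics.stdev(fprs)))


# Toy data: one numeric feature; values >= 10 are attacks.
X = list(range(20))
y = [int(x >= 10) for x in X]


def threshold_detector(X_train, y_train, X_test):
    """Hypothetical detector: flags anything at or above the smallest attack value seen in training."""
    t = min(x for x, label in zip(X_train, y_train) if label == 1)
    return [int(x >= t) for x in X_test]


(dr_mean, dr_std), (fpr_mean, fpr_std) = repeated_holdout(X, y, threshold_detector)
print(f"DR = {dr_mean:.3f} +/- {dr_std:.3f}, FPR = {fpr_mean:.3f} +/- {fpr_std:.3f}")
```

Reporting the standard deviation next to the mean exposes run-to-run variability that a single train/test split would hide. In the toy example, the learned threshold can miss the smallest attack value when that value lands in the test set, so the detection rate may vary across runs while the false positive rate stays at zero.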
McPAD: A multiple classifier system for accurate payload-based anomaly detection
Anomaly-based network intrusion detection systems (IDS) are valuable tools for the defense-in-depth of computer networks. Unsupervised or unlabeled learning approaches for network anomaly detection have recently been proposed. Such anomaly-based network IDS are able to detect (unknown) zero-day attacks, although much care must be taken to control the number of false positives generated by the detection system. In fact, it has been shown that the false positive rate is the true limiting factor for the performance of IDS, and that in order to substantially increase the Bayesian detection rate, P(Intrusion|Alarm), an IDS must have a very low false positive rate (e.g., as low as 10^-5 or even lower).
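The effect of the false positive rate on the Bayesian detection rate follows from Bayes' theorem: P(Intrusion|Alarm) = P(Alarm|Intrusion) P(Intrusion) / [P(Alarm|Intrusion) P(Intrusion) + P(Alarm|No intrusion) P(No intrusion)]. The minimal sketch below evaluates this formula; the base rate of one intrusive event per 100,000 analyzed events is a hypothetical value chosen only for illustration.

```python
def bayesian_detection_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """P(Intrusion | Alarm) by Bayes' theorem, where
    tpr = P(Alarm | Intrusion), fpr = P(Alarm | No intrusion), base_rate = P(Intrusion)."""
    p_alarm = tpr * base_rate + fpr * (1.0 - base_rate)
    return tpr * base_rate / p_alarm if p_alarm > 0 else 0.0


# Hypothetical base rate: 1 intrusive event per 100,000 analyzed events.
BASE_RATE = 1e-5
for fpr in (1e-3, 1e-4, 1e-5):
    bdr = bayesian_detection_rate(1.0, fpr, BASE_RATE)
    print(f"FPR={fpr:g}  P(Intrusion|Alarm)={bdr:.3f}")
```

Under these assumptions and with a perfect detection rate, P(Intrusion|Alarm) is only about 0.01 at a false positive rate of 10^-3 and about 0.09 at 10^-4, and it reaches roughly 0.5 only at 10^-5: most alarms are false until the false positive rate approaches the base rate.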
In this paper, we present McPAD (multiple classifier payload-based anomaly detector), a new, accurate payload-based anomaly detection system that consists of an ensemble of one-class classifiers. We show that our anomaly detector is very accurate in detecting network attacks that carry some form of shell-code in the malicious payload. This holds true even for polymorphic attacks and at very low false positive rates. Furthermore, we experiment with advanced polymorphic blending attacks and show that, in some cases, even in the presence of such sophisticated attacks and at a low false positive rate, our IDS still achieves a relatively high detection rate.
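The abstract describes McPAD only as an ensemble of one-class classifiers, so the sketch below illustrates the general ensemble idea under stated assumptions rather than McPAD itself. Each base model views the payload through the relative frequencies of byte pairs separated by a different gap (an assumed feature representation); the base learners are toy centroid-distance models standing in for real one-class classifiers; scores are combined by averaging (one possible combination rule); and the alarm threshold is set from normal training traffic to target a chosen false positive rate. All class and function names are hypothetical.

```python
import math
from collections import Counter


def pair_features(payload: bytes, gap: int) -> dict:
    """Relative frequencies of byte pairs (payload[i], payload[i + gap + 1])."""
    pairs = Counter(zip(payload, payload[gap + 1:]))
    total = sum(pairs.values())
    return {k: v / total for k, v in pairs.items()} if total else {}


def l1_distance(a: dict, b: dict) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


class CentroidModel:
    """Toy one-class model: stores the mean feature vector of normal payloads
    and scores a payload by its L1 distance from that mean."""

    def __init__(self, gap: int):
        self.gap = gap
        self.centroid: dict = {}

    def fit(self, payloads):
        acc = Counter()
        for p in payloads:
            acc.update(pair_features(p, self.gap))
        self.centroid = {k: v / len(payloads) for k, v in acc.items()}
        return self

    def score(self, payload: bytes) -> float:
        return l1_distance(pair_features(payload, self.gap), self.centroid)


class OneClassEnsemble:
    """Ensemble of one-class models, each trained on a different feature view
    (here: a different byte gap), combined by averaging their scores."""

    def __init__(self, gaps=(0, 1, 2), target_fpr: float = 0.01):
        self.models = [CentroidModel(g) for g in gaps]
        self.target_fpr = target_fpr
        self.threshold = 0.0

    def score(self, payload: bytes) -> float:
        return sum(m.score(payload) for m in self.models) / len(self.models)

    def fit(self, normal_payloads):
        for m in self.models:
            m.fit(normal_payloads)
        # Choose the threshold so that at most about a target_fpr fraction
        # of the normal training payloads scores above it.
        scores = sorted(self.score(p) for p in normal_payloads)
        k = math.ceil((1.0 - self.target_fpr) * len(scores)) - 1
        self.threshold = scores[min(max(k, 0), len(scores) - 1)]
        return self

    def is_anomalous(self, payload: bytes) -> bool:
        return self.score(payload) > self.threshold


# Usage with made-up traffic: benign HTTP requests vs. a NOP-sled-like payload.
normal = [f"GET /page{i}.html HTTP/1.1\r\nHost: example.com\r\n\r\n".encode() for i in range(20)]
detector = OneClassEnsemble().fit(normal)
nop_sled = b"\x90" * 64
print(detector.is_anomalous(nop_sled), detector.is_anomalous(normal[0]))  # True False
```

The intuition behind combining several feature views is that a payload crafted to look normal under one view may still stand out under another; the real system's choice of features, base classifiers, and combination rule should be taken from the paper rather than from this sketch.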
A Structural and Content-Based Approach for a Precise and Robust Detection of Malicious PDF Files
In recent years, malicious PDF files have become a serious threat to the security of modern computer systems. They have a complex structure and come in a considerably wide variety. Several academic solutions have been developed to mitigate such attacks. However, they relied on information extracted from either the structure or the content of the PDF file alone, which creates problems when trying to detect non-JavaScript or targeted attacks. In this paper, we present a novel machine learning system for the automatic detection of malicious PDF documents. It extracts information from both the structure and the content of the PDF file, and it features an advanced parsing mechanism. In this way, it can detect a wide variety of attacks, including non-JavaScript and parsing-based ones. Moreover, with a careful choice of the learning algorithm, our approach provides significantly higher accuracy than other static analysis techniques, especially in the presence of adversarial malware manipulation.
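To illustrate what combining structural and content information can look like, the sketch below computes a few naive features by scanning the raw bytes of a PDF file: counts of a hypothetical list of structural keywords, of indirect objects, and of streams, plus counts of a few JavaScript tokens in the content. The keyword and token lists and the function name are assumptions; this is a deliberate simplification, not the paper's feature set or parser.

```python
import re

# Hypothetical feature lists, for illustration only.
STRUCTURAL_KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
                       b"/Launch", b"/EmbeddedFile", b"/AcroForm", b"/ObjStm")
CONTENT_TOKENS = (b"eval(", b"unescape(", b"String.fromCharCode")


def extract_features(pdf_bytes: bytes) -> dict:
    """Naive structure + content features computed by scanning raw bytes."""
    feats = {}
    for kw in STRUCTURAL_KEYWORDS:
        # The lookahead stops e.g. "/JS" from matching inside a longer name such as "/JSFoo".
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        feats[kw.decode()] = len(re.findall(pattern, pdf_bytes))
    # Indirect object headers ("12 0 obj") and stream keywords ("endstream" is not counted).
    feats["objects"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", pdf_bytes))
    feats["streams"] = len(re.findall(rb"\bstream\b", pdf_bytes))
    for tok in CONTENT_TOKENS:
        feats[tok.decode()] = pdf_bytes.count(tok)
    return feats


# Usage with a tiny made-up document that runs JavaScript on open.
sample = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /JavaScript /JS (eval(unescape('%u9090'))) >>\nendobj\ntrailer\n%%EOF")
print(extract_features(sample))
```

Raw byte scanning misses names written with hex escapes (such as `/J#61vaScript`) and code hidden inside compressed streams, which is why a real system needs a proper parsing mechanism like the one the abstract describes.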
Detecting Misuse of Google Cloud Messaging in Android Badware
Google Cloud Messaging (GCM) is a widely used and reliable mechanism that helps developers build more efficient Android applications; in particular, it enables sending push notifications to an application only when new information for it is available on its servers. For this reason, GCM is now used by more than 60% of the most popular Android applications. On the other hand, attackers also exploit this mechanism to facilitate their malicious activities, e.g., to abuse the functionality of advertisement libraries in adware, or to command and control bot clients. However, to our knowledge, the extent to which GCM is used in malicious Android applications (badware, for short) has never been evaluated before. In this paper, we aim not only to investigate this issue but also to show how traces of GCM flows in Android applications can be exploited to improve Android badware detection. To this end, we first extend FlowDroid to extract GCM flows from Android applications. Then, we embed those flows in a vector space and train different machine learning algorithms to detect badware that uses GCM to perform malicious activities. We demonstrate that combining different classifiers trained on the flows originating from GCM services allows us to improve the detection rate by up to 2.4% while decreasing the false positive rate by 1.9%, and, more interestingly, to correctly detect 14 never-before-seen badware applications.
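The abstract does not specify how flows are embedded or how classifiers are combined. The sketch below assumes a simple binary encoding, in which each dimension records whether an app exhibits a given source-to-sink flow, and majority voting as the combination rule. The flow labels and function names are hypothetical placeholders, not FlowDroid's actual output format or the paper's pipeline.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(apps_flows: Iterable[Iterable[str]]) -> List[str]:
    """Collect every distinct flow seen in the training apps, in a fixed (sorted) order."""
    return sorted({flow for flows in apps_flows for flow in flows})


def embed(flows: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary vector: 1 if the app exhibits the flow, 0 otherwise; unseen flows are ignored."""
    present = set(flows)
    return [1 if flow in present else 0 for flow in vocabulary]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier labels (1 = badware) by majority; ties count as badware."""
    return [int(2 * sum(votes) >= len(votes)) for votes in zip(*predictions)]


# Hypothetical flows written as "source -> sink" labels.
train_apps = [
    ["GCM_MESSAGE -> SEND_SMS", "GCM_MESSAGE -> HTTP_REQUEST"],
    ["GCM_MESSAGE -> SHOW_NOTIFICATION"],
]
vocab = build_vocabulary(train_apps)
vector = embed(["GCM_MESSAGE -> SEND_SMS", "GCM_MESSAGE -> READ_CONTACTS"], vocab)
print(vector)  # [0, 1, 0]

# Three classifiers' labels for three apps, combined into one decision per app.
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

In practice, each combined classifier would be trained on such embedded vectors; only the encoding and the final voting step are shown here.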
Machine learning in computer forensics (and the lessons learned from machine learning in computer security)
In this paper, we discuss the role that machine learning can play in computer forensics. We begin our analysis by considering the role that machine learning has gained in computer security applications, with the aim of helping the computer forensics community learn from the experience of the computer security community. We then provide a brief literature review to illustrate the areas of computer forensics where machine learning techniques have been used so far. Next, we highlight the technical requirements that tools for computer security and computer forensics applications should meet, in order to illustrate how machine learning algorithms can be of practical help. We intend this paper to foster applications of machine learning in computer forensics, and we hope that its ideas may represent promising directions to pursue in the quest for more efficient and effective computer forensics tools.
- …
