Securing IMS against Novel Threats.

| Thursday, July 9, 2009

Recently, we have published an interesting article at Bell Labs Technical Journal on detecting unknown threats in IMS and VoIP networks. This article has been the outcome of a fruitful cooperation of Fraunhofer FIRST and Alcatel-Lucent in Germany. The article's abstract is here:

Fixed mobile convergence (FMC) based on the 3GPP IP Multimedia
Subsystem (IMS) is considered one of the most important communication technologies of this decade. Yet this all-IP-based network technology brings about the growing danger of security vulnerabilities in communication and data services. Protecting IMS infrastructure servers against malicious exploits poses a major challenge due to the huge number of systems that may be affected. We approach this problem by proposing an architecture for an autonomous and self-sufficient monitoring and protection system for devices and infrastructure inspired by network intrusion detection techniques. The crucial feature of our system is a signature-less detection of abnormal events and zero-day attacks. These attacks may be hidden in a single message or spread across a sequence of messages. Anomalies identified at any of the network domain's ingresses can be further analyzed for discriminative patterns that can be immediately distributed to all edge nodes in the network domain.

Securing IMS against Novel Threats. Stefan Wahl, Konrad Rieck, Pavel Laskov, Peter Domschitz and Klaus-Robert Müller. Bell Labs Technical Journal (BLTJ), 14(1), 243-257, John Wiley & Sons, May 2009.

Some Tuning for Feature Extraction.

| Friday, May 29, 2009

One key to efficient application of learning methods in practice is fast extraction of features from raw data. As an example, I am currently working on methods for automatic analysis of malware, where the behavior of a malware binary is represented in a textual report and mapped to a vector space using frequencies of contained substrings (see here). However, thousands of binaries need to be processed and often the generated report files are huge. Consequently, the task of designing analysis methods gets tedious, as one has to wait several minutes just to load data and extract appropriate feature strings.

In quest for a remedy, I have experimented with OpenMP (Open Multi-Processing) and libarchive, where the first is a simple API for multi-processing programming in C and the latter a library for reading and writing of file archives, such as zip, tar and on. On the one hand OpenMP enables loading of data and extraction of features in parallel, whereas on the other hand libarchive allows for storing the data efficiently in compressed archives in favor of directories.





The figure shows some preliminary run-time measurements for extraction of feature vectors from malware reports. The application of multi-processing clearly accelerates the feature extraction, independent of the applied data format. For example, when reading from a directory the performance is doubled if two threads are used and enables processing up to 100 files per second (note this experiments was run on a dual-core machine). Surprisingly, the extraction performance is also high when using compressed archives. There is almost no difference between feature extraction from a zip/gz archive and a plain directory. Moreover, the zip/gz archive consumes only 5% of the original space and considerably reduces the amount of required storage. That's impressive. If you are dealing with loading and processing thousands of files, these tuning hacks might be an interesting option.

It's finally done.

| Monday, May 4, 2009

After a long period of hard and boring work, I finally submitted my Ph.D. thesis to the computer science faculty at Berlin Institute of Technology (TU Berlin). The thesis is entitled "Machine Learning for Application-Layer Intrusion Detection" and refereed by Klaus-Robert Müller, John McHugh and Pavel Laskov.

In my thesis, I tackle the problem of detecting unknown and novel attacks in the application layer of network communication and present a machine learning framework for intrusion detection. In particular, I propose a generic technique for embedding of network payloads in vector spaces such that features extracted from the payloads are accessible to statistical and geometric analysis. Efficient learning in these high-dimensional vector spaces is realized using the concept of kernel functions defined over network payload data. Based on these functions, I derive methods for anomaly detection suitable for identification of unknown attacks, where normality of network data is modeled using geometric concepts such as hyperspheres and neighborhoods.

The framework is empirically evaluated using 10 days of HTTP and FTP network traffic and over 100 real attacks unknown to the applied learning methods. A prototype of the framework outperforms related methods from Kruegel et al. (2002), Wang et al. (2006) and Ingham et al. (2007), where it identifies 80–97% unknown attacks with less than 0.002% false positives. Moreover, reasonable throughput rates between 20–60 Mbit/s are attained, though no special hardware acceleration is yet utilized.

As the thesis is under review, I will not provide an online version now. However, I am going to present some interesting results on visualization of detected attack payloads in later posts. Stay tuned.

Fun and Pain with ZFS.

| Tuesday, April 7, 2009

Recently, I found the time to experiment with ZFS – Sun's next-generation file system – using an external USB drive. In particular, I have been playing with ZFS to test whether it alleviates typical tasks of machine learning research, such as loading directories containing ten thousand of files or repeating experimental runs with large data chunks. As I am not running a Solaris system, I had to install the userland utilities and kernel modules for OSX (Leopard) available at Macforge. Note that Leopard natively supports read-only access to ZFS pools but lacks write functionality.

Apart from great read access time, the first interesting issue I noticed is ZFS's ability to create hierarchical file systems on-the-fly. Instead of dumping all contents into a single volume, the hierarchical layout enables fine-grained control of different experimental data sets and allows for assigning quota and options individually per data. For example, the ability to easily split data comes handy, if one enables the transparent compression in ZFS. This snippet of commands creates two file systems named tank/dataset1 and tank/dataset2 where uses automatic compression.

% zfs create tank/dataset1
% zfs create tank/dataset2
% zfs set compress=on tank/dataset1
% zfs set compress=off tank/dataset2
Compression might not be desired when running certain experiments. Thus, it can be disabled and enabled per file system, such that archived data is stored effectively while the current workbench is accessible with full processing power. A nice feature for working with experimental data. The compression ratio of each file system can be queried using the following command.
% zfs get compressratio tank
NAME PROPERTY VALUE SOURCE
tank/dataset1 compressratio 3.14x -
tank/dataset2 compressratio 1.00x -
Another interesting issue is ZFS's ability to store snapshots of file systems. Initially, a snapshot does not consume any memory as only the differences to the original version are stored. If one is working with multiple copies of the same data, say one version described in a publication and a refined variant, snapshots are a great tool, as they allow one to quickly jump back and forth between different versions of data. One can access the content of each snapshot using the directory .zfs in the root of the considered file system. Here's an example how a snapshot is created for a given data set.
% zfs snapshot tank/dataset1@paper
As it can see from a call to zfs list there is no memory associated with the snapshot directly after creation and thus no storage is wasted.
% zfs list
tank/dataset1 2.38G 222G 2.38G /Volumes/tank/dataset1
tank/dataset1@paper 0 - 2.38G -
This feature is really nifty if one needs to "freeze" the state of experimental data if a paper is accepted for publication while still continuing to work with the data set.

Unfortunately, all my enthusiasm and great plans to work with ZFS have been eliminated by the instability of the current Leopard driver. The driver does not really handle USB devices. If the device is accidentally removed or the computer falls asleep, the ZFS module simply crashes the kernel. I even managed to corrupt the ZFS partition such that any call to zpool status issues a kernel panic. Game over.

Call for Papers: DIMVA 2009.

| Sunday, January 25, 2009

I am a member of the program committee of this year's conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) in Milan, Italy. The conference invites paper submissions from the domain of computer security with focus on intrusion detection and malware research.

  • Deadline for paper submission: February 6, 2009
  • Notification of acceptance or rejection: March 30, 2009
  • Final paper camera ready copy: April 10, 2009
  • Conference dates: June/July, 2009
Contributions can be submitted as full papers (limited to 20 pages) or short papers (limited to 10 pages). A detailed "call for papers" is available here. I am looking forward to interesting contributions and an awesome DIMVA conference—as in the last five years.

Unpleasant Facts about Data Clustering.

| Thursday, December 18, 2008

Data clustering is a popular technique of data mining and machine learning. The objective is to automatically partition a set of given objects into "meaningful" groups. Clustering is intuitive and fits a variety of problem settings, for example intrusion detection and malware categorization in computer security. However, application of clustering methods is laborious and error-prone in practice. Here are my unpleasant facts of clustering:

  • Generic clustering is NP-hard. Formally, clustering aims at partitioning data such that a predefined quality criterion is maximized. Unfortunately, there is no generic way for determining an optimal solution in this setting except for testing all possible combinations. All practical clustering methods, such as k-means and linkage clustering, limit the search space by discarding partitionings and thus provide only an approximation to the best clustering (a local optimum). If you cluster data, you will likely obtain an approximate solution.

  • Hierarchy is no clustering. Linkage clustering is a form of hierarchical clustering, where the data is first mapped to a hierarchical representation and then partitioned into clusters. In practice, people often consider the hierarchical representation of data as the final clustering result. However, a hierarchy encodes an exponential number of possible partitionings and the more involved step is to flatten the hierarchy into a clustering. As a negative example consider the Wikipedia entry on hierarchical clustering which almost solely focuses on constructing hierarchies—omitting details on the subsequent clustering procedure.

  • Statistical consistency unclear. An important term from statistical learning theory is consistency. Shortly put a learning algorithm is consistent if it converges to the true solution the more training data is provided. Surprisingly, little is known about the consistency of clustering. For instance, most clustering methods aim at partitioning provided data but not the underlying data distributions. That is, if you provide more data to a clustering, there is no guarantee that the solution improves (see the work of Luxburg [1, 2]).

  • Model selection vs. Clustering. Clustering is often controlled using a model parameter that determines the granularity of the partitioning. This parameter can be the number of clusters as for k-means but may also take the form of a numeric quantity as in linkage clustering. Clearly, this parameter needs to be adapted prior to application. However, model selection contradicts with the purpose of clustering. On the one hand, to validate the parameter on given data a reference partitioning needs to be available, thus no clustering is required in this case. On the other hand on unlabeled data the parameter can never be validated directly. In practice, one resorts to model selection on reference data and application of the best model to unknown data. Consequently, this setting requires a representative reference partitioning obtained without clustering.
Besides these facts, I like clustering methods and have been successfully working with several of them. Nevertheless, special care needs to be taken when running experiments to achieve sound results—the application of clustering is more involved as it seems at a first glance.

Incorporation of Application Layer Protocol Syntax into Anomaly Detection.

| Wednesday, December 17, 2008

The majority of current intrusion detection, anti-virus and anti-malware products builds on the concept of misuse detection, that is malicious activity is identified using rules and signatures of known misuse patterns. Consequently, much effort has been devoted to defining expressive and discriminative features for construction of misuse rules. In the domain of network security such features are often derived from the syntax of application layer protocols, for example to answer the questions "who did what to which data when?"

In contrast to misuse detection, a second strain of intrusion detection research investigates methods for anomaly detection, that is malicious activity is identified in terms of unusual and abnormal events. Surprisingly, the use of features from protocol syntax in this setting has been considered in only few research, for example Kruegel et al. and Ingham et al. Access to syntactical features is clearly beneficial for anomaly detection, and hence some of my colleagues (Patrick and Christian) have been working on extending previous approaches to incorporate protocol syntax into a generic anomaly detection framework.

Recent results of this work are presented at the International Conference on Information Systems Security. In particular, we introduce new features for network intrusion detection, which combine sequential features (such as byte frequencies and n-grams) with syntactical tokens. Here is the abstract of our contribution:

The syntax of application layer protocols carries valuable information for network intrusion detection. Hence, the majority of modern IDS perform some form of protocol analysis to refine their signatures with application layer context. Protocol analysis, however, has been mainly used for misuse detection, which limits its application for the detection of unknown and novel attacks. In this contribution we address the issue of incorporating application layer context into anomaly-based intrusion detection. We extend a payload-based anomaly detection method by incorporating structural information obtained from a protocol analyzer. The basis for our extension is computation of similarity between attributed tokens derived from a protocol grammar. The enhanced anomaly detection method is evaluated in experiments on detection of web attacks, yielding an improvement of detection accuracy of 49%. While byte-level anomaly detection is sufficient for detection of buffer overflow attacks, identification of recent attacks such as SQL and PHP code injection strongly depends on the availability of application layer context.

Incorporation of Application Layer Protocol Syntax into Anomaly Detection. Patrick Düssel, Christian Gehl, Pavel Laskov and Konrad Rieck. Proc. of International Conference on Information Systems Security (ICISS), December 2008.