
If you are responsible for monitoring a network and periodically auditing it for possible improvements, two major kinds of unauthorized network use make your job harder: pirated data transfers and infected/zombie machines. In this short piece, I focus on only part of the pirated data transfer category. Most university networks carry some background BitTorrent traffic, which can be managed to some degree because clients need an open port for optimal efficiency. The bigger problem arises when a staff member or student with privileged network access decides to run a large dumpsite on your edu/Internet2 connection. With many network-intensive research applications running, it is sometimes hard to tell legitimate traffic from illegal transit. Furthermore, many dumpsite/topsite operators have become clever and added encryption, command channel bounces, access lists and so forth to their FTP servers, all of which makes it harder for the network admin to distinguish legitimate use from pirated traffic. What I propose is a method that uses mathematical analysis of sniffed traffic, even when it is encrypted, to give an indication of dumpsite/topsite presence.

The first case is an FTP site that deals mostly in movies. Groups that release movies generally package their wares in three sizes: 1 CD (~700MB), 2 CD (2x ~700MB) and DVD-R (~4.3GB). The idea is to track a network-intensive packet stream and record the length of each packet as well as its arrival time at the packet sniffer. From this data, one can plot the data transferred per time bin and integrate it. The transfer should generally consist of plateaus with small troughs on each side, where the “area” under each plateau represents either 700MB or 4.3GB. If the time series shows this pattern, it indicates that CDs' or DVDs' worth of data are being transmitted, which would warrant further investigation.
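The plateau integration can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a tested detector: the function names, the 1-second bin width, the 10% trough threshold and the 5% size tolerance are all assumptions chosen for the example, and sizes are taken as decimal megabytes.

```python
import numpy as np

# Nominal release sizes from the text, in bytes (decimal units assumed).
TARGET_SIZES = {"1 CD": 700e6, "DVD-R": 4.3e9}

def bin_traffic(times, lengths, bin_width=1.0):
    """Sum packet lengths into fixed-width time bins (bytes per bin)."""
    times = np.asarray(times, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    idx = ((times - times.min()) // bin_width).astype(int)
    return np.bincount(idx, weights=lengths)

def plateau_volumes(bytes_per_bin, trough_fraction=0.1):
    """Split the series at troughs and return the bytes under each plateau.

    A bin counts as a trough when it falls below trough_fraction of the
    busiest bin (an assumed heuristic); trough bytes are ignored.
    """
    threshold = trough_fraction * np.max(bytes_per_bin)
    volumes, current = [], 0.0
    for b in bytes_per_bin:
        if b > threshold:
            current += b
        elif current > 0:
            volumes.append(current)
            current = 0.0
    if current > 0:
        volumes.append(current)
    return volumes

def classify(volume, tolerance=0.05):
    """Return the matching release size name, or None if nothing is close."""
    for name, size in TARGET_SIZES.items():
        if abs(volume - size) <= tolerance * size:
            return name
    return None
```

A run of plateaus that all classify as "1 CD" or "DVD-R" is only a hint worth a closer look, since protocol overhead and retransmissions will shift the totals and legitimate transfers can land on the same sizes by chance.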

Another example is pirated software. The method here exploits the fact that software groups often split their releases into ~15MB archives, together with the hope that the FTP server takes a little time to hop from one file to the next (close(), then open(), then read() system calls, for example). By taking an FFT of the time series described above, we can try to extract any periodicity and check whether the integral of the data between the minima corresponds to about 15MB.
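As a rough sketch of the FFT step, the following Python/NumPy snippet finds the strongest periodic component in a bytes-per-bin series and multiplies that period by the mean throughput to estimate the data moved per cycle. The function names and the 1-second default bin width are assumptions for illustration, and multiplying the period by the mean rate is a shortcut for integrating between successive minima.

```python
import numpy as np

def dominant_period(bytes_per_bin, bin_width=1.0):
    """Return the period (in seconds) of the strongest spectral peak."""
    x = np.asarray(bytes_per_bin, dtype=float)
    x = x - x.mean()                      # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=bin_width)
    k = np.argmax(spectrum[1:]) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[k]

def bytes_per_period(bytes_per_bin, bin_width=1.0):
    """Estimate the bytes transferred in one cycle of the dominant period."""
    period = dominant_period(bytes_per_bin, bin_width)
    mean_rate = np.mean(bytes_per_bin) / bin_width   # bytes per second
    return period * mean_rate

# Synthetic check: 15 bins at 1 MB/s followed by a 2-bin gap, repeated.
series = ([1e6] * 15 + [0.0] * 2) * 20
print(dominant_period(series), bytes_per_period(series))
```

One caveat: if the gap between files is much shorter than the bin width, the dips become very narrow and the spectrum spreads its energy almost evenly across harmonics, so the strongest peak may not be the fundamental. Choosing a bin width comparable to the expected gap should help.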

Both methods depend on a time series that contains packet size and arrival time, which is easily obtained from tcpdump. The analysis can be done in Matlab. Of course, high-performance implementations may warrant custom code.
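For illustration, here is one way to pull that time series out of tcpdump's text output in Python. It assumes the capture was printed with `tcpdump -tt -n`, so each line starts with an epoch timestamp, and that each packet line contains a `length N` field; the exact output varies by tcpdump version and protocol, and for TCP the reported length is the payload size rather than the full frame. The sample lines and the function name are made up for the example.

```python
import re

# Epoch timestamp at the start of the line, and the last "length N" field.
LINE_RE = re.compile(r"^(\d+\.\d+)\s.*\blength (\d+)")

def parse_tcpdump(lines):
    """Return a list of (arrival_time, length) pairs; skip unparsable lines."""
    series = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            series.append((float(match.group(1)), int(match.group(2))))
    return series

# Hypothetical sample lines in the assumed format.
sample = [
    "1200000000.000100 IP 10.0.0.5.20 > 10.0.0.9.51234: "
    "Flags [.], seq 1:1449, ack 1, win 229, length 1448",
    "1200000000.000350 IP 10.0.0.9.51234 > 10.0.0.5.20: "
    "Flags [.], ack 1449, win 501, length 0",
    "not a packet line",
]
print(parse_tcpdump(sample))
```

For large captures, reading the pcap file directly with a capture library would likely be faster than parsing text, which is where the custom code comes in.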
