My opinions and reactions to the class DPI project
When we first started the class DPI project, the first thing that came to my mind was how easy it would be. I mean, how difficult could it be to open packets, inspect their contents, and classify them accordingly? You could say it would be the equivalent of your friendly postman opening your mail and deciding whether it contained something important, a postcard, some cash, or perhaps spam and even anthrax. And we're not looking for passwords, credit card information, or information for the spooks. No, we are more benevolent than that.
Our purpose would be to properly classify the applications running on the network. If we're going to monitor our networks, we must have a complete picture of the applications our users are running on them. Most of these applications are "visible" and can easily be blocked. However, many applications on the Internet have acquired the ability to bypass firewalls and proxies. The many corporate, academic, technical, and what-not policies that have governed networks for the past few years have pushed applications to use proxies or to encrypt their communications in order to bypass the usual roadblocks that network administrators have put in place.
The wisdom of such blocks has been in serious question, on both the technical and user levels. However, the reality is that these blocks are here to stay, and the target applications of such blocks have adapted to the current Internet landscape. A great example of such a versatile program would be Skype.
The purpose of our class project was to detect peer-to-peer traffic that has managed to pass through the roadblocks that the university network administrators have put in place.
While we did manage to get a sample of the network traces, we have yet to detect any peer-to-peer activity in the university network. So far, the university network administrators appear to have succeeded in their "quest" to block all kinds of peer-to-peer traffic.
We also applied some machine learning techniques to the traces; however, I think we largely failed there because we don't have any training data to use... because no peer-to-peer traffic has been detected. We need data that we positively know contains peer-to-peer traffic. If we can't detect it, then we should run the applications ourselves and actively look for holes in the university network. Once we "detect" our own traces, we can feed them into the machine learning tool as training data to detect the peer-to-peer traffic that does not belong to us.
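The plan above (generate our own labeled traffic, then train on it) could be sketched as follows. This is only an illustration, not the tool we actually used in class: the flow features and their values are synthetic, invented for the example, and the classifier is a minimal nearest-centroid model written from scratch.

```python
# Illustrative sketch: train on labeled flow features, then classify
# unknown flows. All numbers below are synthetic, for illustration only.

def centroid(rows):
    """Element-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labeled_flows):
    """labeled_flows: dict mapping label -> list of feature vectors.
    Returns a model: dict mapping label -> centroid."""
    return {label: centroid(rows) for label, rows in labeled_flows.items()}

def classify(model, features):
    """Assign the label whose centroid is nearest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical features per flow: (mean packet size in bytes,
# flow duration in seconds, distinct peer count). The intuition is
# that P2P flows tend to be long-lived and talk to many peers.
training = {
    "p2p": [[900.0, 600.0, 40.0], [1100.0, 900.0, 55.0]],
    "web": [[400.0, 5.0, 1.0], [350.0, 8.0, 1.0]],
}
model = train(training)
print(classify(model, [1000.0, 700.0, 50.0]))  # → p2p
print(classify(model, [380.0, 6.0, 1.0]))      # → web
```

A real experiment would extract these features from pcap traces of traffic we generated ourselves, which is exactly why we need to run the applications first: without positively labeled flows, there is nothing to compute the centroids from.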
The other technique that the class investigated, actually reading the packet contents, is a hit-or-miss thing. We can argue that reading the first few bytes of the data can give us the name of the actual application; however, once the traffic is encrypted, all bets are off. I believe this technique will only be useful in the near-to-medium term, and will work only on simple applications that have not yet acquired the kind of users who need special methods to bypass proxies and firewalls.
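The "first few bytes" idea amounts to prefix-signature matching, which can be sketched in a few lines. The signatures here are well-known plaintext markers (for instance, the unencrypted BitTorrent handshake begins with a length byte of 19 followed by the string "BitTorrent protocol"); the function itself is a simplified illustration, not the matcher we wrote in class.

```python
# Minimal sketch of payload-prefix signature matching.
# Encrypted or obfuscated payloads match nothing and fall through,
# which is exactly why this technique breaks down once traffic
# is encrypted.

SIGNATURES = [
    (b"\x13BitTorrent protocol", "bittorrent"),  # unencrypted BT handshake
    (b"GET ", "http"),
    (b"POST ", "http"),
    (b"SSH-", "ssh"),                            # SSH version banner
]

def classify_payload(payload: bytes) -> str:
    """Guess the application from the first bytes of a packet payload."""
    for prefix, name in SIGNATURES:
        if payload.startswith(prefix):
            return name
    return "unknown"

print(classify_payload(b"\x13BitTorrent protocol" + b"\x00" * 8))  # → bittorrent
print(classify_payload(b"GET /index.html HTTP/1.1\r\n"))           # → http
print(classify_payload(b"\x8f\x2a\xd1\x07"))                       # → unknown
```

The last call shows the failure mode: high-entropy (encrypted) bytes look like nothing in the signature table, so everything the obfuscated applications send ends up classified as "unknown".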
We are not there yet, but we have learned what not to do in DPI. This may sound like an Edisonian way of thinking, but I believe that as we continue to refine our techniques and code, we will find a way to detect peer-to-peer traffic without reading the payload.