
About Ellen Friedman

Ellen Friedman
She is a consultant and commentator on big data topics. Active in open source, she is a committer for the Apache Drill and Apache Mahout projects and co-author of many books on working with data in the Hadoop ecosystem. She has a PhD in biochemistry and years of experience as a research scientist, and she has written about a wide range of technical topics including biology, oceanography and the genetics of learning and memory.

Fighting Advanced Persistent Security Threats with Anomaly Detection: Sometimes More is More

Does this sound disturbing?  You try to reach a particular website only to find the site is down. But it’s not that simple. You try another site – also not reachable. And another and another… You look to social media for in-the-moment reports about what’s happening and while you are reading about a huge swath of the country under cyber attack, that social media site goes out, too.

This was what many people in North America experienced on 21 October 2016 when a widespread DDoS (distributed denial of service) attack took place in several waves, starting with the east coast of the U.S. and spreading to hit the west coast and some European sites. It’s worth taking a look at the pattern behind this attack in order to understand what actions can be taken against future attacks. Because the attacks will keep coming…

Here’s what happened in October 2016: At 7:00 am ET, the first wave of the attack started, targeting Dyn, whose DNS (Domain Name System) infrastructure handles lookup requests for many sites on the internet. That’s the first key to this attack: by sending millions of fake lookup requests, the DDoS event overwhelmed Dyn and thus knocked out service for many popular sites. The effect was amplified, too, because as sites became unreachable, automatic retries added to the overwhelming traffic. At its peak, the October attack on Dyn had traffic of 1.2 Terabits/sec, far in excess of any DDoS attack before that.
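To get a feel for how automatic retries pile onto an outage, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not taken from Dyn's reports: every lookup fails independently with probability `p_fail`, and a client retries up to `max_retries` times before giving up. Both parameter names and the numbers are hypothetical.

```python
def expected_attempts(p_fail, max_retries):
    """Expected number of lookups a client sends for one request.

    Attempt k+1 only happens if the first k attempts all failed, so the
    expectation is the geometric sum 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(p_fail ** k for k in range(max_retries + 1))


# Hypothetical scenarios: a healthy service versus one that is mostly down.
healthy = expected_attempts(p_fail=0.01, max_retries=3)
degraded = expected_attempts(p_fail=0.9, max_retries=3)

print(f"healthy service:  {healthy:.2f} lookups per request")
print(f"degraded service: {degraded:.2f} lookups per request")
```

Under these assumptions, legitimate clients alone send more than three times their usual lookup volume once nine out of ten lookups fail, on top of the malicious traffic.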

Kyle York, Chief Strategy Officer for Dyn, reported in their blog that this first wave attack was mitigated within 2 hours, but then the next wave began around noon ET, and this time it hit sites beyond the east coast. I’m based near San Jose, and just as I was checking news of the first attack via Twitter, that site became unreachable as the second attack swept across the country. Dyn reports that it was stopped within an hour, and that a third round of assaults was mitigated before customers were affected. In the end a large number of websites were affected, including sites for the New York Times, Wall Street Journal, Spotify, Starbucks, PayPal, Wired and even The Guardian in the UK.

What are the lessons to be learned from this DDoS attack? And what can be done to combat this threat?

Your refrigerator may attack you

This odd statement is not even an exaggeration. Besides the scale of traffic, one of the things that made the October Dyn attack stand out was the source of the massive malicious traffic: it came not from compromised laptops or servers but from an army of roughly 100,000 compromised sources that included IoT devices such as those found in your home: webcams, DVRs, routers. In this case many IoT devices were taken over by malware known as Mirai and joined into a botnet, a network of compromised machines that act on an attacker's signal.

The growing number of connected “smart” devices in homes, offices and public places raises concern about how well security for these devices is being handled. This is one reason that some companies, including MapR, are building technologies that can extend security from sophisticated IoT devices, such as those used in the transportation industry, all the way through the data pipeline to central processing data centers, but that’s another story.

Compromised sites are separate from the targeted victim

Here’s another key aspect of the pattern of the DDoS attack: the compromised sites infiltrated by the malicious bots were not the same as the site being targeted and victimized, in this case Dyn and secondarily the companies whose websites it services.

In advanced persistent security attacks, the compromised sites (which could be laptops, IoT devices or even major websites) are separate from the victims being targeted.

The bad guys land, penetrate a site or device and then go silent. The compromised sites usually are unaware that anything is amiss. At a later time, the sleeping botnet awakens on signal and pivots to launch an attack elsewhere. That poses a new problem: The sites with most to lose (the targets) are not the ones whose security was insufficient. Who, then, carries the responsibility to protect against the intruders?

The October 2016 attack against Dyn was unusual in

  • its strength (1.2 Terabits/sec)
  • the fact that the target victim of the initial pivot (Dyn) was a site with far reaching impact
  • the fact that a huge number of IoT devices were involved as the attack vector

But this attack shares a pattern of behavior with other large attacks, including the earlier and well-studied attack known as Ababil. What they have in common is the separation of compromised site and targeted victim.

Attack of the Brobots: 2012 Ababil assault on banking sites.

Before the Dyn attack, the record for the “biggest attack in history,” measured by volume of DDoS traffic, was set in 2012 during an assault on banking sites attributed to state-level assailants. Even so, the levels of DDoS traffic were around 75 Gigabits/sec. As with the October 2016 attack on Dyn, in Ababil the bad guys landed on a collection of sites, pivoted (after a delay) and attacked separate targets, in this case a number of large financial sites that included Bank of America, Wells Fargo and the New York Stock Exchange.

This attack took months of planning, and based on evidence that includes its level of sophistication, it is believed to have been carried out by state level assailants.  In contrast to the October 2016 DDoS, the compromised sites in the Ababil assault were high traffic application servers. Initial compromised servers became the command and control centers (C & C) that would later signal a collection of Brobots (sites compromised by the particular malware used in this case) to launch the attack against the targeted financial company sites. Since the compromised sites were high end websites themselves, just a few could direct massive volumes of attacks at the victims.

An important lesson from the Ababil case is that poor hygiene made many sites vulnerable to penetration and conversion into a Brobot. Attackers used Google to search for the default welcome page for Joomla, a page that should have been deleted as soon as the software was properly installed.

These DDoS attacks continue. Another round of financial sites in Europe was targeted in January 2017, for example. The goals of DDoS attacks differ. The goal may be revenge, but often it’s a way to extort funds from the targeted victims. Essentially these sites are held hostage through a flood of malicious traffic and then ransom is demanded.

Who is most likely to be targeted by massive DDoS assaults?  High value sites are an obvious target as victim: they could pay enough to make the attack worthwhile. But also keep in mind that sites with capacity for very high volume traffic may be sought as potential botnet sites.

High-value sites that share these targeted characteristics include banks and other financial institutions, telecommunications companies, utilities, retail and web100 companies and all their vendors: a very large group.

What can you do to protect yourself?

As you become more sophisticated in using big data to get business value, so too do attackers who seek to hack into sites for nefarious purposes. How, then, can you protect your business and your customers?

Here’s where big data, machine learning and vigilant behavior come to the rescue, as described by Ted Dunning, Chief Application Architect at MapR, in a 2016 talk, “Detecting Persistent Threats Using Sequence Statistics,” at Hadoop Summit Dublin. The good news is that despite the sophistication of the attackers and the attack, big data affords you the chance to stay ahead of them if you are prepared, but let’s first look at some simple steps that can reduce risk.

First of all, exercise good cyber hygiene, whether you are a business operating a website, a manufacturer supplying “smart” devices or a home owner with smart appliances.  In the case of IoT, for example, manufacturers should require adequate passwords to be set up in order to activate devices. Until that becomes common practice, individuals should be careful to immediately change default passwords on home devices.

Second, because the compromised sites (the botnet) are separate from the targeted victims, companies need to cooperate with each other and with law enforcement agents to act quickly and shut down botnets when an attack is underway. Even if you’re not the targeted victim, if your site is involved you should shoulder some responsibility and act to protect other sites. This is not only the right thing to do, it’s the smart thing: when major sites are overwhelmed and held hostage by malicious attacks, we all suffer.

Big data approaches that can reduce risk: Size and speed give you an advantage to protect yourself against loss from cyber attacks. An essential way to protect your business and reduce the risk of loss from security attacks on your website is through early and effective anomaly detection. When you can quickly recognize anomalous behavior, you can identify a cyber assault and hopefully thwart it, thus reducing risk of loss.
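As a rough illustration of what "quickly recognize anomalous behavior" can mean in practice, here is a minimal sketch of one very simple detector: a robust z-score on request rates, built from the median and the median absolute deviation (MAD). This is a generic textbook technique chosen for brevity, not the method described in Dunning's talk, and the `baseline` values (requests per second) and the alert threshold of 5 are made up for the example.

```python
from statistics import median


def robust_z(history, value):
    """Score how far `value` sits from the typical level in `history`.

    Uses median and MAD instead of mean and standard deviation so that
    a few earlier spikes in the history do not hide a new one.
    """
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # A perfectly flat history: any change at all is unusual.
        return 0.0 if value == med else float("inf")
    # 1.4826 scales MAD to match a standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


# Hypothetical requests-per-second readings during normal operation.
baseline = [980, 1010, 1005, 995, 1020, 990, 1000, 1015, 985, 1008]
THRESHOLD = 5.0  # assumed alert level; tune against your own traffic

for rate in (1030, 25000):
    score = robust_z(baseline, rate)
    status = "ALERT" if score > THRESHOLD else "ok"
    print(f"{rate:>6} req/s  score={score:8.1f}  {status}")
```

A volume check like this only catches the flood itself. The sequence-based approaches discussed next aim to catch subtler signs of an attack.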

An important thing to keep in mind is that the tactics the bad guys employ in cyber attacks keep changing – that’s a daunting observation. This means there is no ultimate solution to protect you against current and future threats. Instead, your best defense is to learn a style of counter measures and vigilance, and that style is based in the clever use of adaptive big data analytics (anomaly detection through machine learning) to discover new patterns of suspicious behavior as they emerge.

In his presentation, Dunning described several approaches that use large scale sequence statistics. What this means is that effective machine learning models can make use of the fact that event sequences provide clues.  These event sequences include things like header types, ordering of requests, source and destination in IP address access requests and TLS options, values and algorithms. The specific sequence for events with normal behavior can be identified using machine learning models. When criminals attack with fake traffic, phishing attacks or other cyber assaults, the event sequence will be different and not in logical ways that the criminals could predict and thus avoid.  Armed with big data and appropriate statistical models, the subtle “fingerprints” left by cyber criminals can be identified and, in a system set up for speed, criminal assaults can be thwarted before major damage is done.
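To make the idea of sequence statistics concrete, here is a minimal sketch that learns which HTTP header orderings are normal and scores new requests against that model. It uses a simple first-order (bigram) transition model with add-one smoothing, which is my own stand-in for illustration and not the specific models from Dunning's talk. The header sequences, the function names and the training counts are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_bigram_model(sequences):
    """Count transitions between consecutive events in normal traffic."""
    counts = defaultdict(Counter)
    vocab = {END}
    for seq in sequences:
        path = [START, *seq, END]
        vocab.update(seq)
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def anomaly_score(model, seq):
    """Average negative log-probability per transition.

    Higher scores mean the ordering looks less like the training data.
    """
    counts, vocab = model
    path = [START, *seq, END]
    v = len(vocab) + 1  # extra slot reserves probability for unseen events
    total = 0.0
    for prev, nxt in zip(path, path[1:]):
        row = counts.get(prev, Counter())
        p = (row[nxt] + 1) / (sum(row.values()) + v)  # add-one smoothing
        total -= math.log(p)
    return total / (len(path) - 1)


# Hypothetical header orderings seen from ordinary browsers.
normal = (
    [["Host", "User-Agent", "Accept", "Accept-Language",
      "Accept-Encoding", "Connection"]] * 50
    + [["Host", "User-Agent", "Accept", "Accept-Encoding", "Connection"]] * 10
)
model = train_bigram_model(normal)

typical = anomaly_score(model, normal[0])
odd = anomaly_score(model, ["User-Agent", "Connection", "Host", "Accept"])
print(f"typical ordering score: {typical:.2f}")
print(f"unusual ordering score: {odd:.2f}")
```

The unusual ordering scores far higher than the typical one because none of its transitions appear in the training data. A real system would train on far more traffic and on many more kinds of events than header names.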

An example from a MapR banking client was described in Dunning’s talk. In this case, the clever security expert at the bank collected a huge amount of event sequence data, even though he could not know at the time which events would provide clues to reveal future attacks. As it turned out, an anomalous sequence pattern for details in headers was a valuable clue to identify and shut down an attack.

In the short book Sharing Big Data Safely: Managing Data Security (Dunning and Friedman, published by O’Reilly in 2015) we describe in Chapter 6 another example in which a MapR customer was able to use large amounts of financial transaction data coupled with anomaly detection and models employing synthetic data to find the source of credit card fraud through a compromised merchant. Although not an attack on a website, this example shows the power of big data analytics to track down criminals even when there is a separation in time and location between the compromising event and the victims being attacked.


Advanced security attacks will continue, and each one will be different. The good news is that big data approaches give you a way to keep changing your defensive stance as fast as the bad guys come up with new ways to attack.

Big data is one of your best defenses, especially against high end attackers. That is why, in this case, more is more.
