The Rationale for Securing Big Data

Chase Hooley, September 16th, 2016 (Last Updated: September 16th, 2016)


This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale for securing big data systems and applications and the key challenges involved. He does so with memorable anecdotes and good humor. That makes the book a good read whether you're a white/grey/black hat, a cyber superhero, or not a security expert at all.

In his first chapter, Davi discusses the rationale for big data security, told as only he can. Here is an excerpt:

The rationale for security in this emerging world of 3V engines (Volume, Velocity, Variety) is twofold. On the one hand, security is improved by running on 3V (you can’t predict what you don’t know) and on the other hand, security has to protect 3V in order to ensure trust in these engines. Better security engines will result from 3V, assuming you can trust the 3V engines. Few things speak to this situation of faster/better risk knowledge from safe automation than the Grover Shoe Factory Disaster of 1905.

On the left you see the giant factory, almost an entire city block, before the disaster. On the right you see the factory and neighboring buildings across the street turned into nothing more than rubble and ashes.

The background to this story comes from another automation technology rush. Around 1890 there were 100,000 boilers installed as Americans could not wait to deploy steam engine technology throughout the country. During this great boom, in the years 1880 to 1890, over 2,000 boilers were known to have caused serious disasters. Despite decades of death and destruction through the late 1800s, the Grover Shoe Factory still had a catastrophic explosion in 1905 with cascading failures that leveled the entire building, burning it to the ground with its workers trapped inside.

This example helps illustrate why trusted 3V engines are as important as the performance benefits of a 3V engine, if not more so.

Finding Threats Faster Versus Trusting a Tool

It really comes down to figuring out how to use big data to improve the quality of security itself. Many people are actively working on better security paradigms and tools that take advantage of the availability of more data. In fact, if you bought a recent security product, there's a good chance that it runs on a big data platform like the MapR Converged Data Platform. Indeed, according to Davi, "the collection and analysis of as much data as possible is justified by the need to more quickly address real threats and vulnerabilities."

MapR customers are already putting this into practice. Examples range from Terbium Labs' ingenious "digital fingerprint database" for customer data and content to RiskIQ's use of DNS, whois, and other metadata in its external threat management platform. New-era security service providers like these are using large data (and metadata) feeds to counter advanced threats and bad actors.

Threat intelligence feeds are a good example of the new security rationale. They use the MapR platform as the trusted 3V engine and then employ machine learning and advanced analytics to sift through high-volume data feeds or crawl the (dark) web. If these feeds contain subtle or disguised indicators of compromise, detecting them at today's data volumes takes a big data engine and advanced analytics.

Davi's point here is that success depends both on effective algorithms and on the performance, reliability, and security of the underlying data platform.

Changing the Entire Architecture of Business and IT

Any industry can quickly evolve given new technology and better data. Davi uses agriculture as a model:

Agriculture is an excellent example of how an industry can evolve with new technology. Replace the oxen with a tractor, and look how much more grain you have in the silos. Now consolidate silos with automobiles and elevators and measure again. Eventually we are reaching a world where every minute piece of data about inputs and outputs from a field could help improve the yield for the farmer.

Fly a drone over orchards and collect thermal imagery that predicts crop yields or the need for water, fertilizer, or pesticides. These inexpensive bird's-eye views and collection systems are very attractive because they can significantly increase knowledge. Did the crop dusting work? Is one fertilizer more effective at less cost? Will almonds survive the drought? A growing share of these myriad business questions is being put to big data systems.

Today the traditional engines of agriculture (diesel-powered tractors) are being set up to monitor data constantly and provide feedback to both growers and their suppliers. In this context there is so much money on the line, with entire markets depending on accurate prediction, that everyone has to trust that the data environment is safe against compromise or tampering.

It doesn't matter how good an algorithm is if the 3V engine lacks a method for ensuring data integrity. It's important to know how to defend against attackers who intend to poison, manipulate, or attack your 3V engine. It's equally important to know that the 3V engine has the mechanisms in place to earn your trust. MapR developers have put a lot of thought and effort into this with the MapR Converged Data Platform.
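To make "a method for ensuring data integrity" more concrete, here is a minimal, generic sketch. It assumes each record (for example, a hypothetical tractor sensor reading) is tagged with an HMAC-SHA256 when it is ingested and checked again before analysis. This is not how MapR or Davi's book implements integrity protection. The key and the function names tag_record and verify_record are hypothetical and exist only for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret for illustration only; a real system would
# fetch keys from a key management service rather than hard-coding them.
SECRET_KEY = b"example-key-do-not-use"


def tag_record(record: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag for a record at ingest time (hypothetical helper)."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def verify_record(record: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time to detect tampering."""
    expected = tag_record(record, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    # A made-up field sensor reading, tagged when it enters the pipeline.
    reading = b'{"field": "A7", "soil_moisture": 0.31}'
    tag = tag_record(reading)
    print(verify_record(reading, tag))  # True: record unchanged

    # An attacker alters the value but cannot forge a matching tag without the key.
    tampered = b'{"field": "A7", "soil_moisture": 0.91}'
    print(verify_record(tampered, tag))  # False: tampering detected
```

A keyed MAC is used here instead of a plain hash. An attacker who can rewrite a record could also recompute an unkeyed hash, but cannot produce a valid HMAC without the secret key.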

Next Time

In the next blog post on the subject, we'll talk about Securing HeavyD. No, that's not the story of a hip hop artist's bodyguard. It is the subject of chapter 2 of Davi's book, The Six Elements of Securing Big Data.

Compliments of MapR.
