The adage “bigger is better” often permeates our thinking. But when looking at “big data” and security, it’s best to proceed carefully to ensure this is the case.
Big data is increasingly trumpeted as the Holy Grail of business intelligence and data mining. By super-sizing data sets, drawing on a variety of existing and new data assets from across the enterprise, organizations create a landscape that enables rich farming of information. From a security point of view, there are two perspectives to consider with respect to big data:
- Once this information asset is created, how can organizations secure big data?
- What can big data techniques do for enterprise information security?
Let’s consider each of these perspectives in turn, along with the possibilities for organizations about to embark on a security big data project.
The big data approach is all about consolidation. Big data has arisen as an evolution from distributed data sets, separately managed and interrogated, to centralized data sets, which aggregate and standardize data that may previously have been dispersed in different places and systems. In a security big data project, this data is typically security event information from servers, workstations, antivirus, firewalls and IDS/IPS systems. It can also include cyber-physical data from devices such as process controllers, or business-level event data from transactional systems, which, when combined with infosec data sets, could be useful for detecting fraudulent transactions and the like.
The power of this construct is that it becomes possible to perform advanced analytics to understand information such as customer behavior, which can be highly valuable for marketing, targeted sales campaigns or broader business insight. From a technology perspective, many companies are virtualizing and centralizing their offerings, driven in particular by cloud-based technologies. Such standardization and consolidation allows data to be combined on the premise that, as data sets grow, so too does the usefulness of the information they contain. Associated big data processing models provide specialized techniques for handling data sets of this size.
How do organizations secure big data?
Applying security to the type of data sets handled in big data systems can be challenging. As an attractive target, a big data collection is of keen interest to an attacker due to the vast amount of potentially lucrative information it contains. A primary challenge for security practitioners is to work alongside data analysts to ensure big data sets are constructed and managed with security in mind from the start, both at the file or data storage level and within the data warehouse itself. I have seen many examples of data storage and analysis systems where security had to be retrofitted, and it didn’t function as seamlessly or effectively as it could have. Purely technical issues also need consideration: techniques like encryption may be quite capable and efficient for data at megabyte or terabyte scale, but may become severe performance inhibitors for data sets measured in petabytes or exabytes.
But big data security can also be seen as an opportunity. While multiple, distributed systems and data sets may provide smaller, less appealing targets, organizations are often inconsistent in the management and application of security to these data sets. The more centralized, consolidated model of big data offers organizations a chance to revisit data security and improve on legacy situations. From a privacy perspective, big data may also allow organizations to take a holistic view of the data they have on hand and ensure the collection, processing and archiving/destruction of this information is managed according to evolving privacy directives and regulations.
What can big data techniques do for security?
Once an organization creates a unified repository of information, that repository becomes a potential asset for information security. Big data analysis techniques can be performed to find relevant clues to pinpoint possible attacks, fraudulent transactions or other security breaches.
For example, one of the greatest challenges with security analysis is correlation: identifying the relationship between multiple actions or events, and seeing how these combine to implement an attack. The ability to run many correlation searches simultaneously is an implicit benefit of big data techniques, and a new, useful and appropriate capability that aids existing security detection processes. Big data does not, per se, enhance the underlying data types, but in some instances additional fields can be computed (for example, the time since an event of this type was last seen) to provide additional context and input to the big data analysis process.
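To make the correlation and enrichment ideas concrete, here is a minimal Python sketch over a handful of hypothetical login events. The field names (`ts`, `type`, `src`, `user`), the "three failures within five minutes" rule and both function names are illustrative assumptions, not features of any particular SIEM or big data product. The first function computes a derived field of the kind described above (seconds since an event of the same type was last seen from the same source); the second correlates several failed logins with a subsequent success, a classic brute-force pattern.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical login events; field names and values are illustrative only.
EVENTS = [
    {"ts": "2012-09-01T10:00:00", "type": "login_failed", "src": "10.0.0.5", "user": "alice"},
    {"ts": "2012-09-01T10:00:20", "type": "login_failed", "src": "10.0.0.5", "user": "alice"},
    {"ts": "2012-09-01T10:00:40", "type": "login_failed", "src": "10.0.0.5", "user": "alice"},
    {"ts": "2012-09-01T10:01:00", "type": "login_ok", "src": "10.0.0.5", "user": "alice"},
    {"ts": "2012-09-01T11:00:00", "type": "login_ok", "src": "10.0.0.9", "user": "bob"},
]


def enrich_time_since_last_seen(events):
    """Return copies of the events with a derived 'secs_since_last_seen' field:
    seconds since the previous event of the same type from the same source,
    or None the first time that (type, source) pair appears."""
    last_seen = {}
    enriched = []
    for ev in sorted(events, key=lambda e: e["ts"]):  # ISO strings sort chronologically
        ts = datetime.fromisoformat(ev["ts"])
        key = (ev["type"], ev["src"])
        prev = last_seen.get(key)
        delta = (ts - prev).total_seconds() if prev is not None else None
        last_seen[key] = ts
        enriched.append({**ev, "secs_since_last_seen": delta})
    return enriched


def correlate_bruteforce(events, threshold=3, window=timedelta(minutes=5)):
    """Toy correlation rule (an assumption, not a product feature): flag a
    successful login preceded by at least `threshold` failed logins from the
    same source within `window`."""
    failures = defaultdict(deque)  # source -> timestamps of recent failed logins
    alerts = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        ts = datetime.fromisoformat(ev["ts"])
        recent = failures[ev["src"]]
        while recent and ts - recent[0] > window:  # drop failures outside the window
            recent.popleft()
        if ev["type"] == "login_failed":
            recent.append(ts)
        elif ev["type"] == "login_ok" and len(recent) >= threshold:
            alerts.append((ev["src"], ev["user"], ev["ts"]))
            recent.clear()
    return alerts


print(correlate_bruteforce(EVENTS))  # -> [('10.0.0.5', 'alice', '2012-09-01T10:01:00')]
```

At big data scale the same logic would typically be partitioned by source and run in parallel, but the per-key state it keeps (last-seen times, a sliding window of failures) stays the same.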
For organizations to capitalize on big data techniques, a holistic view of the operational and security landscape needs to be created. This involves incorporating relevant information at different levels of operation, from transactions (with associated logging and controls/balances) to system usage activities (logins, accesses, connections) to system activity logs (desktops, servers, firewalls, routers and intrusion detection system devices), all of which can be combined to create a comprehensive view of activity. Collecting, inspecting and reacting to these diverse events through a security information and event management (SIEM) platform can be a valuable first step in establishing a big data security repository, but adding big data analysis capabilities will enable faster analysis of non-normalized data sets and greater predictive capability. The SIEM therefore remains critical, since the security data will likely still be collected through it, but that data will also be interrogated with other tools. There are various options, and most major software companies, including Microsoft, SAP AG and IBM, are offering big data products. EMC Corp. has introduced Greenplum, offering massively parallel processing, data storage and analytics, while Oracle Corp. has developed a big data appliance. Niche companies like Splunk offer data analytics and visualization services, too.
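The step of combining diverse feeds into a comprehensive view of activity can be sketched as mapping each feed into one common record shape and merging the results into a single timeline. The three raw formats below (a firewall log line, an authentication event with a Unix timestamp, and a payment transaction with a day/month/year date) and the common fields `ts`, `feed`, `entity` and `action` are hypothetical; a real SIEM defines its own schema and parsers. All timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

# Hypothetical raw records from three different feeds; formats are assumptions.
FIREWALL_LINES = ["2012-09-01 10:02:11 DENY TCP 203.0.113.7 -> 10.0.0.5:22"]
AUTH_EVENTS = [{"time": 1346493720, "user": "alice", "host": "srv01", "result": "success"}]
TRANSACTIONS = [{"when": "01/09/2012 10:05", "account": "ACC-42", "amount": 9500.0}]


def from_firewall(line):
    """Parse '<date> <time> <action> <proto> <src> -> <dst>' into the common shape."""
    date, time, action, proto, src, _arrow, dst = line.split()
    ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {"ts": ts, "feed": "firewall", "entity": src, "action": f"{action} {proto} to {dst}"}


def from_auth(ev):
    """Convert an authentication event with a Unix timestamp."""
    ts = datetime.fromtimestamp(ev["time"], tz=timezone.utc)
    return {"ts": ts, "feed": "auth", "entity": ev["user"],
            "action": f"login {ev['result']} on {ev['host']}"}


def from_transaction(tx):
    """Convert a business-level payment record (day/month/year date)."""
    ts = datetime.strptime(tx["when"], "%d/%m/%Y %H:%M").replace(tzinfo=timezone.utc)
    return {"ts": ts, "feed": "transaction", "entity": tx["account"],
            "action": f"payment {tx['amount']:.2f}"}


def build_timeline():
    """Normalize every feed, then merge into one chronologically ordered view."""
    records = ([from_firewall(line) for line in FIREWALL_LINES]
               + [from_auth(ev) for ev in AUTH_EVENTS]
               + [from_transaction(tx) for tx in TRANSACTIONS])
    return sorted(records, key=lambda r: r["ts"])


for rec in build_timeline():
    print(rec["ts"].isoformat(), rec["feed"], rec["entity"], rec["action"])
```

Once every source shares the same timestamp type and field names, a single query or correlation rule can span firewall, login and payment activity, which is the comprehensive view the paragraph above describes.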
In some SIEMs, not all data is collected (edge filtering) or stored (database purging or selective storage), because the repositories can grow too large for SIEM systems to handle. But it is ineffective, and indeed wasteful, to gather a large volume of data and then be unable to retain the full picture. The way forward will be a cooperative venture between SIEM providers (who must store and make available large security data sets) and big data toolkits and engines that can extract and mine the security data in meaningful ways, with the promise of eventually incorporating other types of information, such as unstructured video camera feeds and other context-relevant information. Organizations can also benefit from using programming frameworks like Hadoop, which can direct parallel processing across multiple nodes, in their analysis activities.
The security analysis challenge is to find the relevant information in a sea of “noise”: however much relevant data can be accumulated, it will usually be dwarfed by vast amounts of unnecessary data. Fortunately, there are specialized techniques for processing vast amounts of big data to find the useful information. For example, approaches involving MapReduce (the programming model used in frameworks like Hadoop) divide problem sets into tasks that can be processed in parallel. The interesting opportunity is to see how applying these techniques to security events and information (which are big data in themselves) can improve security insight and proactive management. Big data techniques can help provide context to information by showing activities and patterns against the norms established in the large data sets.
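The MapReduce pattern can be illustrated in miniature. The sketch below is written in plain Python, not against Hadoop's actual API: it divides hypothetical log lines into independent splits, maps each split to `((host, event_type), 1)` pairs, shuffles the pairs by key and reduces each group to a count. A thread pool stands in for cluster nodes; it demonstrates that the map and reduce tasks are independent, not that this would run fast. The log format, the function names and the final "three or more failures is unusual" norm are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

# Hypothetical log lines in the form "<time> <host> <event_type>".
LOG_LINES = [
    "10:00:01 srv01 login_failed",
    "10:00:02 srv02 login_ok",
    "10:00:03 srv01 login_failed",
    "10:00:04 srv03 port_scan",
    "10:00:05 srv01 login_failed",
    "10:00:06 srv02 login_ok",
]


def map_split(lines):
    """Map phase: emit ((host, event_type), 1) for every line in one split."""
    return [((host, event_type), 1)
            for _time, host, event_type in (line.split() for line in lines)]


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_key(key, values):
    """Reduce phase: combine all values for one key (here, a simple sum)."""
    return key, sum(values)


def map_reduce(lines, n_splits=3):
    # Divide the input into independent splits, as a framework would across nodes.
    splits = [lines[i::n_splits] for i in range(n_splits)]
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = pool.map(map_split, splits)  # each map task sees only its own split
        grouped = shuffle(chain.from_iterable(mapped))
        reduced = pool.map(lambda kv: reduce_key(*kv), grouped.items())
        return dict(reduced)


counts = map_reduce(LOG_LINES)
# Compare the counts against a simple assumed norm: three or more failures is unusual.
suspicious = {k: v for k, v in counts.items() if k[1] == "login_failed" and v >= 3}
print(suspicious)  # -> {('srv01', 'login_failed'): 3}
```

In Hadoop the framework itself handles splitting the input, shuffling intermediate pairs and scheduling tasks across nodes; the developer supplies mainly the map and reduce logic.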
With careful application and consideration, there can be significant big data benefits for enterprise information security, both by applying security to big data sets (with the opportunity to normalize security approaches and mechanisms) and by applying big data analysis techniques to the vast amount of security and event information that’s collected.
There are challenges, though: how to deal with data at such scale, appropriate ways to structure information and, very importantly, finding the right highly skilled analysts to lead the organization into big data, and out again.
Bigger can be better, but careful consideration and planning are required to ensure the big benefits are not overshadowed by big problems that are insecure or unmanageable. Organizations can pave the way toward improving their information security posture with big data by establishing and diversifying the security-related information feeds they collect, thinking beyond traditional security data to include security camera footage, building access data and payment transactions. By starting to integrate these feeds with big data toolsets and processing engines as they evolve, enterprises can both diversify and enrich the security-related information that can be processed and analyzed in a meaningful way using big data analysis techniques.
About the author:
Andrew Hutchison is an information security specialist with T-Systems International in South Africa. An information security practitioner with 20 years of technical and business experience, his technical security work has included secure system development, security protocol design and analysis, and intrusion detection and network security solutions. He has held executive responsibility for information security in a large enterprise, establishing its chief security officer role and initiating an ISO27001 security certification program. As business sponsor for large SIEM rollouts, he has experience in deploying and operating SIEM systems in a managed service provider environment. He is an adjunct professor of computer science at the University of Cape Town in South Africa.
This was first published in September 2012