In-depth: Security Data Science

  • 19 Aug 2012 12:59 PM | Paul Braxton (Administrator)
    I've defined Security Data Science (SDS) as the application of advanced analytics to activity and access data to uncover unknown risks. In this article I explore this topic in depth and lay out the reasons why we believe Security Data Science is the next evolution in the information security and fraud industry.

    In the last decade, analytics has become increasingly important to effective risk mitigation. When the information security and cyber fraud industry first emerged, it was focused on implementing technical controls to mitigate common threats such as network intrusion, viruses, worms and vulnerabilities. In the early days, a typical information security department may have managed network firewalls, vulnerability assessment and anti-virus tools. Many had deployed Intrusion Detection Systems (IDS) but were struggling to tune out false positives. Finding and patching vulnerabilities, implementing filtering proxies, discovering unmanaged devices and ensuring security software was in proper working order dominated the day-to-day tasks of the information security professional. Many of these technical controls remain essential in today's environment; however, they are less effective against new threats.
    Figure 1: Common "control" and "analytic" systems (examples of control and analytic systems in information security)

    Shifting emphasis to analytic systems

    It wasn't long before information security and fraud prevention became more about deploying analytic systems than control systems. For the purposes of this article, a control system implements logical controls in technology environments that mitigate risks. For example, a firewall controls network access between two or more networks, mitigating the risk of direct attack over the Internet. Anti-virus strips malicious code from files. A proxy breaks the connection between clients and servers in order to filter or block content. Another type of security control is a control by design, such as a demilitarized zone (DMZ). A DMZ usually employs firewalls to separate network zones that have different security requirements. A financial website that does not allow money transfers from an account until there is extended identity verification is an example of a fraud prevention control.

    While deploying and maintaining control systems will always be an integral part of security and fraud prevention, securing technology environments became more difficult as those environments grew more complex. This is when information security professionals began deploying more analytic systems. An analytic system uses data to gain a better understanding of the state of risk-mitigating controls and current threats. These systems allow management to focus on the highest risks. The most prominent analytic system is a Security Information and Event Management (SIEM) system. A SIEM ingests security-relevant event data from multiple sources, including security control systems, authentication devices and operating systems. Typically a SIEM correlates multiple event streams to alert automatically on pre-defined conditions. Another prominent analytic system is a security metrics program. Information security practitioners began to deploy comprehensive security metrics programs to maintain a good understanding of risk and performance. Security metrics measure things such as vulnerability management, the health and maintenance of control systems, and intrusion attempt statistics. The increased emphasis on security metrics and SIEM over the last decade reflects the shift to a focus on analytics.
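
    To make the idea of correlating event streams against a pre-defined condition concrete, here is a minimal sketch in Python. It is not how any particular SIEM product works; the event tuples, the `correlate` function, the threshold and the time window are all assumptions chosen for illustration. The rule flags a successful login that follows several failed logins from the same source within a short window.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical, already-normalized events: (timestamp, source_ip, user, outcome).
EVENTS = [
    (datetime(2012, 8, 19, 12, 0, 0), "203.0.113.7", "alice", "failure"),
    (datetime(2012, 8, 19, 12, 0, 20), "203.0.113.7", "alice", "failure"),
    (datetime(2012, 8, 19, 12, 0, 40), "203.0.113.7", "alice", "failure"),
    (datetime(2012, 8, 19, 12, 1, 0), "203.0.113.7", "alice", "success"),
    (datetime(2012, 8, 19, 12, 5, 0), "198.51.100.4", "bob", "success"),
]


def correlate(events, threshold=3, window=timedelta(minutes=5)):
    """Alert when at least `threshold` failures from one source precede a success within `window`."""
    recent_failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    alerts = []
    for ts, src, user, outcome in sorted(events):
        failures = recent_failures[src]
        # Drop failures that have aged out of the window.
        while failures and ts - failures[0] > window:
            failures.popleft()
        if outcome == "failure":
            failures.append(ts)
        elif outcome == "success":
            if len(failures) >= threshold:
                alerts.append((ts, src, user, len(failures)))
            failures.clear()
    return alerts


ALERTS = correlate(EVENTS)
for ts, src, user, count in ALERTS:
    print(f"{ts.isoformat()} possible brute force: {count} failures, then success for {user} from {src}")
```

    In practice the events would come from several sources (authentication devices, operating systems, control systems) and would need to be normalized to a common shape before a rule like this could run across them.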

    Embedded analytics becomes necessary for control systems to remain effective

    Once the shift was underway, security managers demanded more data and more analytics. Even if a modern security department had effective metrics and a SIEM system, it became apparent that this wouldn't be enough. Control systems needed to embed analytics to remain effective controls. For example, anti-virus products added behavioral analysis engines to detect fast-morphing malicious code. IPS vendors wrapped data correlation around signature engines to cut down on false positives. Without the addition of embedded analytics, control systems would have become obsolete. Control systems continue to evolve, embedding more and more analytics within the system. More recently, IP and URL reputation data has been embedded into many control systems. Data Leakage Protection (DLP) systems, which are used to spot sensitive data in files, email and web traffic, have added the use of cryptographic hash data to cut down on false positives. Even analytic systems have advanced over the years. SIEMs got a data and analytic facelift by incorporating asset value and actor data to provide more context around security events. There is no doubt that analytics is here to stay and will be critical to effective security management in the future.
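
    One way hash data can cut down on false positives is exact fingerprint matching: instead of flagging every file that contains a suspicious keyword, the system flags only content whose hash matches a document already registered as sensitive. The sketch below illustrates that idea and is an assumption about the general technique, not a description of any vendor's DLP engine; the `SENSITIVE_FINGERPRINTS` registry and the sample documents are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest used as the document fingerprint."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical registry of fingerprints for documents known to be sensitive.
SENSITIVE_FINGERPRINTS = {
    fingerprint(b"Q3 acquisition plan: confidential"),
}


def is_registered_sensitive(attachment: bytes) -> bool:
    """True only when the attachment is byte-for-byte identical to a registered document."""
    return fingerprint(attachment) in SENSITIVE_FINGERPRINTS


# A keyword rule on "confidential" would flag both; the fingerprint check flags only the first.
LEAK = b"Q3 acquisition plan: confidential"
NEWSLETTER = b"Reminder: shred confidential paperwork before recycling"
print(is_registered_sensitive(LEAK), is_registered_sensitive(NEWSLETTER))
```

    The trade-off is that an exact hash misses any copy that has been altered, even by one byte, so a check like this would complement content inspection rather than replace it.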

    Overarching changes to technology environments are forcing analytics to the forefront

    In the last few years the arms race between hackers and defenders has escalated. As the workforce gets more technologically savvy, social media, personal tablets and smartphones are added to the vast environment where information can be compromised. The use of cloud computing has expanded the attack surface even further. Choke points such as firewalls and proxies now control only a small subset of communication channels. Hackers have advanced to developing targeted malware and leveraging massive botnets to carry out reconnaissance and acquisition. Hackers use more surgical techniques to evade detection by legacy correlation engines. Malicious insiders are more aware that they are being monitored and put more effort into hiding their activities. Data volumes have exploded, flooding analytic systems with data. This "data glut" stresses metrics frameworks, SIEMs and other analytic systems, which struggle to continue to provide useful information. Analytic systems need more processing horsepower and more advanced techniques just to process the data and deliver the same value they provided in the past.

    Security Data Science: New skills needed to meet the analytical challenges of the future

    In order to meet the challenges of analytics today and into the future, security and fraud departments must put more emphasis on hiring and developing security professionals who have a deep understanding of analytic techniques. Skills such as parsing, normalization and standardization are now a basic prerequisite. This usually requires knowledge of programming languages such as Python, Perl or Java. The new generation of security professionals will need the skills and knowledge necessary to mash up, filter and process data from multiple sources in order to provide a clear picture of risk. Typically data has to be collected from a variety of source databases, including Oracle, Microsoft SQL Server and MySQL, so the professional today needs basic knowledge of how to connect to those data sources and enough SQL to extract data. Other data will live in log files that must be parsed with regular expressions in order to extract the meaningful information. Given the volume of data that will be collected, security professionals may also find themselves responsible for Big Data management and may need to set up their own instances of Hadoop or data warehouse platforms such as Greenplum or Vertica. Once data is in hand, knowledge of data mining, statistical modeling and data visualization becomes critical to performing analytics that will provide the insights needed for effective risk mitigation.
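
    The parsing and SQL skills described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the log lines are made-up samples in an sshd-like format (real formats vary by system and version), the `auth_events` table is hypothetical, and an in-memory SQLite database from the standard library stands in for a production database such as Oracle or MySQL.

```python
import re
import sqlite3

# Hypothetical sshd-style log lines; the last one is unrelated noise.
LOG_LINES = [
    "Aug 19 12:00:01 host1 sshd[411]: Failed password for alice from 203.0.113.7 port 50122 ssh2",
    "Aug 19 12:00:09 host1 sshd[411]: Failed password for alice from 203.0.113.7 port 50123 ssh2",
    "Aug 19 12:01:30 host1 sshd[412]: Accepted password for bob from 198.51.100.4 port 40100 ssh2",
    "Aug 19 12:02:00 host1 cron[99]: session opened for user root",
]

# Named groups pull out the fields worth keeping.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) sshd\[\d+\]: "
    r"(?P<result>Failed|Accepted) password for (?P<user>\S+) "
    r"from (?P<src_ip>\S+) port \d+"
)


def parse(lines):
    """Yield one normalized tuple per matching line; skip lines that do not match."""
    for line in lines:
        m = PATTERN.match(line)
        if m:
            yield (m["ts"], m["host"], m["result"].lower(), m["user"], m["src_ip"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_events (ts TEXT, host TEXT, result TEXT, user TEXT, src_ip TEXT)")
conn.executemany("INSERT INTO auth_events VALUES (?, ?, ?, ?, ?)", parse(LOG_LINES))

# Plain SQL then answers a security question: which sources fail most often?
FAILED_BY_SOURCE = conn.execute(
    "SELECT src_ip, COUNT(*) AS failures FROM auth_events "
    "WHERE result = 'failed' GROUP BY src_ip ORDER BY failures DESC"
).fetchall()
print(FAILED_BY_SOURCE)
```

    Lower-casing the result field is a small example of normalization: once every source uses the same vocabulary, one query can span all of them.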
    Figure 2: Security professionals need to develop analytic skills to remain effective (security professionals need to add data analytic skills)

    Figure 2 above lists some of the common skills of information security professionals. While roles may vary from management to incident response, analytic skills will become an increasingly important asset. Security teams should add analytic skills such as machine learning, data mining and statistical analysis through training and hiring. Another key element is the curiosity to explore the data problem behind the security problem, and then the creativity to pull in the right data and apply the analytic techniques that produce valuable insights. This will require a focus on professional development of the analytical mindset to go along with data processing skills. Increasingly, the new generation of security professionals will turn to data analytics to find solutions to security challenges. Security Data Science represents the combined skill set needed to meet these challenges. Data parsing and manipulation, SQL, data visualization, 'Big Data' tools such as Hadoop / MapReduce, statistical analysis and machine learning techniques are a few of the skills that will become essential to information security professionals.
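
    As a small taste of the statistical analysis mentioned here, the sketch below compares each user's activity today against that user's own history and flags large deviations using a z-score. The counts, the `unusual_users` function and the threshold of 3 standard deviations are assumptions chosen for illustration; real activity data is rarely this tidy and often calls for more robust methods.

```python
from statistics import mean, stdev

# Hypothetical daily counts of records accessed per user over ten days, plus today's count.
HISTORY = {
    "alice": [40, 42, 38, 41, 39, 43, 40, 37, 44, 41],
    "bob": [12, 15, 11, 14, 13, 12, 16, 13, 14, 12],
}
TODAY = {"alice": 42, "bob": 95}


def unusual_users(history, today, threshold=3.0):
    """Return {user: z_score} for users whose count today deviates strongly from their own baseline."""
    flagged = {}
    for user, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            continue  # no variation in the baseline, so a z-score is undefined
        z = (today[user] - mu) / sigma
        if abs(z) > threshold:
            flagged[user] = round(z, 1)
    return flagged


FLAGGED = unusual_users(HISTORY, TODAY)
print(FLAGGED)
```

    Measuring each user against their own baseline, rather than a global average, is the design choice that matters: a count that is normal for one role can be a strong signal for another.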

    Developing your security data science skills

    In the future, analytic skills will become core skills of security professionals, but that doesn't mean every information security professional has to become a statistical genius! Managers need only develop an understanding of how analytics can help with modern-day security challenges and acquire team members or partners who have the analytic know-how. When doing major replacements of control systems such as IPS or DLP, ensure that there is an emphasis on integrating data into an analytic environment such as your SIEM or Security Data Warehouse. Security professionals in technical roles should seek training in analytic techniques and demand more intelligent solutions from product vendors. Leverage product vendors and open source communities to learn about specific analytic use cases that can be reused in your environment. In this way you can learn the techniques in a specific environment that builds upon existing security expertise.

    The good news is that security professionals have always had an expanding skill set, so they will be up to the challenge. Current professionals have already been developing and practicing many of the prerequisite skills discussed here. Investing in advanced analytic skills will pay many dividends in the future, as analytic skills are also emerging as vital in everyday life and in many other industries. We are here to help! The primary goal of securitydatascience.org is to provide a community where professionals can learn and network with others to keep growing their own knowledge and skill set.

Association of Security Data Scientist
