
Applied Data Science and Machine Learning for Cyber Security

GTK Cyber | August 4-5 & August 6-7



Overview

This interactive course will teach security professionals how to use data science techniques to quickly manipulate and analyze network and security data and ultimately uncover valuable insights from it. The course covers the entire data science process: data preparation, feature engineering and selection, exploratory data analysis, data visualization, machine learning, model evaluation and optimization, and finally implementation at scale, all with a focus on security-related problems.

Participants will learn how to read in data in a variety of common formats, then write scripts to analyze and visualize that data. A non-exhaustive list of what will be covered includes:

  • Writing scripts to efficiently read and manipulate CSV, XML, and JSON files
  • Quickly and efficiently parsing executables, log files, and pcap files, and extracting artifacts from them
  • Making API calls to merge datasets
  • Using the Pandas library to quickly manipulate tabular data
  • Effectively visualizing data using Python
  • Preprocessing raw security data for machine learning and feature engineering
  • Building, applying and evaluating machine learning algorithms to identify potential threats
  • Automating the process of tuning and optimizing machine learning models
  • Hunting anomalous indicators of compromise and reducing false positives
  • Using supervised learning algorithms such as Random Forests, Naive Bayes, K-Nearest Neighbors (K-NN) and Support Vector Machines (SVM) to classify malicious URLs and identify SQL injection
  • Applying unsupervised learning algorithms such as K-Means Clustering to detect anomalous behavior
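
As a rough illustration of the preprocessing and feature engineering topics listed above, the sketch below turns raw URLs into numeric features that a classifier could later consume. It is not taken from the course materials: the function names, the choice of features, and the sample URLs are all assumptions made for this example, and it uses only the Python standard library.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Return the Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url):
    """Turn one URL into a dict of simple numeric features (illustrative set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "length": len(url),
        "host_dots": host.count("."),
        "digits": sum(ch.isdigit() for ch in url),
        # Crude check: four dot-separated, all-digit labels
        "host_is_ipv4": int(len(labels) == 4 and all(p.isdigit() for p in labels)),
        "query_params": len(parts.query.split("&")) if parts.query else 0,
        "host_entropy": shannon_entropy(host),
    }


# Hypothetical sample URLs, for illustration only
samples = [
    "http://example.com/index.html",
    "http://192.168.10.5/login.php?user=admin&id=1",
]
rows = [url_features(u) for u in samples]
for url, row in zip(samples, rows):
    print(url, row)
```

Each resulting dict is one row of a feature table; in practice such rows would typically be loaded into a Pandas DataFrame and passed to a supervised model such as a Random Forest.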

Finally, we will introduce students to cutting-edge Big Data tools including Apache Spark (PySpark), Apache Drill, and GPU-accelerated parallel computing frameworks, and demonstrate how to apply these techniques to extremely large datasets.

Who Should Take this Course

Anyone who wishes to incorporate automated data analysis, machine learning and data science into their work.

Student Requirements

Students will need to have an understanding of Python.

What Students Should Bring

Students should bring a laptop with either:
  • VirtualBox (or VMware) installed, 6GB of RAM and 10GB of storage, or
  • Anaconda and IPython installed.

We strongly recommend using the virtual machine we will provide as it will give the best student experience.

What Students Will Be Provided With

A preconfigured virtual machine (VM) containing all the software needed for the class. The VM will also contain:
  • All course slides, notebooks, reference sheets, handouts and documentation
  • Skeleton code examples for in-class exercises

Students will also be provided with access to our website which will have additional exercises.

Trainers

Mr. Charles Givre recently joined Deutsche Bank as a lead data scientist in the Chief Information Security Office. Prior to joining Deutsche Bank, Mr. Givre spent seven years as a Senior Lead Data Scientist at Booz Allen Hamilton, where he worked at the intersection of cyber security and data science. At Booz Allen, Mr. Givre worked on one of the firm's largest analytic programs, where he led data science efforts and worked to expand the role of data science in the program. Mr. Givre is passionate about teaching others data science and analytic skills and has taught data science classes all over the world at conferences, universities and for clients, including at BlackHat, the O'Reilly Security Conference, and the Center for Research in Applied Cryptography and Cyber Security at Bar Ilan University. He is a sought-after speaker and has delivered presentations at major industry conferences such as Strata-Hadoop World, Open Data Science Conference and others. One of Mr. Givre's research interests is increasing the productivity of data science and analytic teams, and towards that end, he has been working extensively to promote the use of Apache Drill in security applications and is a committer for the Drill project. Mr. Givre teaches online classes for O'Reilly about Drill and Security Data Science and is a coauthor of the forthcoming O'Reilly book about Apache Drill. Prior to joining Booz Allen, Mr. Givre worked as a counterterrorism analyst at the Central Intelligence Agency for five years. Mr. Givre holds a Master's degree in Middle Eastern Studies from Brandeis University, as well as a Bachelor of Science in Computer Science and a Bachelor of Music, both from the University of Arizona. He speaks French reasonably well, plays trombone, lives in Baltimore with his family and, in his nonexistent spare time, is restoring a classic British sports car. Mr. Givre blogs at thedataist.com and tweets @cgivre.

Austin Taylor (www.austintaylor.io) has an extensive background in Defensive and Offensive Cyber Operations and has performed incident response for some of the world's top Fortune companies. His expertise includes penetration testing, data science, threat hunting and User and Entity Behavioral Analytics (UEBA). He has taught data science courses for the last three years and is the author of "How to Build a World Class Monitoring System for Home, Small Office, or Enterprise Networks". In his off time, he teaches programming and conducts training at conferences. He currently serves as a Cyber Warfare Operator for the United States Air Force and is a Chief Security Research Engineer at IronNet Cybersecurity. Austin holds multiple industry certifications including CISSP, GMON, GCCC, GXPN, GCIA, GCIH, GCPM, GSEC, GPEN, CEH, VCP and CCNA:Security.
