CS259D: Data Mining for Cyber Security

Course information

Time

TTh 4:15pm - 5:30pm

Location

Herrin T175

Staff

Instructor	Bahman Bahmani	bahman@cs	Office Hours: 5:45 - 6:45pm outside the classroom
TA	Dima Brezhnev	brezhnev@cs	Office Hours: 1:00 - 3:00pm Fridays in the Huang open area on the bottom floor, outside ICME, plus extra office hours before assignment due dates

Piazza

Link

Description

The massive increase in the rate of novel cyber attacks has made data-mining-based techniques a critical component in detecting security threats. The course covers various applications of data mining in computer and network security. Topics include: an overview of the state of information security; malware detection; network and host intrusion detection; web, email, and social network security; authentication and authorization anomaly detection; alert correlation; and potential issues such as privacy and adversarial machine learning. Prerequisites: data mining/machine learning at the level of CS 246 or CS 229; familiarity with computer systems and networks at least at the level of CS 110; CS 140 and CS 144 strongly recommended; CS 155 recommended but not required.

Lectures

Introduction: Overview of information security, current security landscape, the case for security data mining [pdf]
Botnets: Botnet topologies, botnet detection using NetFlow analysis [pdf]
Botnets Cont'd, Insider Threats: Botnet detection using DNS analysis, introduction to insider threats, masquerader detection strategies [pdf]
Readings:
- A Survey of Insider Attack Detection Research: skim before class and use the references for more information.
- Insider IT Sabotage across U.S. Critical Infrastructure: Appendix B
Behavioral Biometrics: Active authentication using behavioral and cognitive biometrics [pdf]
Reading: Ch 4 + Ch 6 of "Behavioral Biometrics, A Remote Access Approach" by Kenneth Revett (2008).
Behavioral Biometrics Cont'd: Mouse dynamics analysis for active authentication [pdf]
Security at Wells Fargo: Guest speaker Avi Avivi, VP Enterprise Information Security Architecture at Wells Fargo [pdf]
Behavioral Biometrics Cont'd: Mouse dynamics analysis cont'd, touch and swipe pattern analysis for mobile active authentication [pdf]
Web Security: Web threat detection via web server log analysis [pdf]
Security at Union Bank: Guest speaker Gary Lorenz, Chief Information Security Officer (CISO) and Managing Director at MUFG Union Bank
Multi-Classifier Systems, Adversarial Machine Learning: Overview of multi-classifier systems (MCS), advantages of MCS in security analytics, security of machine learning [pdf]
Security Data Mining at Google: Guest speaker Massimiliano Poletto, head of Google Security Monitoring Tools group [pdf]
Web Security Cont'd, Deep Packet Inspection: Alert aggregation for web security, packet payload modeling for network intrusion detection [pdf]
Machine Learning for Security: Challenges in applying machine learning (ML) to security, guidelines for applying ML to security [pdf]
Polymorphism: Polymorphic blending attacks, infeasibility of modeling polymorphic attacks [pdf]
Deep Packet Inspection Cont'd: One-class multi-classifier systems, one-class MCS for packet payload modeling and network intrusion detection [pdf]
Note to students: Please also refer to the class notes for the mathematical derivations of the one-class MCS fusion rules.
Phishing Detection: Phishing email detection, phishing website detection [pdf]
Industry Perspectives: Q&A with guest speaker Michael Fey, EVP and CTO of Intel Security Group (aka McAfee)
Student Presentations: [pdf]
Student Presentations Cont'd: [pdf]
Automatic Alert Correlation, Final Thoughts: Building attack scenarios from individual alerts, course review, current and future trends in security [pdf]

Homework

First homework: Google Doc. It is due on 10/21. Submission instructions will be posted closer to the due date.

Second homework: Google Doc. It is due on the night of 11/5.

Third homework: Google Doc. It is due on the Friday before Thanksgiving break. Note that this assignment requires you to sign up for a presentation before 10/14.

Course Review/Fourth homework: Google Doc. It is due at noon on Friday 12/12. Early submissions are appreciated.

Topics

Introduction: Introduction to Information Security, Introduction to Data Mining for Information Security
Malware Detection: Obfuscation, Polymorphism, Payload-based detection of worms, Botnet detection/takedown
Network Intrusion Detection: Signature-based solutions (Snort, etc.), Data-mining-based solutions (supervised and unsupervised), Deep packet inspection
Host Intrusion Detection: Analysis of shell command sequences, system call sequences, and audit trails, Masquerader/Impersonator/Insider threat detection
Web Security: Anomaly detection of web-based attacks using web server logs, Anomaly detection in web proxy logs
Email: Spam detection, Phishing detection
Social network security: Detecting compromised accounts, detecting social network spam
Authentication: Anomaly detection of Single Sign-On (Kerberos, Active Directory), Detecting Pass-the-Hash and Pass-the-Ticket attacks
Automated correlation: Attack trees, Building attack scenarios from individual alerts
Issues: Privacy issues, Adversarial machine learning (use of machine learning by attackers, how to make ML algorithms robust/secure against adversaries)
Other potential topics: Fraud detection, IoT/Infrastructure security, Mobile/Wireless security

Requirements

There will be 4 homework assignments. Students will design and implement data mining algorithms for various security applications taught in class. Each assignment has a significant programming component; assignments also have reading components (mostly research literature) that give students initial pointers on the problems in the programming component. Assignment topics will be chosen from the following:

Web attack detection
User profiling for authentication and authorization
Network profiling and intrusion detection
Botnet detection
Host-based insider threat detection
Deep packet inspection
Web proxy log analysis
Algorithmic alert correlation

CS 259D Data Mining for Cyber Security Autumn 2014