Machine Learns Twitch

Using unsupervised machine learning techniques to detect and identify Internet trolls on Twitch Plays Pokemon. Several gigabytes of human chat logs and over 37 million data points have been collected for anomaly detection, time series, and context analysis.

See the Poster » Read the Paper » Download the code and dataset »

I am currently interning at Microsoft on the Bing Ads Relevance & Revenue team where I work on fraud detection algorithms for filtering bad ad clicks and conversions.

In the fall, I will be attending Stanford University. I will be a course assistant for the Computer Science Department. I did my undergrad at the University of Texas at Austin where I got a B.S. in Computer Science and a B.B.A. in Finance.

My technical interests lie in probabilistic graphical models, distributed in-memory databases, and unsupervised feature learning. I am particularly interested in applying these techniques to large amounts of Internet data.

Stanford University

Master of Science, Computer Science
August 2014 - June 2016
  • Specialization in Artificial Intelligence
  • Full tuition and living expenses funded by course assistantships

University of Texas at Austin

Bachelor of Science, Computer Science
Bachelor of Business Administration, Finance / BHP
Graduated in May 2014
  • Honors Thesis: A MapReduce Approach to NoSQL RDF Databases
  • Analyst for the Financial Analyst Program ($17 Million MBA Fund)
  • Full tuition sourced from scholarships and research funding
Coursework

Statistical Learning & Data Mining, Graduate Databases, Computational Biology, Optimization, Stochastic Processes, Business Law, Financial Accounting, Corporate Strategy

Northwestern University

Visiting Student, Economics
May 2012 - August 2012
Coursework
Money & Capital Markets / Money & Banking

Plano East Senior High School

International Baccalaureate Diploma
  • Higher Levels in Mathematics and Computer Science
  • Extended Essay in Computer Science
Coursework
Computer Science I & II, Theory of Knowledge, Calculus, Economics, Spanish I-IV

Awards & Honors

Scholarships

  • ConocoPhillips Computer Science Scholarship (2013-14)
  • Elizabeth Lanham Endowed Presidential Scholarship (2013-14)
  • British Petroleum Corporate Scholarship (2012-13)
  • OmniCure Home Health Healthcare IT Scholarship (2011-12)

Research

  • UTCS Undergraduate Research Funding (2012-14)

Academic

  • University Honors, President's List (2013)
  • University Honors, Dean's List (2011-13)

Microsoft

Program Manager Intern
Bing Ads, Paid Search Ads Applications
Summer 2013, Seattle/Bellevue, WA
  • I developed specs and wrote and deployed code for parallel ad databases

Cardinal Health

Finance Intern
Medical Segment, Financial Planning & Analysis
Summer 2012, Chicago/Waukegan, IL
  • I consolidated business unit data for the income statement and balance sheet

OmniCure Home Health

Software Engineering Intern
Electronic Medical Records, Front-End
Summer 2011, Dallas/Plano, TX
  • I worked on front-end web applications and automated electronic medical records


University of Texas at Austin

Undergraduate Research Assistant
Research in Bioinformatics & Semantic Web (RiBS) Lab
September 2012 - Present

Undergraduate Teaching Assistant – Lab Proctor
EDS Financial Trading & Technology Center
September 2012 - May 2014

Undergraduate Teaching Assistant
CS 395T Graduate Cloud Databases
January 2013 - May 2013

Student Technician
McCombs Computer Services (IT Help Desk)
October 2011 - May 2012

Deep Learning

Deep learning is a subfield of machine learning concerned with learning at "multiple levels of representation and abstraction." In other words, deep learning is about learning features of features. Specifically, I am interested in how to learn features without supervision. Read more on Wikipedia »

Probabilistic Graphical Models

Common in machine learning and statistics, graphical models represent many random variables and the dependencies among them. Markov networks and Bayesian networks are examples of PGMs. Read more on Coursera »
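A minimal sketch of the idea, assuming a hypothetical three-variable Bayesian network (Rain -> WetGrass <- Sprinkler) with made-up probabilities. The names and numbers are illustrative only and do not come from any project listed here. The joint distribution factorizes along the graph, and a posterior is computed by brute-force enumeration:

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# All probabilities below are invented for illustration.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}

# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability via the factorization P(R) * P(S) * P(W | R, S)."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    p_w = p_wet if wet else 1.0 - p_wet
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * p_w


def posterior_rain_given_wet():
    """P(Rain = True | WetGrass = True), summing out Sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / evidence


if __name__ == "__main__":
    print(posterior_rain_given_wet())
```

Enumeration is only practical for tiny networks; it is used here because it makes the factorization explicit.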

Distributed Graph Databases

Distributed graph databases require different storage schemas and have different access patterns compared to relational databases. I explore systems and query languages including MapReduce/Hadoop, Hive, Spark, Amazon EC2, SQL, and SPARQL. Read about MapReduce on Wikipedia »
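To illustrate the difference in access patterns, here is a small single-machine sketch of a subject-predicate-object triple store with pattern matching, loosely in the spirit of a single SPARQL triple pattern. The `TripleStore` class and its methods are hypothetical names invented for this example, not the design of any system named above, and nothing here is distributed:

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        # Secondary indexes so lookups by subject or predicate avoid a full scan.
        self._by_subject = defaultdict(set)
        self._by_predicate = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)
        self._by_predicate[p].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self._by_subject.get(s, set())
        elif p is not None:
            candidates = self._by_predicate.get(p, set())
        else:
            candidates = self._triples
        return sorted(
            t
            for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("alice", "knows", "bob")
    store.add("bob", "knows", "carol")
    store.add("alice", "worksAt", "lab")
    print(store.match(p="knows"))
```

Unlike a relational table with a fixed set of columns, every fact here has the same three-part shape, so queries are expressed as patterns over triples instead of joins over per-entity tables.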

Publications

Refereed Conference

[1] P. Cudre-Mauroux, I. Enchev, S. Fundatureanu, P. Groth, A. Haque, A. Harth, F. Keppmann, D. Miranker, J. Sequeda, and M. Wylot. "NoSQL Databases for RDF: An Empirical Evaluation." Proceedings of the 12th International Semantic Web Conference (ISWC). LNCS, vol. 8219, pp. 310-325. Springer, 2013. DOI: 10.1007/978-3-642-41338-4_20 Paper » Website »

Theses

[2] A. Haque. "A MapReduce Approach to NoSQL RDF Databases." The University of Texas at Austin, Department of Computer Science. Report# HR-13-13 (honors theses). Dec 2013. 81 pages. Thesis (4 MB) » Presentation (21 MB) »

Presentations, Term Projects, and Other Papers

Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data. May 2014. Poster (3 MB) » Paper »
An Empirical Evaluation of Approximation Algorithms for the Metric Traveling Salesman Problem (2013) Paper »
Selection Coefficients versus Omega for Codon Substitution Rates (2013) Presentation »
HaLoop: Efficient Iterative Data Processing on Large Clusters (2013) Presentation (3 MB) »
Distributed RDF Triple Store Using SPARQL, HBase, and Hive (2012) Paper »
An Analysis and Comparison of Processor Scheduling Techniques (2012) Paper »
A Novel Approach to Cellular Tracking and Surveillance (2010) Paper »

This website's source code is available on GitHub.
Last Modified: 2014-08-15 09:15 +0000.