Using unsupervised machine learning techniques to detect and identify Internet trolls on Twitch Plays Pokemon. Several gigabytes of human chat logs and over 37 million data points have been collected for anomaly detection, time-series analysis, and context analysis.
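The sketch below shows one simple, generic form of anomaly detection on chat data. It flags users whose message counts are extreme outliers under a robust (median/MAD-based) z-score. It is an illustration, not the project's actual pipeline. The input format of (user, message) pairs, the function name `flag_outlier_users`, and the 3.5 cutoff are all assumptions.

```python
from collections import Counter
from statistics import median


def flag_outlier_users(messages, threshold=3.5):
    """Flag users whose message count is an outlier under a robust z-score.

    `messages` is an iterable of (user, text) pairs. This input format and
    the default threshold are illustrative assumptions.
    """
    counts = Counter(user for user, _ in messages)
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that a few extreme users
    # cannot inflate the way they would inflate a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing stands out
    flagged = []
    for user, c in counts.items():
        # 0.6745 rescales MAD so the score is comparable to an ordinary
        # z-score for normally distributed data (the "modified z-score").
        score = 0.6745 * (c - med) / mad
        if score > threshold:
            flagged.append(user)
    return sorted(flagged)


chat = ([("a", "hi")] * 2 + [("b", "left")] * 3 + [("c", "up")] * 2
        + [("d", "a")] * 3 + [("e", "b")] * 2 + [("spam", "lol")] * 40)
print(flag_outlier_users(chat))  # ['spam']
```

Using the median and MAD rather than the mean and standard deviation keeps a single prolific spammer from inflating the baseline it is compared against.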
I am currently interning at Microsoft on the Bing Ads Relevance & Revenue team where I work on fraud detection algorithms for filtering bad ad clicks and conversions.
In the fall, I will be attending Stanford University. I will be a course assistant for the Computer Science Department and a research assistant for the Department of Statistics. I did my undergrad at the University of Texas at Austin where I got a B.S. in Computer Science and a B.B.A. in Finance.
My technical interests lie in probabilistic graphical models, distributed in-memory databases, and unsupervised feature learning. I am especially interested in applying these techniques to large-scale Internet data.
Statistical Learning & Data Mining, Graduate Databases, Computational Biology, Optimization, Stochastic Processes, Business Law, Financial Accounting, Corporate Strategy
Undergraduate Research Assistant
Research in Bioinformatics & Semantic Web (RiBS) Lab
September 2012 - Present
Undergraduate Teaching Assistant – Lab Proctor
EDS Financial Trading & Technology Center
September 2012 - May 2014
Deep learning is a subfield of machine learning concerned with learning at "multiple levels of representation and abstraction." In other words, deep learning is about learning features of features. Specifically, I am interested in how to learn these features without supervision. Read more on Wikipedia »
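Deep learning usually means multi-layer neural networks. To keep the example short and dependency-free, the sketch below swaps in a different, simpler unsupervised technique to illustrate "features of features". It stacks two layers of k-means feature learning: the first layer learns features of the raw points, and the second learns features of those features. No labels are used at any stage. This is not the author's method. The layer sizes, the "triangle" encoding, and all function names are illustrative choices.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(point, centroids):
    """'Triangle' activation: how much closer than average the point is to
    each centroid, clipped at zero, giving a sparse nonnegative vector."""
    d = [math.dist(point, c) for c in centroids]
    mean_d = sum(d) / len(d)
    return [max(0.0, mean_d - dj) for dj in d]


def learn_two_layers(points, k1=4, k2=2, seed=0):
    """Learn layer-1 features from raw points, then layer-2 features from
    the layer-1 encodings (features of features), all without labels."""
    layer1 = kmeans(points, k1, seed=seed)
    h1 = [encode(p, layer1) for p in points]
    layer2 = kmeans(h1, k2, seed=seed)
    return layer1, layer2


def transform(point, layer1, layer2):
    """Map a raw point through both learned layers."""
    return encode(encode(point, layer1), layer2)


# Four small, hypothetical 2-D blobs of three points each.
data = [(x + dx, y + dy)
        for (x, y) in [(0, 0), (0, 5), (5, 0), (5, 5)]
        for (dx, dy) in [(0, 0), (0.1, 0), (0, 0.1)]]
l1, l2 = learn_two_layers(data)
print(transform(data[0], l1, l2))  # a 2-element nonnegative feature vector
```

A neural autoencoder would play the same role as each k-means layer. The k-means version is used here only because it fits in a few dozen lines of standard-library Python.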
Probabilistic graphical models (PGMs), widely used in machine learning and statistics, represent many random variables and the dependencies among them as a graph. Markov networks and Bayesian networks are examples of PGMs. Read more on Coursera »
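To make "dependence among variables" concrete, here is a minimal sketch of a three-node Bayesian network (Rain → WetGrass ← Sprinkler) with exact inference by enumeration. The network is a common textbook-style toy, and its probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical network: Rain and Sprinkler are independent causes of WetGrass.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = True | Rain, Sprinkler)
P_WET = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """Joint probability from the factorization P(R) P(S) P(W | R, S)."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    p_w = P_WET[(rain, sprinkler)]
    return p * (p_w if wet else 1 - p_w)


def query_rain(evidence):
    """P(Rain = True | evidence), summing the joint over every assignment
    consistent with the evidence (exponential, but exact)."""
    num = den = 0.0
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(rain, sprinkler, wet)
        den += p
        if rain:
            num += p
    return num / den


print(query_rain({"wet": True}))                     # about 0.695
print(query_rain({"wet": True, "sprinkler": True}))  # about 0.216
```

Learning that the sprinkler was on lowers the probability of rain. The two causes become dependent once their common effect is observed ("explaining away"), which is exactly the kind of structure the graph encodes.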
Distributed graph databases require different storage schemas and have different access patterns than relational databases. I explore systems including MapReduce/Hadoop, Hive, Spark, and Amazon EC2, as well as the query languages SQL and SPARQL. Read about MapReduce on Wikipedia »
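The sketch below simulates, in a single Python process, how a MapReduce job can answer a two-pattern SPARQL-style query (`?x knows ?y . ?y name ?n`) over RDF triples. It uses a reduce-side join keyed on the shared variable `?y`. It is a generic illustration, not the system described in the publications on this page. The predicates, sample triples, and function names are hypothetical, and a real Hadoop job would distribute the map, shuffle, and reduce phases across machines.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process simulation of map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key in sorted(groups):  # Hadoop also hands keys to reducers sorted
        out.extend(reducer(key, groups[key]))
    return out


def join_mapper(triple):
    """Emit each matching triple keyed on the join variable ?y, tagged with
    which triple pattern it came from."""
    s, p, o = triple
    if p == "knows":
        yield o, ("L", s)  # ?x knows ?y  -> key ?y = object
    elif p == "name":
        yield s, ("R", o)  # ?y name ?n   -> key ?y = subject


def join_reducer(y, tagged):
    """Pair every left-side binding with every right-side binding for ?y."""
    lefts = [v for tag, v in tagged if tag == "L"]
    rights = [v for tag, v in tagged if tag == "R"]
    for x in lefts:
        for n in rights:
            yield (x, y, n)


triples = [("alice", "knows", "bob"), ("bob", "name", "Bob"),
           ("alice", "knows", "carol"), ("carol", "name", "Carol"),
           ("dave", "name", "Dave")]
print(run_mapreduce(triples, join_mapper, join_reducer))
# [('alice', 'bob', 'Bob'), ('alice', 'carol', 'Carol')]
```

Tagging values with their source pattern lets one reducer see both sides of the join for a given key. The `dave` triple produces no output because no `knows` triple points to it.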
P. Cudre-Mauroux, I. Enchev, S. Fundatureanu, P. Groth, A. Haque, A. Harth, F. Keppmann, D. Miranker, J. Sequeda, and M. Wylot. "NoSQL Databases for RDF: An Empirical Evaluation." Proceedings of the 12th International Semantic Web Conference (ISWC). LNCS, vol. 8219, pp. 310-325. Springer, 2013. DOI: 10.1007/978-3-642-41338-4_20 Paper » Website »
A. Haque. "A MapReduce Approach to NoSQL RDF Databases." The University of Texas at Austin, Department of Computer Science. Report# HR-13-13 (honors theses). Dec 2013. 81 pages. Thesis (4 MB) » Presentation (21 MB) »
Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data. May 2014. Poster (3 MB) » Paper »
An Empirical Evaluation of Approximation Algorithms for the Metric Traveling Salesman Problem (2013) Paper »
Selection Coefficients versus Omega for Codon Substitution Rates (2013) Presentation »
HaLoop: Efficient Iterative Data Processing on Large Clusters (2013) Presentation (3 MB) »
Distributed RDF Triple Store Using SPARQL, HBase, and Hive (2012) Paper »
An Analysis and Comparison of Processor Scheduling Techniques (2012) Paper »
A Novel Approach to Cellular Tracking and Surveillance (2010) Paper »
This website's source code is available on GitHub.
Last Modified: 2014-07-14 23:38 +0000.