Academics and Coursework

Selected List of Courses Taken

Graduate Coursework @ Stanford

| Course Number | Course Title | Instructor | Quarter Taken |
| --- | --- | --- | --- |
| EE 309A | Emerging Non-Volatile Memory Devices and Circuit Design | H.-S. Philip Wong | Winter 2026 |
| EE 382A | Parallel Processors Beyond Multicore Processing | Andrea Di Blas | Spring 2025 |
| CS 149 | Parallel Computing | Kayvon Fatahalian | Fall 2024 |
| EE 271 | Introduction to VLSI Systems | Thierry Tambe | Fall 2024 |

Undergraduate Coursework @ Michigan

| Course Number | Course Title | Instructor | Semester Taken |
| --- | --- | --- | --- |
| EECS 573 | Microarchitecture | Todd Austin | Fall 2023 |
| EECS 583 | Advanced Compiler Construction | Scott Mahlke | Fall 2023 |
| CEE 552 | Travel Analysis and Forecasting | Atiyya Shaw | Fall 2023 |
| EECS 470 | Computer Architecture | Jonathan Beaumont | Winter 2023 |
| EECS 442 | Computer Vision | Andrew Owens | Fall 2022 |
| CEE 551 | Traffic Science | Henry Liu | Fall 2022 |
| EECS 388 | Introduction to Computer Security | Peter Honeyman | Winter 2022 |
| CEE 375 | Sensors and Circuits | Jeff Scruggs | Winter 2022 |

Undergraduate Course Projects and Talks

“Patchouli”: Performance Analysis and Tiling Choice Optimization Using LLVM IR

Course project, EECS 583 Advanced Compilers, University of Michigan, 2023

This is the term project for the EECS 583 Advanced Compilers course at the University of Michigan, taught by Prof. Scott Mahlke. Together with collaborators Yongyi Yang and Boren Ke, we implemented an LLVM optimization pass to correctly transform canonical, iterative matrix multiplication functions into their tiled versions, along with a regression-based algorithm that searches for optimal tile sizes using LLVM runtime profile information.

Patchouli

Image credit: うさちゃこ, Touhou Project, Patchouli Knowledge, yukkuri / 【フリー素材】ちゃこ式ゆっくりパチュリー【立ち絵】 ("[Free-to-use material] Chako-style Yukkuri Patchouli [standing sprite]"). 2023. Accessed: Dec. 06, 2023. [Online]. Available: https://www.pixiv.net/en/artworks/111069201

Project Report Abstract

This project implements tiled matrix multiplication and automatic, profile-guided tile size selection as LLVM optimization passes. Given the rising cost of matrix multiplication operations in ever more complex software applications, the method of splitting matrices involved in multiplication operations into tiles to exploit spatial locality in caches and improve memory performance has garnered renewed attention. However, despite theoretical advancements in this field, there has not been a truly language-independent, syntactically transparent, SSA-based optimization pass implemented for both performing matrix tiling and selecting ideal tile sizes given individual memory system configurations. This has prevented more widespread integration of advanced matrix tiling techniques into existing compiler toolchains. Our project seeks to address this research gap with an LLVM optimization pass to correctly transform canonical, iterative matrix multiplication functions into their tiled versions, along with a regression-based algorithm that searches for optimal tile sizes using LLVM runtime profile information. In our experiments, we observed a nearly 2x performance speed-up when running our algorithm on the EECS 583 class server, for matrix multiplication operands with dimensions ranging from 1500 to 1700.
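To make the transformation concrete, below is a minimal sketch of loop tiling for matrix multiplication, written in plain Python purely for illustration; the actual pass operates on LLVM IR, and the tile size shown is a hypothetical placeholder rather than a value chosen by the profile-guided search described above.

```python
# Illustrative sketch only: the project performs this transformation on LLVM IR,
# not on Python source, and tile sizes are selected by profile-guided search.

def matmul_canonical(A, B, C, n):
    # Canonical i-j-k triple loop: C += A * B for n x n row-major matrices.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]

def matmul_tiled(A, B, C, n, tile=64):  # tile=64 is a hypothetical placeholder
    # Tiled version: iterate over tile-sized blocks so the working set of the
    # inner triple loop stays resident in cache, exploiting spatial locality.
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            C[i][j] += A[i][k] * B[k][j]
```

Both variants compute identical results; the speed-up comes entirely from the blocked traversal keeping each tile of A, B, and C in cache across the inner loops.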

Please reach out to me if you are interested in reading the full report.

A tour of accelerator architectures for AI/ML applications

Topic lecture, EECS 573 Microarchitecture, University of Michigan, 2023

In collaboration with Mason Nelson, I gave a lecture on the necessity, architectural foundations, academic research, and industry trends of accelerators for AI/ML applications. This lecture was given as part of the EECS 573 Microarchitecture course at the University of Michigan, taught by Prof. Todd Austin.

Learning objectives for this lecture include:

  • Characterize common operations in Machine Learning that can be accelerated (a small illustrative kernel is sketched after this list)
  • Understand why Machine Learning workloads run inefficiently on general-purpose processors and why accelerators are necessary
  • Become acquainted with various architectural paradigms for AI accelerator design, both historical and contemporary
  • Review current academic research and commercial products that implement the basic architecture paradigms
  • Reflect on possible next steps in accelerator design
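As a concrete illustration of the first objective, the sketch below shows the kind of multiply-accumulate (MAC) dominated kernel, here a direct 2D convolution, whose inner loops map naturally onto the MAC arrays and systolic datapaths found in AI accelerators. It is illustrative only and is not taken from the lecture materials.

```python
# Illustrative only: a direct single-channel 2D convolution (cross-correlation,
# as is conventional in ML), showing the MAC-dominated inner loops that
# accelerators are built to execute efficiently.
import numpy as np

def conv2d_direct(x, w):
    """x: (H, W) input feature map, w: (KH, KW) filter; no padding."""
    H, W = x.shape
    KH, KW = w.shape
    out = np.zeros((H - KH + 1, W - KW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for ki in range(KH):        # these innermost loops are pure
                for kj in range(KW):    # multiply-accumulate work
                    acc += x[i + ki, j + kj] * w[ki, kj]
            out[i, j] = acc
    return out
```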

Please reach out to me if you are interested in accessing the recording or the slides for this talk.

An out-of-order, superscalar implementation of the RISC-V ISA in the style of the P6 microarchitecture

Course project, EECS 470 Computer Architecture, University of Michigan, 2023

This is a team project for the EECS 470 Computer Architecture class at the University of Michigan, taught by Dr. Jonathan Beaumont. Over the span of the Winter 2023 semester, my team completed an implementation of the P6 microarchitecture using the RISC-V instruction set. My collaborators included Wenjie Geng, Haowen Tan, Yunjie Zhang, and Yunqi Zhang.

Our design includes several features that contribute to high performance, including out-of-order execution, superscalar instruction fetch and dispatch, a non-blocking instruction cache, instruction prefetching, and a branch target buffer, among others. We ensured the correctness of our processor through rigorous testing and validation, even at the cost of claiming fewer advanced features, and we believe that our implementation provides a powerful and flexible CPU design suitable for a wide range of applications.
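For readers unfamiliar with the P6 style, the toy Python model below sketches the dispatch/complete/retire discipline of a reorder buffer, the structure that lets instructions finish out of order while still committing in program order. It is purely conceptual, all names are hypothetical, and it bears no relation to our actual hardware design.

```python
# Conceptual sketch only; not derived from the team's processor implementation.
from collections import deque

class ROBEntry:
    def __init__(self, tag, dest_reg):
        self.tag = tag            # program-order identity of the instruction
        self.dest_reg = dest_reg  # architectural destination register
        self.done = False         # set once the instruction finishes executing
        self.value = None         # result held until retirement (P6 style)

class ReorderBuffer:
    """Instructions dispatch in order, may complete out of order,
    but retire (update architectural state) strictly in order."""
    def __init__(self, size=8):
        self.size = size
        self.entries = deque()

    def dispatch(self, tag, dest_reg):
        if len(self.entries) == self.size:
            return False          # structural hazard: ROB full, stall dispatch
        self.entries.append(ROBEntry(tag, dest_reg))
        return True

    def complete(self, tag, value):
        for e in self.entries:    # completion may hit any in-flight entry
            if e.tag == tag:
                e.done, e.value = True, value
                return

    def retire(self, arch_regfile):
        retired = []
        # Only the oldest completed instructions may retire, preserving precise state.
        while self.entries and self.entries[0].done:
            e = self.entries.popleft()
            arch_regfile[e.dest_reg] = e.value
            retired.append(e.tag)
        return retired
```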

A high-level overview of our design is shown below.

Microprocessor Overview

Please reach out to me if you are interested in reading the full report or accessing the source code for this project.

Investigation of representation of accidents in car-following models

Course project, CEE 551 Traffic Science, University of Michigan, 2022

This is the term project for the CEE 551 Traffic Science course at the University of Michigan, taught by Prof. Henry Liu. Together with collaborator Zhaoming Zeng, we investigated the representation of accidents by different car-following models and the performance of different car-following models in the evaluation of traffic safety.

Project Report Abstract

It has become more commonplace for researchers and industries to utilize microscopic traffic simulators to evaluate the efficiency and safety of road designs and assisted or autonomous vehicle control systems. As a result, it is also ever more important for the underlying car-following models of such simulators to accurately represent the behavior of both autonomous vehicles and human-driven ones, especially under abnormal, near-accident situations. However, even given the increasing complexity of rules-based, analytical models and the emergence of novel machine learning and data-based car-following algorithms, not all such models may be up to this task of accurately reflecting traffic crash probabilities and replicating near-crash scenarios in the real world.

We will investigate this issue from several different perspectives. Firstly, we shall demonstrate that even car-following models that do not account for driver miscues or high-risk scenarios (“accident-free”) can also produce accidents in simulations, using arithmetic calculations with the Intelligent Driver Model (IDM). We shall then review various ways in the literature to modify traditional, rules-based, accident-free models to account for high-risk or crash situations, as well as how to calibrate such modifications, using the Gipps model as an example. Lastly, we will turn to more novel, data-based models that utilize machine learning and neural networks to predict vehicle trajectories. We shall evaluate their primary characteristics and drawbacks in modeling high-risk or accident driving scenarios using the example of a Long Short-Term Memory (LSTM) model, and provide suggestions on improving the capabilities of data-driven models in the safety evaluation of car-following behavior.
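For reference, the sketch below codes up the standard IDM acceleration equation referred to above; the parameter values are typical textbook defaults, not those calibrated in the report, and the function name is hypothetical.

```python
# Standard Intelligent Driver Model (IDM) acceleration; typical parameter
# values shown as defaults, not the ones used in the project report.
import math

def idm_acceleration(v, gap, dv,
                     v0=33.3,   # desired speed (m/s)
                     T=1.5,     # desired time headway (s)
                     a_max=1.0, # maximum acceleration (m/s^2)
                     b=2.0,     # comfortable deceleration (m/s^2)
                     s0=2.0,    # minimum standstill gap (m)
                     delta=4):  # acceleration exponent
    """v: follower speed, gap: bumper-to-bumper gap to the leader,
    dv: approach rate (follower speed minus leader speed)."""
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```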

Please reach out to me if you are interested in reading the full report.

Implementation and evaluation of a nested, variable-depth UNet++ model architecture for medical imaging segmentation

Course project, EECS 442 Computer Vision, University of Michigan, 2022

This is the term project for the EECS 442 Computer Vision course at the University of Michigan, taught by Prof. Andrew Owens. In collaboration with Feilong Meng and Yongxiang Zhao, we implemented and evaluated the performance of a variable-depth UNet++ architecture in the context of medical imaging segmentation.

Project Report Abstract

UNet++ is essentially an encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways, aiming to reduce the semantic gap between the feature maps of the two sub-networks. We implemented a variation of this architecture in which the code base enables the user to modify the depth of the network. We also trained and tested different instances of this network with varying levels of depth on a dataset of retinal imaging scans, comparing their time and space complexities and training and validation accuracies, not only with one another but also with a fixed-depth baseline model. We observed that as the depth of the UNet++ network increases, the time complexity of training increases linearly and the space complexity in terms of model parameters increases exponentially; however, only marginal gains in training and validation accuracies are obtained once the model is more than four layers deep.
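To make the structure concrete, here is a minimal PyTorch sketch of nested, dense skip pathways with a configurable depth; module and parameter names are hypothetical, and this is not the project's code base.

```python
# Minimal, hypothetical sketch of a variable-depth UNet++-style network;
# not the project's implementation.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the building block at every node.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class NestedUNetSketch(nn.Module):
    """Nodes X[i][j]: i = resolution level, j = position along the skip pathway.
    X[i][j] concatenates all earlier nodes at level i with an upsampled X[i+1][j-1]."""
    def __init__(self, depth=4, base_ch=32, in_ch=1, num_classes=1):
        super().__init__()
        self.depth = depth
        ch = [base_ch * 2 ** i for i in range(depth + 1)]
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.blocks = nn.ModuleDict()
        for i in range(depth + 1):
            for j in range(depth + 1 - i):
                if j == 0:
                    inc = in_ch if i == 0 else ch[i - 1]   # encoder backbone
                else:
                    inc = ch[i] * j + ch[i + 1]            # j dense skips + one upsampled input
                self.blocks[f"{i}_{j}"] = conv_block(inc, ch[i])
        self.head = nn.Conv2d(ch[0], num_classes, kernel_size=1)

    def forward(self, x):
        # Input height and width should be divisible by 2 ** depth.
        feats = {}
        for i in range(self.depth + 1):                    # encoder column j = 0
            x_in = x if i == 0 else self.pool(feats[(i - 1, 0)])
            feats[(i, 0)] = self.blocks[f"{i}_0"](x_in)
        for j in range(1, self.depth + 1):                 # nested decoder columns
            for i in range(self.depth + 1 - j):
                skips = [feats[(i, k)] for k in range(j)]
                up = self.up(feats[(i + 1, j - 1)])
                feats[(i, j)] = self.blocks[f"{i}_{j}"](torch.cat(skips + [up], dim=1))
        return self.head(feats[(0, self.depth)])
```

In this sketch, increasing the depth argument adds both a deeper encoder level and longer dense skip pathways, while the segmentation head always reads from the shallowest, most refined node X[0][depth]; this is consistent with the parameter growth and diminishing accuracy gains reported above.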

Please reach out to me if you are interested in reading the full report or accessing the source code for this project.