Joy Hsu

Hi, I'm Joy, a Ph.D. candidate in computer science and Knight-Hennessy Scholar at Stanford University, studying artificial intelligence and computer vision. I am advised by Prof. Jiajun Wu in the CogAI group and the Stanford Vision and Learning Lab. My research is graciously funded by Knight-Hennessy and the NSF Graduate Research Fellowship.

I finished my B.S. with honors and M.S. with distinction in research at Stanford in 2021, where I was fortunate to be awarded the Ben Wegbreit Prize for best thesis in computer science and the university's Firestone Medal for excellence in research. I was advised by the wonderful Prof. Serena Yeung and Prof. Wah Chiu, and conducted research jointly at Stanford AI Lab and SLAC National Accelerator Laboratory.

Interests

My research interests are in visual reasoning and neuro-symbolic learning for computer vision. My goal is to build models that can perceive, interpret, reason over, and interact with the physical world around us, across sensing modalities and language instructions. I’m particularly interested in building generalist visual reasoning models that leverage abstractions to interpret the world as humans do and use hybrid representations to solve complex, multi-step tasks across diverse, data-scarce settings.

You can reach me at joycj[at]stanford.edu!

Research

2026

Learning Situated Awareness in the Real World

International Conference on Machine Learning (ICML) 2026 [Spotlight]

Chuhan Li, Ruilin Han, Joy Hsu, Yongyuan Liang, Rajiv Dhawan, Jiajun Wu, Ming-Hsuan Yang, and Xin Eric Wang
[paper] [project page]

A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding

Conference on Medical Imaging with Deep Learning (MIDL) 2026

Christina Liu*, Alan Wang*, Joy Hsu, Jiajun Wu, and Ehsan Adeli
[paper] [project page]

Neuro-Symbolic Decoding of Neural Activity

International Conference on Learning Representations (ICLR) 2026

Yanchen Wang*, Joy Hsu*, Ehsan Adeli†, and Jiajun Wu†
[paper] [project page]

Discovering Hybrid World Representations with Co-Evolving Foundation Models

AAAI Conference on Artificial Intelligence (AAAI) 2026

Jiajun Wu, Yunzhi Zhang, Hong-Xing Yu, Joy Hsu, and Jiayuan Mao
[paper]

2025

From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries

Conference on Neural Information Processing Systems (NeurIPS) 2025

Joy Hsu, Emily Jin, Jiajun Wu, and Niloy J. Mitra
[paper] [project page]

What Makes a Maze Look Like a Maze?

International Conference on Learning Representations (ICLR) 2025

Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, and Jiajun Wu
[paper] [project page]

Predicate Hierarchies Improve Few-Shot State Classification

International Conference on Learning Representations (ICLR) 2025

Emily Jin*, Joy Hsu*, and Jiajun Wu
[paper] [project page]

Visually Descriptive Language Model for Vector Graphics Reasoning

Transactions on Machine Learning Research (TMLR)
CVPR Workshop On Multimodal Algorithmic Reasoning 2025 [Spotlight Paper]

Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, and Heng Ji
[paper] [project page]

2024

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Conference on Computer Vision and Pattern Recognition (CVPR) 2024

Chun Feng*, Joy Hsu*, Weiyu Liu, and Jiajun Wu
[paper] [project page]

Spatially Compositional Diffusion

CVPR Generative Models Workshop 2024

Ryan Lian*, Xingjian Bai*, Joy Hsu, Weiyu Liu, Jiayuan Mao, and Jiajun Wu
[paper]

Learning Planning Abstractions from Language

International Conference on Learning Representations (ICLR) 2024

Weiyu Liu*, Geng Chen*, Joy Hsu, Jiayuan Mao†, and Jiajun Wu†
[paper] [project page]

2023

What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

Conference on Neural Information Processing Systems (NeurIPS) 2023

Joy Hsu*, Jiayuan Mao*, Joshua B. Tenenbaum, and Jiajun Wu
[paper] [project page]

Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?

NeurIPS ICBINB Workshop (PMLR) 2023 [Best Poster Award]

Joy Hsu, Gabriel Poesia, Jiajun Wu, and Noah D. Goodman
[paper]

Composable Part-Based Manipulation

Conference on Robot Learning (CoRL) 2023

Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, and Jiajun Wu
[paper] [project page]

Motion Question Answering via Modular Motion Programs

International Conference on Machine Learning (ICML) 2023

Mark Endo*, Joy Hsu*, Jiaman Li, and Jiajun Wu
[paper] [project page]

NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

Conference on Computer Vision and Pattern Recognition (CVPR) 2023
CVPR Workshop On Compositional 3D Vision 2023 [Oral Presentation]

Joy Hsu, Jiayuan Mao, and Jiajun Wu
[paper] [project page]

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

International Conference on Learning Representations (ICLR) 2023 [Notable Top 25%]

Renhao Wang*, Jiayuan Mao*, Joy Hsu, Hang Zhao, Jiajun Wu, and Yang Gao
[paper] [project page]

2022

DisCo: Improving Compositional Generalization in Visual Reasoning through Distribution Coverage

Transactions on Machine Learning Research (TMLR)

Joy Hsu, Jiayuan Mao, and Jiajun Wu
[paper] [project page]

Geoclidean: Few-Shot Generalization in Euclidean Geometry

Conference on Neural Information Processing Systems Datasets and Benchmarks (NeurIPS) 2022

Joy Hsu, Jiajun Wu, and Noah D. Goodman
[paper] [project page]

2021

Unsupervised Learning for Discovery in 2D & 3D Scenes: Towards Unbiased Understanding of Biomedical Images

Stanford CS Honors Thesis [Ben Wegbreit Prize for Best Thesis]

Joy Hsu
[paper]

Capturing Implicit Hierarchical Structure in 3D Biomedical Images with Self-Supervised Hyperbolic Representations

Conference on Neural Information Processing Systems (NeurIPS) 2021

Joy Hsu*, Jeff Gu*, Gong-Her Wu, Wah Chiu, and Serena Yeung
[paper] [project page]

DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images

Conference on Computer Vision and Pattern Recognition (CVPR) 2021

Joy Hsu, Wah Chiu, and Serena Yeung
[paper] [project page]

2020

Learning Hyperbolic Representations for Unsupervised 3D Segmentation

NeurIPS Differential Geometry Workshop 2020 [Contributed Talk]

Joy Hsu*, Jeff Gu*, and Serena Yeung
[paper]

Improving Medical Annotation Quality to Decrease Labeling Burden Using Stratified Noisy Cross-Validation

Investigative Ophthalmology & Visual Science, 61(7), 4537-4537.
ACM CHIL Workshop 2020 [Spotlight Talk]

Joy Hsu*, Sonia Phene*, Akinori Mitani, Jieying Luo, Naama Hammel, Jonathan Krause, and Rory Sayres
[paper]

Teaching

CS 271: Artificial Intelligence in Healthcare [2019, 2020]

CS 41: The Python Programming Language [2018, 2019]
