Documentation

Dynamic Ledger: Retrieval-Augmented Structured Memory for Test-Time Learning

Our work extends the Dynamic Cheatsheet (DC) framework of Suzgun et al. with two new memory architectures, Strategic Chunk Retrieval (SCR) and Dynamic Ledger (DL), which replace monolithic memory with structured, chunk-level stores and selective curation. In evaluations with GPT-4o and GPT-5 across 6 benchmarks, Dynamic Ledger achieves accuracy gains of up to +10 percentage points (pp) on math reasoning tasks. See the final report for full details.
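Gains are stated in percentage points (pp): the absolute difference between two accuracies, not a percentage of the baseline. The quick sketch below uses hypothetical accuracies (not results from the report) to show how far apart the two readings can be.

```python
def pp_gain(baseline_acc: float, new_acc: float) -> float:
    """Absolute gain in percentage points; accuracies are fractions in [0, 1]."""
    return (new_acc - baseline_acc) * 100


def relative_gain(baseline_acc: float, new_acc: float) -> float:
    """Relative gain, as a percentage of the baseline accuracy."""
    return (new_acc - baseline_acc) / baseline_acc * 100


# Hypothetical accuracies for illustration only, not reported results.
baseline, improved = 0.50, 0.60
print(f"{pp_gain(baseline, improved):.1f} pp")       # 10.0 pp
print(f"{relative_gain(baseline, improved):.1f} %")  # 20.0 %
```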

Team

Jerry Gu*

Stanford University

Shurui Liu*

Stanford University

Sabrina Yen-Ko*

Stanford University

Mirac Suzgun

Stanford University

*Equal contribution. Mentored by Mirac Suzgun.

Installation

```bash
git clone https://github.com/srliu3264/dynamic_ledger.git
cd dynamic_ledger
pip install -r requirements.txt
cp config.env.example config.env  # add your API keys
```
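The format of `config.env` is not documented on this page. Assuming it follows the common dotenv convention of one `KEY=VALUE` pair per line (take the real key names from `config.env.example`; `OPENAI_API_KEY` below is only a hypothetical example), a minimal loader might look like this sketch. It does not handle inline comments or multi-line values.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse dotenv-style KEY=VALUE lines; blank lines and # comments are ignored."""
    parsed: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        key, value = key.strip(), value.strip()
        # Drop one layer of matching surrounding quotes, e.g. "sk-abc" -> sk-abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
    return parsed


def load_env_file(path: str = "config.env", override: bool = False) -> dict[str, str]:
    """Read the file and export its keys into os.environ."""
    parsed = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in parsed.items():
        # By default, variables already exported in the shell take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Call `load_env_file()` from the repository root before any API client is created. If python-dotenv is available, `load_dotenv("config.env")` does the same job more robustly.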

Supported Approaches

| Approach | Description |
|---|---|
| `default` | No cheatsheet; single-pass generation |
| `DynamicCheatsheet_Cumulative` | Append-only flat-text cheatsheet (original DC) |
| `DynamicCheatsheet_RetrievalSynthesis` | Retrieves past examples and synthesizes a query-specific cheatsheet |
| `Dynamic_Retrieval` | Retrieves the top-k chunks; no curation step |
| `FullHistoryAppending` | Appends the full conversation history as context |
| `DynamicCheatsheet_StrategicChunkRetrieval` | [NEW] Strategic Chunk Retrieval (SCR): retrieves the top-k strategy chunks; the curator refines only those chunks |
| `DynamicCheatsheet_DynamicLedger` | [NEW] Dynamic Ledger (DL): structured JSON store with per-entry CRUD (create, read, update, delete) updates |
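The SCR row can be read as a two-stage loop: retrieve the top-k strategy chunks for a query, then let the curator revise only those chunks while the rest of memory stays untouched. The sketch below illustrates that control flow under stated assumptions: a bag-of-words cosine similarity stands in for whatever embedding model the repository actually uses, and `curator` is a hypothetical callable standing in for the LLM curation call. None of these function names come from the codebase.

```python
import math
import re
from collections import Counter
from typing import Callable


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a stand-in for a learned embedding (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int = 3) -> list[int]:
    """Indices of the k chunks most similar to the query (ties keep original order)."""
    q = bag_of_words(query)
    scores = [cosine(q, bag_of_words(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]


def scr_step(
    query: str,
    chunks: list[str],
    curator: Callable[[str, str], str],
    k: int = 3,
) -> tuple[list[str], list[str]]:
    """One SCR-style step: retrieve top-k chunks, then curate only those chunks."""
    top = retrieve_top_k(query, chunks, k)
    context = [chunks[i] for i in top]  # what the solver would be prompted with
    updated = list(chunks)
    for i in top:
        updated[i] = curator(query, chunks[i])  # hypothetical LLM curation call
    return context, updated  # chunks outside the top-k are returned unchanged


if __name__ == "__main__":
    memory = [
        "Inequalities: try AM-GM when terms are positive.",
        "Game of 24: search over operator orderings.",
        "Inequalities: check equality cases with symmetric substitutions.",
    ]
    context, memory = scr_step(
        "Which inequalities tactic fits positive reals?",
        memory,
        curator=lambda q, chunk: chunk + " [refined]",
        k=2,
    )
    print(context)
    print(memory)
```

Restricting the curator to the retrieved chunks keeps each update local. Dropping the curation loop leaves only the retrieval half, which is what the `Dynamic_Retrieval` row describes.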

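For `DynamicCheatsheet_DynamicLedger`, the table describes a structured JSON store updated through per-entry CRUD operations. A minimal sketch of such a store follows. The entry schema (`id`, `content`, `tags`), the `e0`-style ids, and the JSON operation format accepted by `apply` are all hypothetical and are not taken from the repository.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Entry:
    id: str
    content: str
    tags: list[str] = field(default_factory=list)


class Ledger:
    """Hypothetical DL-style store: entries keyed by id, edited one entry at a time."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}
        self._next_id = 0

    def create(self, content: str, tags: Optional[list[str]] = None) -> str:
        entry_id = f"e{self._next_id}"
        self._next_id += 1
        self.entries[entry_id] = Entry(entry_id, content, list(tags or []))
        return entry_id

    def read(self, entry_id: str) -> Optional[Entry]:
        return self.entries.get(entry_id)

    def update(self, entry_id: str, content: str) -> bool:
        entry = self.entries.get(entry_id)
        if entry is None:
            return False  # unknown id: nothing to update
        entry.content = content
        return True

    def delete(self, entry_id: str) -> bool:
        return self.entries.pop(entry_id, None) is not None

    def apply(self, ops_json: str) -> None:
        """Apply a JSON list of curator-proposed ops (hypothetical format)."""
        for op in json.loads(ops_json):
            kind = op.get("op")
            if kind == "create":
                self.create(op["content"], op.get("tags"))
            elif kind == "update":
                self.update(op["id"], op["content"])
            elif kind == "delete":
                self.delete(op["id"])
            # Unrecognized op types are skipped.

    def to_json(self) -> str:
        """Serialize the ledger as a JSON array of entries."""
        return json.dumps([asdict(e) for e in self.entries.values()], indent=2)


if __name__ == "__main__":
    ledger = Ledger()
    ledger.create("Use AM-GM for positive terms.", ["inequality"])
    ledger.create("Check equality cases.")
    ledger.apply(json.dumps([
        {"op": "update", "id": "e0", "content": "Use AM-GM; confirm positivity first."},
        {"op": "delete", "id": "e1"},
        {"op": "create", "content": "Try Cauchy-Schwarz on sums of products."},
    ]))
    print(ledger.to_json())
```

Because each operation names a single entry, a bad curator step can only alter the entries it touches rather than the whole memory.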
Supported Models

openai/gpt-4o, openai/gpt-4o-mini, openai/gpt-3.5-turbo
openai/gpt-5-2025-08-07
openai/o1, openai/o3-mini
anthropic/claude-3-5-sonnet-latest, anthropic/claude-3-7-sonnet-latest
anthropic/claude-3-5-haiku-latest
xai/grok-3, xai/grok-3-mini, xai/grok-4-fast-non-reasoning
together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
together_ai/deepseek-ai/DeepSeek-R1, together_ai/Qwen/QwQ-32B
gemini/gemini-2.0-flash
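Each identifier above has the form `provider/model`, and the model part may itself contain a slash (as in the `together_ai/...` entries), so only the first slash separates the two. These look like LiteLLM-style routing strings, although whether the repository routes calls through LiteLLM is an assumption. A small parser that respects the first-slash rule:

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelId:
    """Split 'provider/model' on the first slash only; the model part may contain '/'."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelId(provider, model)


if __name__ == "__main__":
    for mid in ["openai/gpt-4o", "together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo"]:
        print(parse_model_id(mid))
```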

Supported Benchmarks

| Task | Description | Size (examples) |
|---|---|---|
| `IneqMath_all` | Competition-style inequality problems (train + dev merged) | 1,352 |
| `IneqMath` | IneqMath dev split only | 100 |
| `DataSIR` | Full sensitive information recognition dataset | 1,647,501 |
| `DataSIR400` | DataSIR 400-problem subset used in our evaluation | 400 |
| `AIME_2025` | AIME 2025 problems | varies |
| `AIME_2024` | AIME 2024 problems | varies |
| `AIME_2020_2024` | AIME 2020–2024 problems | varies |
| `GPQA_Diamond` | Graduate-level science QA | varies |
| `MMLU_Pro_Physics` | MMLU-Pro Physics subset | 1,299 |
| `MMLU_Pro_Engineering` | MMLU-Pro Engineering subset | 969 |
| `MathEquationBalancer` | Equation balancing task | varies |
| `GameOf24` | Game of 24 | varies |
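For `GameOf24`, an answer is an arithmetic expression that must use each of the four given numbers exactly once and evaluate to 24. The repository's scoring code is not shown here, so the checker below is only a sketch. It assumes the model's answer has already been reduced to a bare expression using `+`, `-`, `*`, `/` and parentheses, parses it with Python's `ast` module instead of calling `eval`, and evaluates it with exact `Fraction` arithmetic.

```python
import ast
from fractions import Fraction

# Only the four basic binary operators are allowed.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,  # Fraction / Fraction stays exact; /0 raises ZeroDivisionError
}


def _eval(node: ast.AST, used: list[int]) -> Fraction:
    """Evaluate the expression tree exactly, recording every integer literal used."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, used), _eval(node.right, used))
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    raise ValueError(f"disallowed syntax: {ast.dump(node)}")


def check_24(numbers: list[int], expression: str) -> bool:
    """True if the expression uses exactly the given numbers (as a multiset) and equals 24."""
    try:
        used: list[int] = []
        value = _eval(ast.parse(expression, mode="eval").body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(numbers) and value == 24
```

Exact fractions matter for solutions such as `5 * (5 - 1 / 5)` for the numbers 1, 5, 5, 5, whose intermediate value 24/5 has no exact binary floating-point representation.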
