Dynamic Ledger

Dynamic Ledger — Structured Database with CRUD Operations and Dual-Embedding Retrieval

The Dynamic Ledger (DynamicCheatsheet_DynamicLedger) reframes the cheatsheet as a lightweight database that supports explicit CRUD operations and dual-embedding retrieval over strategy-text and source-problem embeddings.

How It Works

  1. Structured Memory Entries: Each entry is a JSON record carrying a unique identifier, the strategy text, an example of the originating problem, a strategy embedding, and a source-problem embedding. Embedding the strategy and the problem separately is what enables dual-axis retrieval.

  2. Dual-Embedding Retrieval: Given a new input, retrieval runs along both axes independently. Problem similarity surfaces strategies that originated from structurally similar questions; strategy similarity captures techniques whose description matches the current need. Each axis returns its top k entries, and the deduplicated union of the two sets is returned, yielding at most 2k items.

  3. CRUD-Based Curation: Rather than regenerating the retrieved items as a revised text block, the curator emits a structured JSON array of atomic operations:

    • Create: Append a new entry; the system embeds both the strategy text and the current problem.
    • Update: Replace the strategy text of an existing entry by ID; the strategy embedding is recomputed while the problem embedding is preserved.
    • Delete: Remove an entry by ID, used only when a strategy is demonstrably incorrect or fully subsumed.
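
The entry structure and the three operations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the field names, the operation schema (op, id, strategy), the Ledger class, and the toy_embed function are all hypothetical, and toy_embed is a letter-count stand-in for a real embedding model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: 26-dim letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


@dataclass
class LedgerEntry:
    # Field names are assumptions; the source only lists what each entry carries.
    entry_id: str
    strategy: str
    problem: str
    strategy_emb: Vector
    problem_emb: Vector


class Ledger:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed) -> None:
        self.entries: Dict[str, LedgerEntry] = {}
        self._embed = embed
        self._next_id = 1  # monotonic, so IDs are never reused after a delete

    def apply(self, ops_json: str, current_problem: str) -> None:
        """Apply a JSON array of create/update/delete operations in order."""
        for op in json.loads(ops_json):
            kind = op["op"]
            if kind == "create":
                # Create: embed both the strategy text and the current problem.
                entry_id = f"e{self._next_id}"
                self._next_id += 1
                self.entries[entry_id] = LedgerEntry(
                    entry_id=entry_id,
                    strategy=op["strategy"],
                    problem=current_problem,
                    strategy_emb=self._embed(op["strategy"]),
                    problem_emb=self._embed(current_problem),
                )
            elif kind == "update":
                # Update: recompute the strategy embedding only; the problem
                # text and problem embedding are left untouched.
                entry = self.entries[op["id"]]
                entry.strategy = op["strategy"]
                entry.strategy_emb = self._embed(op["strategy"])
            elif kind == "delete":
                del self.entries[op["id"]]
            else:
                raise ValueError(f"unknown operation: {kind!r}")


ledger = Ledger()
ledger.apply(
    '[{"op": "create", "strategy": "Apply AM-GM to symmetric sums"}]',
    current_problem="Show that a + b >= 2*sqrt(a*b) for a, b >= 0",
)
problem_emb_before = list(ledger.entries["e1"].problem_emb)

# A curator response in the assumed schema: one update, one create, one delete.
ops = json.dumps([
    {"op": "update", "id": "e1", "strategy": "Apply AM-GM, then check the equality case"},
    {"op": "create", "strategy": "Guess the answer from small cases"},
    {"op": "delete", "id": "e2"},
])
ledger.apply(ops, current_problem="Minimise x + 1/x for x > 0")
print(sorted(ledger.entries))
```

Because every operation names a single entry by ID, applying the array touches only the entries it mentions, and an update leaves the stored problem embedding exactly as it was when the entry was created.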

Advantages Over DC-Cumulative and SCR

  • Dual-axis retrieval: Recovers relevant entries that strategy-only retrieval misses — decisive on tasks like DataSIR where strategy text alone is a poor retrieval signal.
  • Atomic operations: Each CRUD operation targets exactly one entry, preventing unintended cross-entry interference within the retrieved set.
  • Locality guarantee: Non-retrieved entries are never exposed to the curator and cannot be accidentally altered.
  • Auditable updates: The explicit operation language gives fine-grained, auditable control over each record.
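
To make the dual-axis point concrete, the following sketch retrieves the top k entries by problem similarity and by strategy similarity, then returns their deduplicated union. It is an illustration only: the function names, the use of cosine similarity, the choice to compare a single query embedding against both axes, and the hand-made 2-D vectors are assumptions, not details taken from the project.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Vector, vectors: Dict[str, Vector], k: int) -> List[str]:
    """IDs of the k vectors most similar to the query, best first."""
    ranked = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def dual_retrieve(
    query: Vector,
    strategy_embs: Dict[str, Vector],
    problem_embs: Dict[str, Vector],
    k: int,
) -> List[str]:
    """Union of the per-axis top-k lists, deduplicated, at most 2k IDs."""
    by_problem = top_k(query, problem_embs, k)
    by_strategy = top_k(query, strategy_embs, k)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(by_problem + by_strategy))


# Hand-made 2-D embeddings for three entries (illustrative values only).
strategy_embs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
problem_embs = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [0.8, 0.6]}
query = [1.0, 0.0]

strategy_only = top_k(query, strategy_embs, 2)
both_axes = dual_retrieve(query, strategy_embs, problem_embs, 2)
print("strategy-only:", strategy_only)
print("dual-axis:    ", both_axes)
```

In this toy data, entry c has a strategy embedding orthogonal to the query, so strategy-only retrieval never returns it, while its source problem is close enough to the query for the problem axis to recover it. Entry b appears in both per-axis lists but only once in the result, which is why the union holds at most 2k items rather than exactly 2k.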

Key Results

  • +6.0 pp on AIME 2020–2024 over DC-RS (30.8% vs 24.8%)
  • +10.0 pp on IneqMath over Default (58.0% vs 48.0%)
  • +52.8 pp on MathEquationBalancer over Baseline (100% vs 47.2%)
  • +4.0 pp on DataSIR with GPT-5 over Default (91.0% vs 87.0%)

Example Command

python3 run_benchmark.py \
  --task IneqMath_all \
  --approach_name DynamicCheatsheet_DynamicLedger \
  --model_name openai/gpt-4o \
  --generator_prompt_path prompts/generator_prompt_dynamic_ledger.txt \
  --cheatsheet_prompt_path prompts/curator_prompt_dynamic_ledger.txt \
  --retrieve_top_k 3 \
  --max_n_samples 600