Docs — LLM (Tier-2) · X-40 Quantum LLM Core
Status: Baseline path uses SDPA/FlashAttention (v0.6). The X-40 fused Q-K-A-V attention with entropy window (v0.7) is under optimization and will be exposed behind a runtime flag once it beats SDPA consistently at ≥2k context on open 2–7B models.
Integration (preview)
- Patch: `apply_auto_patch(model)` for LLaMA / Qwen / Mistral / Gemma
- Runtime flag: `X40_FUSION=1` (planned); falls back to SDPA if parity is not met
- Metrics to report: X40/Dense ratio, `energy_Wh` per run, Φ-stability
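A minimal sketch of how the planned flag-gated dispatch could look. Only `X40_FUSION` and `apply_auto_patch` come from this document; the function name `select_attention_backend` and the `parity_met` parameter are illustrative assumptions, not a released API.

```python
import os

def select_attention_backend(parity_met: bool) -> str:
    """Illustrative dispatch: use the fused X-40 path only when the
    X40_FUSION flag is set AND the kernel has reached SDPA parity;
    otherwise fall back to the v0.6 SDPA baseline.

    `parity_met` stands in for whatever parity check v0.7 ships with.
    """
    if os.environ.get("X40_FUSION") == "1" and parity_met:
        return "x40"
    return "sdpa"

# Example: flag set, but parity not yet met → SDPA baseline is used.
os.environ["X40_FUSION"] = "1"
backend = select_attention_backend(parity_met=False)
print(backend)  # → sdpa
```

The key design point is that the fallback is unconditional on parity, so enabling the flag early can never silently degrade output quality.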
Public API docs will be added when v0.7 clears parity+ benchmarks.