Docs — LLM (Tier-2) · X-40 Quantum LLM Core
Status: Baseline path uses SDPA/FlashAttention (v0.6). The X-40 fused Q-K-A-V attention with entropy window (v0.7) is under optimization and will be exposed behind a runtime flag once it beats SDPA consistently at ≥2k context on open 2–7B models.
Integration (preview)
- Patch: `apply_auto_patch(model)` for LLaMA / Qwen / Mistral / Gemma
- Runtime flag: `X40_FUSION=1` (planned); falls back to SDPA if parity is not met
- Metrics to report: X40/Dense ratio, `energy_Wh` per run, Φ-stability
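A minimal sketch of how the planned flag-gated dispatch could look. Only `X40_FUSION` and `apply_auto_patch` come from this document; the function name `select_attention_backend` and the `parity_met` parameter are illustrative assumptions, not a released API.

```python
import os

def select_attention_backend(parity_met: bool) -> str:
    """Illustrative dispatch: use the fused X-40 path only when the
    X40_FUSION flag is set AND the kernel has reached SDPA parity;
    otherwise fall back to the v0.6 SDPA baseline.

    `parity_met` stands in for whatever parity check v0.7 ships with.
    """
    if os.environ.get("X40_FUSION") == "1" and parity_met:
        return "x40"
    return "sdpa"

# Example: flag set, but parity not yet met → SDPA baseline is used.
os.environ["X40_FUSION"] = "1"
backend = select_attention_backend(parity_met=False)
print(backend)  # → sdpa
```

The key design point is that the fallback is unconditional on parity, so enabling the flag early can never silently degrade output quality.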
Public API docs will be added when v0.7 clears parity+ benchmarks.