Benchmarks

Measured scale, real embeddings, methodology scope, and limitations.

Benchmark claims are separated by evidence class: public receipt preview, gated proof pack, scale-envelope capacity run, real-embedding calibration, Atlas classification, and roadmap certification.

Scale	Evidence / tier	Result
100M	Public receipt (H100)	100% · 20.4 ms p50
500M	H100 fp32	100% R@10 · scale envelope
1B	H100 fp16	38.51 ms p50 · 96–100% R@10 · scale envelope
2B	B200 fp16	60.89 ms p50 · 98% R@10 · scale envelope

What this proves

Benchmark rows are not interchangeable.

The strongest credibility comes from keeping each number attached to its evidence boundary.

Evidence	Setup	Artifact status	Claim boundary
100M H100 receipt	H100 80GB, D=384, rank=32	Receipt preview public; full bundle gated	Exact-tier proof path, not universal corpus claim
H100/B200 scale envelope	Generated low-rank data, latent tier	Public table; methodology packet gated	Capacity measurement, not blanket real-embedding claim
Real embeddings	13 production models, public datasets	Public summary; row artifacts gated	Two-tier fp32/SQ8 behavior by model/corpus
Atlas	71 classified rows (text, vision, physics, scientific)	Public summaries; per-row artifacts gated	fp32 calibration; 5 physics rows compressed-certified
Soak / durability	24h soak and snapshot tests	Summary public; detailed logs gated	Operational signal, not production SLA


Scale envelope

Single-GPU ceilings by precision tier.

These rows are measured scale-envelope runs on generated low-rank data. They are valid capacity measurements, not a claim that every real embedding corpus will reproduce the same recall profile.

GPU / tier	max entries	query p50	R@10	query VRAM	compression
H100 fp64	200M	40.40 ms	100%	50.4 GB	11.6× vs fp64
H100 fp32	500M	76.53 ms	100%	73.0 GB	23.3× vs fp64
H100 fp16	1B	38.51 ms	96–100%	72.9 GB	46.5× vs fp64
B200 fp16	2B	60.89 ms	98%	142.0 GB	46.5× vs fp64
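The compression column is a ratio against raw fp64 storage, so it can be inverted into an implied per-vector footprint. The sketch below does that arithmetic under one stated assumption: the scale-envelope rows use D=384, the dimension listed for the 100M receipt run (the scale-envelope table does not state its dimension). The helper name is illustrative only.

```python
# Assumption: D = 384, taken from the 100M receipt setup; the scale-envelope
# table itself does not state the dimension of its generated data.
D = 384
RAW_FP64_BYTES = D * 8  # 3072 bytes per uncompressed fp64 vector

# Published compression ratios (vs raw fp64) from the scale-envelope table.
RATIOS = {"fp64": 11.6, "fp32": 23.3, "fp16": 46.5}
BYTES_PER_SCALAR = {"fp64": 8, "fp32": 4, "fp16": 2}


def implied_bytes_per_vector(ratio: float) -> float:
    """Compressed bytes per vector implied by a ratio against raw fp64."""
    return RAW_FP64_BYTES / ratio


for tier, ratio in RATIOS.items():
    nbytes = implied_bytes_per_vector(ratio)
    scalars = nbytes / BYTES_PER_SCALAR[tier]
    print(f"{tier}: ~{nbytes:.0f} B/vector, ~{scalars:.1f} scalars")
# fp64: ~265 B/vector, ~33.1 scalars
# fp32: ~132 B/vector, ~33.0 scalars
# fp16: ~66 B/vector, ~33.0 scalars
```

Under this assumption every tier stores roughly the same number of scalars per vector (close to the rank of 32 listed for the receipt run), so halving the scalar width is what doubles the ratio from fp64 to fp32 and again to fp16.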

Real embeddings

Two-tier recall on measured model/corpus pairs.

The hardened rows show the publicly reported behavior of the production query architecture: a Tier-1 latent scan followed by a Tier-2 SQ8 rerank. The Atlas also tracks rows where reranking recovers recall, leaves it flat, or degrades it (rerank-harmful).

Model	rank	ρ (rank/dim)	rerank R@10	rerank p50	classification
Gemini-001 3072d	667	0.22	0.998	1.42 ms	A_ELITE / hardened
Cohere v3 1024d	418	0.41	0.994	2.10 ms	A_ELITE / hardened
OpenCLIP 1024d	401	0.39	0.995	1.31 ms	A_ELITE / hardened
E5-Mistral 4096d	1,146	0.28	0.934	1.63 ms	D_SENSITIVE / hardened
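The two-tier query path (Tier-1 latent scan, Tier-2 SQ8 rerank) can be illustrated with a small NumPy sketch. This is not the HX-SDP implementation: it assumes inner-product similarity, a truncated SVD as the latent projection, per-dimension min/max 8-bit scalar quantization for the rerank copy, and a fixed candidate pool. `build_index`, `search`, `exact_topk`, `recall_at_k`, and `n_candidates` are hypothetical names. R@10 is computed the standard way: the fraction of the exact top-10 neighbors that the search returns, averaged over queries.

```python
import numpy as np


def build_index(X: np.ndarray, rank: int) -> dict:
    """Tier 1: rank-r SVD latent codes. Tier 2: SQ8 copy of the full vectors."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data span the latent subspace.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    proj = vt[:rank].T                      # (D, r) projection
    latent = (X - mean) @ proj              # (N, r) latent codes
    # SQ8 (assumed scheme): per-dimension min/max scaling into uint8.
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    codes = np.round((X - lo) / scale).astype(np.uint8)
    return {"proj": proj, "latent": latent, "lo": lo, "scale": scale, "codes": codes}


def search(index: dict, q: np.ndarray, k: int = 10, n_candidates: int = 100) -> np.ndarray:
    # Tier 1: inner products in latent space. The dropped q·mean term is the
    # same for every row, so it does not change the ranking.
    scores = index["latent"] @ (q @ index["proj"])
    cand = np.argpartition(-scores, n_candidates)[:n_candidates]
    # Tier 2: dequantize SQ8 codes for the candidates only, then rerank.
    approx = index["codes"][cand] * index["scale"] + index["lo"]
    order = np.argsort(-(approx @ q))[:k]
    return cand[order]


def exact_topk(X: np.ndarray, q: np.ndarray, k: int = 10) -> np.ndarray:
    """Brute-force ground truth by full-precision inner product."""
    return np.argsort(-(X @ q))[:k]


def recall_at_k(found: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(found.tolist()) & set(truth.tolist())) / len(truth)


# Illustrative generated low-rank data; not a reproduction of any benchmark row.
rng = np.random.default_rng(0)
N, D, true_rank = 2000, 128, 16
X = rng.standard_normal((N, true_rank)) @ rng.standard_normal((true_rank, D))
X += 0.01 * rng.standard_normal((N, D))
queries = X[rng.choice(N, 50, replace=False)] + 0.01 * rng.standard_normal((50, D))

index = build_index(X, rank=16)
r10 = float(np.mean([recall_at_k(search(index, q), exact_topk(X, q)) for q in queries]))
print(f"mean R@10: {r10:.3f}")
```

The split reflects the design idea in the text: the cheap latent scan only needs to get the true neighbors into the candidate pool, and the SQ8 rerank orders them. In this toy, setting `rank` well below the data's true rank, or shrinking `n_candidates` toward `k`, is the easiest way to see recall fall.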

Claim boundaries

Every number carries its scope.

HX-SDP benchmark claims are separated by evidence class: production hot path, signed receipts, scale envelope, real-embedding calibration, and roadmap certification. Exact-recall statements stay tied to the tiers and artifacts that support them.

SVD + SQ8: production hot path; not described as generic QTT-native serving.
ML-DSA-65: signed artifacts; FIPS 204 Category 3 receipt chain.
Atlas coverage: fp32 calibration, not TQ4 certification.
Known limitations: concurrency, WAL, fp16, cold start, host RAM.
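The ML-DSA-65 item refers to a signed receipt chain. The sketch below shows how a verifier might walk a hash-linked chain of receipts; it is a hypothetical layout, not the HX-SDP receipt format. Assumptions: each receipt is a dict with `payload`, `prev_hash`, and `sig` fields, the first receipt links to an all-zero genesis hash, and signatures are checked through a `verify_sig` callback. A real verifier would call an ML-DSA-65 (FIPS 204) implementation there; the demo plugs in an HMAC stand-in only so the chain logic runs on the standard library.

```python
import hashlib
import hmac
import json
from typing import Callable, List

GENESIS = "0" * 64  # hypothetical convention for the first link


def canonical(obj: dict) -> bytes:
    """Deterministic JSON encoding so hashes and signatures are reproducible."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def receipt_hash(receipt: dict) -> str:
    """SHA-256 over the whole receipt, signature included."""
    return hashlib.sha256(canonical(receipt)).hexdigest()


def signed_bytes(payload: dict, prev_hash: str) -> bytes:
    """Bytes covered by the signature: payload plus link to the previous receipt."""
    return canonical({"payload": payload, "prev_hash": prev_hash})


def verify_chain(chain: List[dict], verify_sig: Callable[[bytes, str], bool]) -> bool:
    prev = GENESIS
    for r in chain:
        if r["prev_hash"] != prev:
            return False  # broken, reordered, or truncated link
        if not verify_sig(signed_bytes(r["payload"], r["prev_hash"]), r["sig"]):
            return False  # content no longer matches its signature
        prev = receipt_hash(r)
    return True


# Demo stand-in: HMAC is symmetric and is NOT ML-DSA-65.
KEY = b"demo-only"


def demo_sign(msg: bytes) -> str:
    return hmac.new(KEY, msg, hashlib.sha256).hexdigest()


def demo_verify(msg: bytes, sig: str) -> bool:
    return hmac.compare_digest(demo_sign(msg), sig)


def append(chain: List[dict], payload: dict) -> None:
    prev = receipt_hash(chain[-1]) if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev,
                  "sig": demo_sign(signed_bytes(payload, prev))})


chain: List[dict] = []
append(chain, {"run": "100M", "p50_ms": 20.4})
append(chain, {"run": "soak", "hours": 24})
print(verify_chain(chain, demo_verify))  # True
chain[0]["payload"]["p50_ms"] = 19.0     # tamper with a published number
print(verify_chain(chain, demo_verify))  # False
```

Because each receipt commits to the hash of its predecessor, editing, dropping, or reordering any earlier receipt breaks verification of the chain from that point on.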

Limitations

Tradeoffs are part of the benchmark.

Limitation	Operating interpretation
fp16 is not exact at every scale	Use fp32/fp64 for exact-recall requirements; fp16 is the scale tier.
Concurrency has a single-GPU ceiling	At S-100M, production planning should account for roughly 10–20 error-free clients per GPU before sharding.
WAL is not the current recovery mechanism	Crash recovery is snapshot reload; a WAL is on the roadmap, per the canonical limitation list.
Atlas rows are calibration	A_ELITE in the current Atlas means native fp32 rerank utility, not compressed deployment certification.
Scale-envelope data is generated low-rank	The 1B/2B rows are measured capacity rows, not direct real-corpus generalization.
Host RAM matters at build	Large-scale builds may require substantial CPU RAM before the serving footprint becomes compact.
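The first two limitations suggest a simple planning check: list the single-GPU tiers whose measured envelope covers the corpus and recall target, then size GPUs for concurrency. This is a minimal sketch using the scale-envelope rows; `qualifying_tiers` and `gpus_for_clients` are hypothetical helpers. The envelope figures come from generated low-rank data, and the 10–20 clients-per-GPU range was stated for S-100M, so applying it at other scales is an assumption.

```python
import math
from typing import List, Tuple

# (tier, max entries, lower-bound R@10) from the single-GPU scale-envelope table.
# Generated low-rank data; real embedding corpora may behave differently.
ENVELOPE = [
    ("H100 fp64", 200_000_000, 1.00),
    ("H100 fp32", 500_000_000, 1.00),
    ("H100 fp16", 1_000_000_000, 0.96),
    ("B200 fp16", 2_000_000_000, 0.98),
]


def qualifying_tiers(n_entries: int, min_recall: float) -> List[str]:
    """Tiers whose measured envelope covers the corpus and the recall floor.

    An empty list means the corpus is outside every single-GPU envelope,
    i.e. sharding is required.
    """
    return [tier for tier, cap, r10 in ENVELOPE
            if n_entries <= cap and r10 >= min_recall]


def gpus_for_clients(clients: int, per_gpu: Tuple[int, int] = (10, 20)) -> Tuple[int, int]:
    """GPU count range (optimistic, conservative) for a target client count."""
    low, high = per_gpu
    return math.ceil(clients / high), math.ceil(clients / low)


print(qualifying_tiers(300_000_000, 1.0))   # ['H100 fp32']
print(qualifying_tiers(800_000_000, 0.95))  # ['H100 fp16', 'B200 fp16']
print(gpus_for_clients(100))                # (5, 10)
```

An exact-recall requirement (`min_recall=1.0`) filters out both fp16 tiers, which matches the guidance to use fp32/fp64 for exact recall and treat fp16 as the scale tier.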

Diligence

Review the proof chain next.
