HOLONOMIX

Benchmarks

Measured scale, real embeddings, methodology scope, and limitations.

Benchmark claims are separated by evidence class: public receipt preview, gated proof pack, scale-envelope capacity run, real-embedding calibration, Atlas classification, and roadmap certification.

100M (public receipt): 100% · 20.4 ms p50
500M (H100 fp32): 100% R@10 scale envelope
1B (H100 fp16): 38.51 ms · 96–100% envelope
2B (B200 fp16): 60.89 ms · 98% envelope

What this proves

Benchmark rows are not interchangeable.

The numbers are most credible when each one stays attached to its evidence boundary.

Evidence | Setup | Artifact status | Claim boundary
100M H100 receipt | H100 80GB, D=384, rank=32 | Receipt preview public; full bundle gated | Exact-tier proof path, not universal corpus claim
H100/B200 scale envelope | Generated low-rank data, latent tier | Public table; methodology packet gated | Capacity measurement, not blanket real-embedding claim
Real embeddings | 13 production models, public datasets | Public summary; row artifacts gated | Two-tier fp32/SQ8 behavior by model/corpus
Atlas | 71 classified rows (text, vision, physics, scientific) | Public summaries; per-row artifacts gated | fp32 calibration; 5 physics rows compressed-certified
Soak / durability | 24h soak and snapshot tests | Summary public; detailed logs gated | Operational signal, not production SLA

Scale envelope

Single-GPU ceilings by precision tier.

These rows are measured scale-envelope runs on generated low-rank data. They are valid capacity measurements, not a claim that every real embedding corpus will reproduce the same recall profile.

GPU / tier | max entries | query p50 | R@10 | query VRAM | compression
H100 fp64 | 200M | 40.40 ms | 100% | 50.4 GB | 11.6× vs fp64
H100 fp32 | 500M | 76.53 ms | 100% | 73.0 GB | 23.3× vs fp64
H100 fp16 | 1B | 38.51 ms | 96–100% | 72.9 GB | 46.5× vs fp64
B200 fp16 | 2B | 60.89 ms | 98% | 142.0 GB | 46.5× vs fp64
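As a rough sanity check on the table above, query VRAM divided by max entries gives an upper bound on the serving footprint per entry. The sketch below only rearranges the published numbers. It assumes that "GB" means 10^9 bytes and that query VRAM may include working buffers beyond the stored entries, neither of which the table states.

```python
# Rows copied from the scale-envelope table:
# (tier, max entries, query VRAM in GB, stated compression vs fp64)
ROWS = [
    ("H100 fp64", 200_000_000, 50.4, 11.6),
    ("H100 fp32", 500_000_000, 73.0, 23.3),
    ("H100 fp16", 1_000_000_000, 72.9, 46.5),
    ("B200 fp16", 2_000_000_000, 142.0, 46.5),
]

GB = 10**9  # assumption: decimal gigabytes; swap in 2**30 if the table means GiB


def bytes_per_entry(max_entries, vram_gb):
    """Query VRAM spread evenly over all entries (an upper bound per entry)."""
    return vram_gb * GB / max_entries


for tier, entries, vram_gb, ratio in ROWS:
    per_entry = bytes_per_entry(entries, vram_gb)
    print(f"{tier}: {per_entry:.1f} B/entry at {ratio}x stated compression")
```

Under these assumptions the four rows work out to about 252, 146, 73, and 71 bytes per entry. The stated compression roughly doubles at each precision step (11.6× to 23.3× to 46.5×), while the per-entry VRAM figure falls by a little less than half from fp64 to fp32. That gap would be consistent with some fixed per-query overhead, but the table alone cannot confirm it.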

Real embeddings

Two-tier recall on measured model/corpus pairs.

The hardened rows show the public behavior of the production query architecture: Tier-1 latent scan and Tier-2 SQ8 rerank. The Atlas tracks rows that recover, flatten, or become rerank-harmful.
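The two-tier pattern named above can be illustrated with a toy sketch: a cheap scan over low-dimensional latent vectors builds a shortlist, and only the shortlist is rescored against 8-bit scalar-quantized (SQ8) vectors. This is not the HX-SDP implementation. Every name and parameter here is hypothetical, the data is synthetic, and truncating to the first r coordinates stands in for a learned SVD basis.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq8_encode(vec, lo, hi):
    """Uniform scalar quantization: map floats in [lo, hi] to 8-bit codes."""
    scale = (hi - lo) / 255.0
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)


def sq8_decode(codes, lo, hi):
    """Map 8-bit codes back to approximate floats."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def two_tier_search(query, latent_db, sq8_db, r, k, shortlist, lo, hi):
    # Tier 1: scan every entry, scoring only the r latent coordinates.
    coarse = [dot(query[:r], v) for v in latent_db]
    candidates = top_k(coarse, shortlist)
    # Tier 2: rescore only the shortlist against dequantized SQ8 vectors.
    fine = {i: dot(query, sq8_decode(sq8_db[i], lo, hi)) for i in candidates}
    return sorted(fine, key=fine.get, reverse=True)[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def make_vector(rng, dim, r):
    # Toy low-rank-like data: most energy sits in the first r coordinates.
    return [rng.uniform(-1, 1) if j < r else rng.uniform(-0.05, 0.05)
            for j in range(dim)]


rng = random.Random(0)
DIM, R, N, K = 16, 8, 500, 10
db = [make_vector(rng, DIM, R) for _ in range(N)]
latent_db = [v[:R] for v in db]                  # stand-in for an SVD projection
sq8_db = [sq8_encode(v, -1.0, 1.0) for v in db]  # one byte per dimension
query = make_vector(rng, DIM, R)

exact = top_k([dot(query, v) for v in db], K)
approx = two_tier_search(query, latent_db, sq8_db, R, K,
                         shortlist=50, lo=-1.0, hi=1.0)
print(f"R@{K} = {recall_at_k(approx, exact):.2f}")
```

In this toy setup the shortlist size controls the tradeoff: a larger shortlist gives Tier 2 more chances to recover neighbors that Tier 1 ranked poorly, at the cost of more rerank work. Rows that become rerank-harmful would correspond to cases where the quantized rescoring orders candidates worse than the latent scan did.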

Model | rank | ρ | RR R@10 | RR p50 | classification
Gemini-001 3072d | 667 | 0.22 | 0.998 | 1.42 ms | A_ELITE / hardened
Cohere v3 1024d | 418 | 0.41 | 0.994 | 2.10 ms | A_ELITE / hardened
OpenCLIP 1024d | 401 | 0.39 | 0.995 | 1.31 ms | A_ELITE / hardened
E5-Mistral 4096d | 1,146 | 0.28 | 0.934 | 1.63 ms | D_SENSITIVE / hardened
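The table does not define ρ. One reading that fits all four rows is ρ = rank / embedding dimension, the fraction of the original dimensions retained in the latent tier. This is an inference from the published numbers, not a definition taken from the source. The check below recomputes the ratio for each row.

```python
# (model, embedding dimension, rank, rho as printed in the table)
ROWS = [
    ("Gemini-001", 3072, 667, 0.22),
    ("Cohere v3", 1024, 418, 0.41),
    ("OpenCLIP", 1024, 401, 0.39),
    ("E5-Mistral", 4096, 1146, 0.28),
]

for model, dim, rank, rho in ROWS:
    ratio = rank / dim  # hypothesized definition of rho
    print(f"{model}: rank/dim = {ratio:.3f}, table rho = {rho:.2f}")
```

Each computed ratio rounds to the printed ρ at two decimals, so the reading is at least consistent with the data shown.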

Claim boundaries

Every number carries its scope.

HX-SDP benchmark claims are separated by evidence class: production hot path, signed receipts, scale envelope, real-embedding calibration, and roadmap certification. Exact-recall statements stay tied to the tiers and artifacts that support them.

SVD + SQ8 (production hot path): not described as generic QTT-native serving
ML-DSA-65 (signed artifacts): FIPS 204 Category 3 receipt chain
71 (Atlas coverage): fp32 calibration, not TQ4 certification
Known limitations: concurrency, WAL, fp16, cold start, host RAM

Limitations

Tradeoffs are part of the benchmark.

Limitation | Operating interpretation
fp16 is not exact at every scale | Use fp32/fp64 for exact-recall requirements; fp16 is the scale tier.
Concurrency has a single-GPU ceiling | At S-100M, production planning should account for roughly 10–20 error-free clients per GPU before sharding.
WAL is not the current recovery mechanism | Crash recovery is snapshot reload; WAL is a roadmap item in the canonical limitation list.
Atlas rows are calibration | A_ELITE in the current Atlas means native fp32 rerank utility, not compressed deployment certification.
Scale-envelope data is generated low-rank | The 1B/2B rows are measured capacity rows, not direct real-corpus generalization.
Host RAM matters at build | Large-scale builds may require substantial CPU RAM before the serving footprint becomes compact.
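The concurrency row above can be turned into a first-pass sizing estimate. The sketch below assumes each additional GPU serves a full replica or shard and that client capacity scales linearly with GPU count. The source states neither assumption, and the function name is illustrative only.

```python
import math

CLIENTS_PER_GPU_LOW = 10   # conservative end of the quoted 10–20 range
CLIENTS_PER_GPU_HIGH = 20  # optimistic end of the quoted range


def gpus_needed(concurrent_clients):
    """Return (best_case, worst_case) GPU counts for a given client load."""
    best = math.ceil(concurrent_clients / CLIENTS_PER_GPU_HIGH)
    worst = math.ceil(concurrent_clients / CLIENTS_PER_GPU_LOW)
    return best, worst


best, worst = gpus_needed(45)
print(f"45 clients: plan for {best} to {worst} GPUs")
```

For 45 concurrent clients this gives a range of 3 to 5 GPUs. Planning against the worst case is the safer default, since the quoted range describes error-free operation at S-100M rather than a guaranteed service level.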

Diligence

Review the proof chain next.