HOLONOMIX

HX-SDP · architecture

The architecture behind the structural data platform.

HX-SDP ingests dense vectors and related data, represents them as economy-SVD latent factors plus an SQ8 rerank sidecar, and serves cache, vector, feature, search, retention, and observability workflows from the same GPU-native runtime.

12
services addressed

8 eliminated · 3 collapsed · 1 simplified

hx-gate + hx-engine
fleet topology

hx-engine (engine) · hx-gate (gate)

5 verbs
SDK surface

put · get · query · search · serve

What HX-SDP is

One runtime where every access pattern reads the same representation.

Traditional stacks duplicate the same data into caches, vector databases, search indexes, feature stores, streams, and observability systems. HX-SDP collapses those copies into one representation and one serving surface.

01  Input: Dense vectors, features, streams
02  Atlas: classify structure + policy
03  Latent: Z(N,r) + V_T(r,D)
04  SQ8: int8 sidecar rerank
05  Serve: cache · vectors · features · search
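
As a rough illustration of the Latent stage, the NumPy sketch below factorizes a dense matrix X with an economy SVD and truncates it to rank r, giving Z(N,r) and V_T(r,D). The function name `factorize`, the synthetic data, and the choice to fold the singular values into Z are assumptions made for this example; how HX-SDP chooses r or builds the factors is not specified on this page.

```python
import numpy as np

def factorize(X: np.ndarray, r: int):
    """Return Z (N, r) and V_T (r, D) such that Z @ V_T approximates X.

    Hypothetical helper for illustration, not an HX-SDP API.
    """
    # Economy SVD: U is (N, k), S is (k,), Vt is (k, D), with k = min(N, D).
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :r] * S[:r]   # fold singular values into the left factor
    V_T = Vt[:r, :]        # shared right factor
    return Z, V_T

rng = np.random.default_rng(0)
N, D, r = 1000, 64, 8
# Synthetic data with exact rank 8, so a rank-8 factorization loses nothing.
X = rng.standard_normal((N, r)) @ rng.standard_normal((r, D))

Z, V_T = factorize(X, r)
rel_err = np.linalg.norm(X - Z @ V_T) / np.linalg.norm(X)

dense_floats = N * D            # values stored by the dense matrix
latent_floats = N * r + r * D   # values stored by the two factors
print(Z.shape, V_T.shape, rel_err, dense_floats, latent_floats)
```

On real data the rank is rarely exact, so the reconstruction error depends on how quickly the singular values decay; the storage saving comes from r being much smaller than D.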

Service replacement map

Twelve workflows become one platform boundary.

HX-SDP distinguishes what is eliminated, what is architecturally collapsed, and what is simplified. The result is a concrete operational claim that does not overstate what remains.

Workflow | Traditional vendors | Outcome | How HX-SDP handles it
KV cache | Redis, Memcached | Eliminated | Representations are the values; L0/L1/L2 cache hierarchy serves hot reads.
Feature store | Feast, Tecton | Eliminated | Online/offline split disappears; features are versioned representations.
Search index | Elasticsearch, OpenSearch | Eliminated | BM25, trie, fuzzy, metadata filters, and hybrid search read the same store.
Vector DB | Pinecone, Weaviate, Milvus | Eliminated | Tier-1 latent scan plus Tier-2 SQ8 rerank replaces ANN index fleets.
ETL pipeline | Airflow, dbt, Spark | Eliminated | Representation is the transformation; no multi-sink DAG.
Event stream | Kafka, Kinesis | Eliminated | At-least-once ingest, DLQ behavior, snapshot lineage, and direct representation. WAL remains a roadmap recovery mechanism.
Stream retention | Confluent, MSK | Eliminated | Compressed representations make retention economics tractable.
API gateway | Kong, Apigee | Eliminated | hx-gate handles auth, ACL, rate limiting, billing, audit, and proxy.
GPU cluster | DGX, P5/G6 fleets | Collapsed | Single-GPU serving for validated scale envelopes; sharding remains an extension path.
KV offload | PagedAttention, NVMe spillover | Collapsed | Latent factors and SQ8 sidecar shrink memory footprint before paging is needed.
Observability | Datadog, Splunk | Collapsed | One service surface with Prometheus metrics, telemetry aggregation, and JSONL audit.
Training pipeline | SageMaker data staging | Simplified | Training compute remains; feature materialization and ETL-to-training shrink.

Production architecture

SVD-latent + SQ8 is the production hot path. QTT is the broader core.

HX-SDP benchmark claims are tied to the SVD-latent + SQ8 path. QTT remains part of the HolonomiX technology core and an alternate ingest path for callers that already hold TT cores.

Hot path

Dense X → Z + V_T → SQ8 sidecar.

Queries project q into rank-r space, scan Z · w, then rerank candidates from SQ8 in original D-space. No dense materialization in the compute path.

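query path
w = V_T @ q                               # project q (D,) into rank-r space; w is (r,)
scores = Z @ w                            # Tier-1 latent scan; scores is (N,)
candidates = topk(scores, rerank_k=100)   # keep the 100 best-scoring entries
final = sq8_rescore(candidates, q)        # Tier-2 rerank from SQ8 in original D-space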
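
The query path above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not HX-SDP's implementation: the per-dimension min/max quantizer in `sq8_encode`, the signature of `sq8_rescore`, the inner-product scoring, and the synthetic unit-norm data are all hypothetical. Only the candidate rows are dequantized, so the full dense matrix is never rebuilt at query time.

```python
import numpy as np

def sq8_encode(X):
    """Quantize each dimension of X to int8 codes (assumed min/max scheme)."""
    lo = X.min(axis=0)
    scale = (X.max(axis=0) - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)   # guard constant dimensions
    codes = (np.round((X - lo) / scale) - 128).astype(np.int8)
    return codes, lo, scale

def sq8_rescore(codes, lo, scale, candidates, q):
    """Dequantize only the candidate rows and score them against q in D-space."""
    approx = (codes[candidates].astype(np.float32) + 128.0) * scale + lo
    return approx @ q

def query(Z, V_T, codes, lo, scale, q, k=10, rerank_k=100):
    w = V_T @ q                                # project q into rank-r space
    scores = Z @ w                             # Tier-1 scan over latent factors
    rerank_k = min(rerank_k, scores.shape[0])
    # Unordered top-rerank_k by latent score.
    candidates = np.argpartition(-scores, rerank_k - 1)[:rerank_k]
    final = sq8_rescore(codes, lo, scale, candidates, q)   # Tier-2 rerank
    order = np.argsort(-final)[:k]             # best k after reranking
    return candidates[order], final[order]

# Synthetic rank-8 data with unit-norm rows, so each row is its own best match.
rng = np.random.default_rng(0)
N, D, r = 500, 32, 8
X = rng.standard_normal((N, r)) @ rng.standard_normal((r, D))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Build the latent factors and the SQ8 sidecar once, at ingest time.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z, V_T = U[:, :r] * S[:r], Vt[:r]
codes, lo, scale = sq8_encode(X)

ids, final_scores = query(Z, V_T, codes, lo, scale, q=X[42])
print(ids, final_scores)
```

The two tiers trade cost for fidelity: the latent scan touches N × r values per query, while the rerank touches only rerank_k × D quantized values.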

Operational surface

hx-gate, hx-engine, Redis.

hx-gate handles tenant auth, namespace ACL, rate limiting, CU billing, audit, and WebSocket proxy. hx-engine runs the GPU serving runtime. Redis holds shared gate state.

fleet
load balancer
  → hx-gate :8080
      → Redis :6379
      → hx-engine :8000
          → GPU + /var/lib/holonomix

Fit boundaries

Clear qualification is part of the product.

HX-SDP is strong when the workload can exploit structure and the buyer can operate a bounded GPU-native deployment. It is not a generic managed-cloud vector database replacement for every team today.

Strong fit

You are running Redis / Pinecone / Feast / Elasticsearch near GPU workloads.
Your data has repeated structure, spectra, feature families, or shared embeddings.
You can batch, snapshot, or schedule rebuilds rather than requiring very high-rate single-insert streaming.
You want proof-pack diligence: manifests, receipts, benchmark methodology, and explicit limitations.

Not the right surface yet

You need fully managed public cloud self-service today.
You require high-concurrency distributed serving beyond the current single-GPU ceiling without custom sharding.
You cannot operate a GPU node, VM image, or private deployment surface.
You need TQ4 compressed-path certification for a specific model/corpus before the certification phase is complete.

Precision tiers

Pick the tier from the evidence boundary.

The benchmark page carries the detailed tables. The product page gives the operating interpretation and directs exact-recall buyers to fp32/fp64, while describing fp16 as a scale envelope with explicit recall bounds.

Exact

fp64

Exact-recall path. Use when approximate answers are unacceptable.

Max / GPU: 200M entries · Recall: 100% · Peak p50: 40.40 ms · Bytes / entry: 277

Production

fp32

Default production tier. Full recall, balanced throughput.

Max / GPU: 500M entries · Recall: 100% · Peak p50: 76.53 ms · Bytes / entry: 132

Scale

fp16

Scale envelope. Recall floor disclosed; use with bounded calibration.

Max / GPU: 2B entries · Recall: 100% · Peak p50: 38.51 ms · Bytes / entry: 66
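
For sizing, the tier figures above can be turned into a quick footprint estimate by multiplying entries by bytes per entry. The sketch below only does that arithmetic. The names `TIERS`, `footprint_gb`, and `tiers_that_fit` are hypothetical, GB means 10^9 bytes, and the result says nothing about how those bytes are split across device memory and other storage.

```python
# Tier figures copied from the precision tiers listed above.
TIERS = {
    "fp64": {"max_entries": 200_000_000, "bytes_per_entry": 277},
    "fp32": {"max_entries": 500_000_000, "bytes_per_entry": 132},
    "fp16": {"max_entries": 2_000_000_000, "bytes_per_entry": 66},
}

def footprint_gb(tier, entries=None):
    """Estimated footprint in decimal GB; defaults to the tier's max entries."""
    t = TIERS[tier]
    n = t["max_entries"] if entries is None else entries
    return n * t["bytes_per_entry"] / 1e9

def tiers_that_fit(entries):
    """Tiers whose listed per-GPU maximum covers the requested entry count."""
    return [name for name, t in TIERS.items() if entries <= t["max_entries"]]

for name in TIERS:
    print(name, footprint_gb(name), "GB at max entries")
print(tiers_that_fit(600_000_000))
```

At the listed maximums this works out to about 55.4 GB for fp64, 66 GB for fp32, and 132 GB for fp16. Recall tolerance still has to be checked against the benchmark tables before a tier is chosen.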

Next step

Map your stack to a precision tier.

Send the workload, current services, scale, recall tolerance, rebuild cadence, and deployment constraints. The intake maps those inputs to a bounded evaluation path.