HX-SDP · architecture

The architecture behind the structural data platform.

HX-SDP ingests dense vectors and related data, represents them as economy-SVD latent factors plus an SQ8 rerank sidecar, and serves cache, vector, feature, search, retention, and observability workflows from the same GPU-native runtime.

12 services addressed: 8 eliminated · 3 collapsed · 1 simplified

Fleet topology: hx-gate + hx-engine (hx-gate is the gate, hx-engine is the engine)

SDK surface: 5 verbs (put · get · query · search · serve)

What HX-SDP is

One runtime where every access pattern reads the same representation.

Traditional stacks duplicate the same data into caches, vector databases, search indexes, feature stores, streams, and observability systems. HX-SDP collapses those copies into one representation and one serving surface.

Input: dense vectors, features, streams
→ Atlas: classify structure + policy
→ Latent: Z(N,r) + V_T(r,D)
→ SQ8: int8 sidecar rerank
→ Serve: cache · vectors · features · search

Service replacement map

Twelve workflows become one platform boundary.

HX-SDP distinguishes what is eliminated, what is architecturally collapsed, and what is simplified. The result is a concrete operational claim without overstating what remains.

Workflow	Traditional vendors	Outcome	How HX-SDP handles it
KV cache	Redis, Memcached	Eliminated	Representations are the values; L0/L1/L2 cache hierarchy serves hot reads.
Feature store	Feast, Tecton	Eliminated	Online/offline split disappears; features are versioned representations.
Search index	Elasticsearch, OpenSearch	Eliminated	BM25, trie, fuzzy, metadata filters, and hybrid search read the same store.
Vector DB	Pinecone, Weaviate, Milvus	Eliminated	Tier-1 latent scan plus Tier-2 SQ8 rerank replaces ANN index fleets.
ETL pipeline	Airflow, dbt, Spark	Eliminated	Representation is the transformation; no multi-sink DAG.
Event stream	Kafka, Kinesis	Eliminated	At-least-once ingest, DLQ behavior, snapshot lineage, and direct representation. WAL remains a roadmap recovery mechanism.
Stream retention	Confluent, MSK	Eliminated	Compressed representations make retention economics tractable.
API gateway	Kong, Apigee	Eliminated	hx-gate handles auth, ACL, rate limiting, billing, audit, and proxy.
GPU cluster	DGX, P5/G6 fleets	Collapsed	Single-GPU serving for validated scale envelopes; sharding remains an extension path.
KV offload	PagedAttention, NVMe spillover	Collapsed	Latent factors and SQ8 sidecar shrink memory footprint before paging is needed.
Observability	Datadog, Splunk	Collapsed	One service surface with Prometheus metrics, telemetry aggregation, and JSONL audit.
Training pipeline	SageMaker data staging	Simplified	Training compute remains; feature materialization and ETL-to-training shrink.

Production architecture

SVD-latent + SQ8 is the production hot path. QTT is the broader core.

HX-SDP benchmark claims are tied to the SVD-latent + SQ8 path. QTT remains part of the HolonomiX technology core and an alternate ingest path for callers that already hold TT cores.

Hot path

Dense X → Z + V_T → SQ8 sidecar.

Queries project q into rank-r space, scan Z · w, then rerank candidates from SQ8 in original D-space. No dense materialization in the compute path.
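The reason the projection works can be written out in a few lines. This is a sketch under stated assumptions: the page names only X, Z, V_T, q, and w, so the truncated factors U_r, Σ_r, V_r and the choice to fold the singular values into Z are assumptions made here for illustration, not a description of the shipped factorization.

```latex
\begin{aligned}
X &\approx U_r \Sigma_r V_r^{\top}, & X &\in \mathbb{R}^{N \times D} \\
Z &= U_r \Sigma_r \in \mathbb{R}^{N \times r}, & V_T &= V_r^{\top} \in \mathbb{R}^{r \times D} \\
X q &\approx Z \,(V_T\, q) = Z w, & w &= V_T\, q \in \mathbb{R}^{r}
\end{aligned}
```

Under these assumptions the latent scan costs about rD + Nr multiply-adds per query, against ND for a dense scan of X, which is where the saving comes from when r is much smaller than D. The approximation error from truncation is what the SQ8 rerank in original D-space is there to correct.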

query path

w = V_T @ q
scores = Z @ w
candidates = topk(scores, rerank_k=100)
final = sq8_rescore(candidates, q)
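
The four lines above are pseudocode. A minimal runnable sketch of the same two-tier flow follows, in plain Python so it has no dependencies. Everything beyond the names on this page is hypothetical: `matvec`, `sq8_encode`, and `query` are illustrative helpers, inner product is assumed as the score, and the SQ8 scheme shown (symmetric int8 with one scale per vector) is an assumption, since the page does not specify how the sidecar is quantized. The toy factors are hand-picked rather than computed by an SVD.

```python
import heapq

def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sq8_encode(x):
    """Symmetric int8 quantization with one scale per vector (assumed scheme)."""
    peak = max(abs(a) for a in x)
    scale = peak / 127.0 if peak > 0 else 1.0
    return [round(a / scale) for a in x], scale

def sq8_rescore(candidates, q, sidecar):
    """Score each candidate id against q in D-space using its int8 codes."""
    rescored = []
    for i in candidates:
        codes, scale = sidecar[i]
        rescored.append((scale * sum(c * b for c, b in zip(codes, q)), i))
    return sorted(rescored, reverse=True)

def query(q, Z, V_T, sidecar, k, rerank_k=100):
    w = matvec(V_T, q)          # project q into rank-r space: (r, D) @ (D,)
    scores = matvec(Z, w)       # Tier 1: scan the latent factors: (N, r) @ (r,)
    n = min(rerank_k, len(scores))
    candidates = heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)
    return sq8_rescore(candidates, q, sidecar)[:k]   # Tier 2: SQ8 rerank

# Hand-picked toy data (N=4, D=3, r=2), not the output of a real SVD.
X = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.7, 0.7, 0.3], [-1.0, 0.2, 0.0]]
V_T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Z = [row[:2] for row in X]      # latent coordinates under this V_T
sidecar = [sq8_encode(row) for row in X]

top = query([1.0, 0.5, 0.0], Z, V_T, sidecar, k=2, rerank_k=3)
print([i for _, i in top])      # ids of the two best entries after rerank
```

Note that the dense rows of X are only touched when building the sidecar; the query itself reads Z, V_T, and the int8 codes, which mirrors the "no dense materialization" claim for the compute path.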

Operational surface

hx-gate, hx-engine, Redis.

hx-gate handles tenant auth, namespace ACL, rate limiting, CU billing, audit, and WebSocket proxy. hx-engine runs the GPU serving runtime. Redis holds shared gate state.

fleet

load balancer
  → hx-gate :8080
      → Redis :6379
      → hx-engine :8000
          → GPU + /var/lib/holonomix

Fit boundaries

Clear qualification is part of the product.

HX-SDP is strong when the workload can exploit structure and the buyer can operate a bounded GPU-native deployment. It is not a generic managed-cloud vector database replacement for every team today.

Strong fit

You are running Redis / Pinecone / Feast / Elasticsearch near GPU workloads.

Your data has repeated structure, spectra, feature families, or shared embeddings.

You can batch, snapshot, or schedule rebuilds rather than needing very high-rate single-insert streaming.

You want proof-pack diligence: manifests, receipts, benchmark methodology, and explicit limitations.

Not the right surface yet

You need fully managed public cloud self-service today.

You require high-concurrency distributed serving beyond the current single-GPU ceiling without custom sharding.

You cannot operate a GPU node, VM image, or private deployment surface.

You need TQ4 compressed-path certification for a specific model/corpus before the certification phase is complete.

Precision tiers

Pick the tier from the evidence boundary.

The benchmark page carries the detailed tables. The product page gives the operating interpretation and directs exact-recall buyers to fp32/fp64, while describing fp16 as a scale envelope with explicit recall bounds.

Exact

fp64

Exact-recall path. Use when approximate answers are unacceptable.

Max / GPU: 200M entries · Recall: 100% · Peak p50: 40.40 ms · Bytes / entry: 277

Production

fp32

Default production tier. Full recall, balanced throughput.

Max / GPU: 500M entries · Recall: 100% · Peak p50: 76.53 ms · Bytes / entry: 132

Scale

fp16

Scale envelope. Recall floor disclosed; use with bounded calibration.

Max / GPU: 2B entries · Recall: 100% · Peak p50: 38.51 ms · Bytes / entry: 66
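
As a rough sizing aid, the tier figures above can be multiplied out. The sketch below assumes that bytes per entry times entry count approximates the resident footprint, which the page does not state, and that GB means 10^9 bytes. `TIERS`, `footprint_gb`, and `tiers_that_fit` are hypothetical names used only for this illustration.

```python
# Figures copied from the three tier cards; nothing here is measured.
TIERS = {
    "fp64": {"max_entries": 200_000_000, "bytes_per_entry": 277},
    "fp32": {"max_entries": 500_000_000, "bytes_per_entry": 132},
    "fp16": {"max_entries": 2_000_000_000, "bytes_per_entry": 66},
}

def footprint_gb(tier, entries):
    """Approximate footprint in GB (10**9 bytes) for `entries` at `tier`."""
    return entries * TIERS[tier]["bytes_per_entry"] / 1e9

def tiers_that_fit(entries):
    """Tiers whose stated per-GPU ceiling covers `entries`."""
    return [name for name, t in TIERS.items() if entries <= t["max_entries"]]

# Footprint at each tier's stated per-GPU ceiling.
at_ceiling = {name: footprint_gb(name, t["max_entries"]) for name, t in TIERS.items()}
for name, gb in at_ceiling.items():
    print(f"{name}: {gb:.1f} GB at ceiling")

# A 300M-entry corpus exceeds the fp64 ceiling but fits fp32 and fp16.
print(tiers_that_fit(300_000_000))
```

At the stated ceilings this works out to roughly 55.4 GB for fp64, 66.0 GB for fp32, and 132.0 GB for fp16. Recall tolerance still decides between the tiers that fit; the arithmetic only rules out tiers whose ceiling is too low.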

Next step

Map your stack to a precision tier.

Send the workload, current services, scale, recall tolerance, rebuild cadence, and deployment constraints. The intake maps those inputs to a bounded evaluation path.

Request evaluation · Review benchmarks