SEMQ Tests Aim To Cut AI Memory Overhead Without Named Customers
SEMQ Group is pitching symbolic embedding multi-quantization as a way to preserve retrieval and classification behavior while lowering AI semantic-state overhead, but its public evidence remains benchmark-led and customer names are undisclosed.

SEMQ Targets Embedding Memory Rather Than Model Weights
SEMQ Group is proposing to reduce the storage and memory burden around AI systems by compressing embedding state rather than model weights.
The company’s approach, called Symbolic Embedding Multi-Quantization, separates the meaning captured in embeddings from the numerical representation normally used to store those embeddings.
Andrés Mac Allister, SEMQ Group’s CEO and founder, said the method focuses on the structural relationship among embedding components rather than preserving every floating-point magnitude.
The claim is vendor-led, but it is tied to a specific technical target: semantic state used in retrieval, memory and classification workflows.
Mac Allister described conventional embedding systems as sequences of high-precision numerical coordinates.
SEMQ files are described as preserving relative similarity ordering, neighborhood structure and other relational properties while separating the representation from metrics, indexing and execution semantics.
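One way to make the "relative similarity ordering" claim concrete is to measure how many of each vector's nearest neighbors survive a lossy encoding. The sketch below is illustrative only: SEMQ's encoder is not public, so a plain uniform scalar quantizer stands in for the lossy step, and the names cosine, top_k, uniform_quantize and neighbor_overlap are hypothetical helpers rather than part of any SEMQ SDK. The vectors are random, not real embeddings.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k(query_idx, vectors, k):
    """Indices of the k vectors most similar to vectors[query_idx]."""
    scores = [
        (cosine(vectors[query_idx], v), i)
        for i, v in enumerate(vectors)
        if i != query_idx
    ]
    scores.sort(reverse=True)
    return {i for _, i in scores[:k]}


def uniform_quantize(vec, bits, lo=-1.0, hi=1.0):
    """Stand-in lossy encoder (NOT SEMQ): snap each value to a uniform grid."""
    step = (hi - lo) / (2 ** bits - 1)
    out = []
    for x in vec:
        x = min(max(x, lo), hi)  # clamp into the grid's range
        out.append(lo + round((x - lo) / step) * step)
    return out


def neighbor_overlap(original, encoded, k=5):
    """Mean fraction of each vector's top-k neighbors kept after encoding."""
    total = 0.0
    for q in range(len(original)):
        kept = top_k(q, original, k) & top_k(q, encoded, k)
        total += len(kept) / k
    return total / len(original)


if __name__ == "__main__":
    random.seed(0)
    vectors = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(30)]
    for bits in (8, 4, 2):
        encoded = [uniform_quantize(v, bits) for v in vectors]
        print(bits, "bits ->", round(neighbor_overlap(vectors, encoded), 3))
```

An overlap of 1.0 means every top-k neighbor set is unchanged. A benchmark-led claim such as SEMQ's would be expected to show a score near that level at a much smaller storage size, which is what independent testing would need to confirm.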
Mac Allister Cites Banking77 Benchmark Results
Mac Allister’s technical explanation starts from the storage cost of conventional numeric formats.
He said FP32 requires 4 bytes per parameter, so a 7-billion-parameter model would need about 28 GB, whether stored on disk or loaded into memory.
He said FP16 or BF16 requires 2 bytes per parameter and would put the same model near 14 GB.
The same comparison lists smaller options including FP8, INT8, Q8, Q6, Q5, Q4, Q3 and Q2.
Those formats reduce the storage and memory footprint by lowering precision, while SEMQ is pitched as a way to preserve more of the relational structure in embedding state.
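The arithmetic behind those figures is simply parameter count times bytes per parameter. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and raw weights only; the 8-bit and 4-bit rows follow from bit width and ignore the per-block scale metadata that real quantized formats add, and footprint_gb is a hypothetical helper name:

```python
# Bit widths per parameter. FP32 and FP16/BF16 are the figures quoted in the
# article; the 8-bit and 4-bit rows are derived from bit width alone.
BITS_PER_PARAM = {
    "FP32": 32,
    "FP16/BF16": 16,
    "FP8/INT8": 8,
    "4-bit": 4,
}


def footprint_gb(num_params: int, bits: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    params_7b = 7_000_000_000
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name:>10}: {footprint_gb(params_7b, bits):.1f} GB")
```

For a 7-billion-parameter model this reproduces the 28 GB and 14 GB figures quoted for FP32 and FP16 or BF16.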
Mac Allister pointed to a company validation test built around the Banking77 dataset from MTEB and the all-MiniLM-L6-v2 embedding model.
He said the FP32 baseline achieved 92.26 percent accuracy, SEMQ reached 92.27 percent, and 4-bit quantization produced 56.05 percent accuracy.
Those figures come from the company’s cited validation work, not from named customer production systems.
Customer Evidence Remains Behind NDAs
Mac Allister said SEMQ can be applied when data is ingested or at query time.
In his description, teams could apply an SDK to vectors generated by their existing embedding model.
They could also run SEMQ alongside an existing LLM, embedding model, vector database or agent framework before adopting it for selected retrieval or memory workloads.
He also said .semq files have been used in research to snapshot and restore transformer KV-cache state across process boundaries.
He presented that not as a pre-training workflow but as a runtime-state workflow for pausing, transferring and resuming an active model session.
The early business claim is still limited by how little the company has disclosed.
Mac Allister said the company signed NDAs with organizations in a Founding Design Partnership Program, including some AI infrastructure hyperscalers and companies at the AI application layer.
SEMQ Group has not named customers, and the public record does not disclose deployment sizes, infrastructure savings or third-party benchmark validation.
















