SEMQ Tests Aim To Cut AI Memory Overhead Without Named Customers

By SendTech Times AI & Enterprise Desk | Newsroom-edited, source-reviewed coverage | Source: The Register

Newsroom brief

SEMQ Group is pitching symbolic embedding multi-quantization as a way to preserve retrieval and classification behavior while lowering AI semantic-state overhead, but its public evidence remains benchmark-led and customer names are undisclosed.

Verified against source material | Edited by SendTech Times AI & Enterprise Desk

SEMQ Targets Embedding Memory Rather Than Model Weights

SEMQ Group is proposing a different way to reduce the storage and memory burden around AI systems.

The company’s approach, called Symbolic Embedding Multi-Quantization, separates the meaning captured in embeddings from the numerical representation normally used to store those embeddings.

Andrés Mac Allister, SEMQ Group’s CEO and founder, said the method focuses on the structural relationship among embedding components rather than preserving every floating-point magnitude.

The claim is vendor-led, but it is tied to a specific technical target: semantic state used in retrieval, memory and classification workflows.

Mac Allister described conventional embedding systems as sequences of high-precision numerical coordinates.

SEMQ files are described as preserving relative similarity ordering, neighborhood structure and other relational properties while separating the representation from metrics, indexing and execution semantics.
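One way to make "neighborhood structure" concrete is to measure how many of each vector's nearest neighbors survive a change of representation. The sketch below is illustrative only and is not SEMQ's method or file format. The function names and the top-k overlap metric are assumptions chosen for this example, and the code assumes every vector is non-zero.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_neighbors(vectors, query_index, k):
    """Indices of the k vectors most similar to vectors[query_index]."""
    query = vectors[query_index]
    scored = [
        (cosine(query, v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]

def neighborhood_overlap(original, compressed, k):
    """Mean fraction of top-k neighbors shared by two representations."""
    total = 0.0
    for i in range(len(original)):
        before = set(top_k_neighbors(original, i, k))
        after = set(top_k_neighbors(compressed, i, k))
        total += len(before & after) / k
    return total / len(original)

if __name__ == "__main__":
    original = [
        [0.9, 0.1, -0.2],
        [0.8, 0.2, -0.1],
        [-0.7, 0.6, 0.3],
        [-0.6, 0.7, 0.2],
    ]
    # Hypothetical stand-in for a compressed form: keep only each sign.
    coarse = [[1.0 if x > 0 else -1.0 for x in v] for v in original]
    print(neighborhood_overlap(original, coarse, k=1))
```

A score of 1.0 means every vector keeps the same top-k neighbors after compression. Lower scores mean retrieval results would start to differ, even if individual coordinate values look close.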

Mac Allister Cites Banking77 Benchmark Results

Mac Allister’s technical explanation starts from the storage cost of conventional numeric formats.

Mac Allister said FP32 requires 4 bytes per parameter, so a 7-billion-parameter model would need about 28 GB of disk space and memory.

He said FP16 or BF16 requires 2 bytes per parameter and would put the same model near 14 GB.

The same comparison lists smaller options including FP8, INT8, Q8, Q6, Q5, Q4, Q3 and Q2.

Those formats reduce the storage and memory footprint by lowering precision, while SEMQ is pitched as a way to preserve more of the relational structure in embedding state.
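The arithmetic behind those figures is parameter count times bits per parameter. The sketch below reproduces the 28 GB and 14 GB numbers and extends them to the other listed formats. The bit widths assigned to the Q-formats are nominal assumptions taken from their names. Real quantized formats usually add per-block scales and metadata, so actual files are somewhat larger.

```python
# Nominal bits per parameter. The Q-format widths are assumptions based on
# the format names, not measured file sizes.
NOMINAL_BITS = {
    "FP32": 32,
    "FP16/BF16": 16,
    "FP8": 8,
    "INT8": 8,
    "Q8": 8,
    "Q6": 6,
    "Q5": 5,
    "Q4": 4,
    "Q3": 3,
    "Q2": 2,
}

def footprint_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    params = 7_000_000_000  # the 7B example from the article
    for name, bits in NOMINAL_BITS.items():
        print(f"{name:>10}: {footprint_gb(params, bits):6.2f} GB")
```

These are lower bounds for weights alone. They do not include activations, KV-cache or the embedding stores that SEMQ says it targets.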

Mac Allister pointed to a company validation test built around the Banking77 dataset from MTEB and the all-MiniLM-L6-v2 embedding model.

He said the FP32 baseline achieved 92.26 percent accuracy, SEMQ reached 92.27 percent, and 4-bit quantization produced 56.05 percent accuracy.
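The source does not say which 4-bit scheme was used in that comparison. As a rough illustration of what 4-bit storage does to an embedding, the sketch below applies plain per-vector uniform quantization, mapping each coordinate to one of 16 levels. It is an assumed baseline for explanation and does not reproduce the reported benchmark.

```python
def quantize_4bit(vector):
    """Map each float to an integer code 0..15 spanning the vector's range."""
    lo, hi = min(vector), max(vector)
    # Fall back to 1.0 for a constant vector to avoid dividing by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((x - lo) / scale))) for x in vector]
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale):
    """Reconstruct approximate floats from 4-bit codes."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    embedding = [0.12, -0.48, 0.33, 0.05, -0.07, 0.91]
    codes, lo, scale = quantize_4bit(embedding)
    restored = dequantize_4bit(codes, lo, scale)
    worst = max(abs(a - b) for a, b in zip(embedding, restored))
    print(codes)
    print(f"max reconstruction error: {worst:.4f}")
```

Each coordinate can move by up to half a quantization step. Whether that shift changes a classification or retrieval result depends on how close the competing neighbors are, which is why accuracy has to be measured on a task rather than inferred from the bit width.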

Those figures come from the company’s cited validation work, not from named customer production systems.

Customer Evidence Remains Behind NDAs

Mac Allister said SEMQ can be applied when data is ingested or at query time.

In his description, teams could apply an SDK to vectors generated by their existing embedding model. They could also run SEMQ beside an existing LLM, embedding model, vector database or agent framework before using it in selected retrieval or memory workloads.

He also said .semq files have been used in research to snapshot and restore transformer KV-cache state across process boundaries.

He did not present that as a pre-training workflow, but as a runtime-state workflow for pausing, transferring and resuming an active model session.

The early business claim is still limited by how little the company has disclosed.

Mac Allister said the company signed NDAs with organizations in a Founding Design Partnership Program, including some AI infrastructure hyperscalers and companies at the AI application layer.

SEMQ Group has not named customers, and the public record does not disclose deployment sizes, infrastructure savings or third-party benchmark validation.
