Qualcomm AI250 Stacks DRAM Over Compute But Leaves FLOPS Undisclosed
Qualcomm is pitching high-bandwidth compute for AI inference and says AI250 cards will carry 768 GB of memory and up to 133 TB/s of effective bandwidth, but the company has not disclosed peak FLOPS or named customers.

Qualcomm Moves AI250 Memory Closer To Compute
Qualcomm is using its AI250 accelerator roadmap to push a different answer to the AI inference memory bottleneck.
The company describes high-bandwidth compute, or HBC, as a 3D-stacked design that places DRAM above logic so some work can happen closer to memory.
The AI250 is due to follow the AI200 Dragonfly rack systems and is planned to begin shipping in 2027.
Qualcomm also outlined a second-generation HBC platform, the AI300, for 2028.
Qualcomm says the AI250 card will carry 768 GB of memory and up to 133 TB/s of effective memory bandwidth.
The company ties those claims to bandwidth-bound inference work, especially decode, where model weights are streamed from memory during token generation.
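A rough way to see why decode is treated as bandwidth-bound: if every generated token has to read all model weights once, the single-stream token rate cannot exceed memory bandwidth divided by the size of the weights.
The sketch below works under that simplification and ignores KV-cache traffic, batching and compute limits.
The 70-billion-parameter, 8-bit model is a hypothetical example rather than a Qualcomm figure, and the 133 TB/s input is Qualcomm's effective-bandwidth claim, not a measured number.

```python
def decode_tokens_per_second(bandwidth_bytes_per_s: float,
                             params: float,
                             bytes_per_param: float) -> float:
    """Bandwidth-limited ceiling on single-stream decode rate.

    Assumes each generated token streams every weight from memory once.
    """
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


TB = 1e12

# Qualcomm's claimed effective bandwidth for one AI250 card.
claimed_bandwidth = 133 * TB

# Hypothetical model: 70e9 parameters stored at 1 byte each (assumption).
rate = decode_tokens_per_second(claimed_bandwidth, params=70e9, bytes_per_param=1.0)

print(f"Bandwidth-limited ceiling: {rate:.0f} tokens/s per stream")
```

The result is a ceiling rather than a forecast, since real throughput also depends on the undisclosed compute rate and on how much of the effective bandwidth a given model can actually use.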

Effective Bandwidth Claims Need More Detail
The company is presenting HBC as a way to reduce data movement between memory and compute.
Qualcomm says the architecture uses LPDDR memory in a purpose-built near-memory design and differs from HBM because HBC performs computation in the base logic die.
The bandwidth claims still depend on Qualcomm's definition of effective bandwidth.
For the AI200 generation, Qualcomm had cited 414 TB/s of effective memory bandwidth across 56 chips.
The AI250 marketing material says HBC gives 18x the AI200's effective bandwidth, while the AI300 would reach 54x.
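The multipliers can be checked against the figures Qualcomm has disclosed.
The arithmetic below assumes the 18x and 54x claims are measured against the per-chip share of the AI200 figure, which is a reading of the numbers rather than something Qualcomm has confirmed.
The implied AI300 value is a derived estimate, not a disclosed specification.

```python
# Figures cited by Qualcomm.
AI200_TOTAL_TBPS = 414.0   # effective bandwidth quoted for the AI200 generation
AI200_CHIPS = 56           # number of chips that figure spans
AI250_CLAIM_TBPS = 133.0   # effective bandwidth claimed per AI250 card

# Assumption: the multipliers apply to the per-chip AI200 figure.
ai200_per_chip = AI200_TOTAL_TBPS / AI200_CHIPS
ai250_implied = 18 * ai200_per_chip
ai300_implied = 54 * ai200_per_chip

print(f"AI200 per chip:      {ai200_per_chip:.2f} TB/s")
print(f"AI250 implied (18x): {ai250_implied:.2f} TB/s (claimed: {AI250_CLAIM_TBPS} TB/s)")
print(f"AI300 implied (54x): {ai300_implied:.2f} TB/s")
```

Under that reading, the 18x multiplier lands within rounding of the 133 TB/s headline figure, which suggests the claims are internally consistent even though the underlying definition of effective bandwidth remains undisclosed.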
Qualcomm says the AI250 can operate as a standalone AI accelerator.
It also says the part can sit in disaggregated inference systems, with GPUs or other Qualcomm parts handling prompt processing and AI250 accelerators handling memory-intensive decode.
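The split can be pictured as a router that sends each request to a different hardware pool depending on its phase.
The sketch below is illustrative only: the pool names, the Request type and the routing rule are hypothetical and do not describe Qualcomm's software, and it omits the KV-cache handoff a real disaggregated system would need.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    prompt_tokens: int
    generated_tokens: int = 0


# Hypothetical pool names (assumptions, not product names).
PREFILL_POOL = "prefill_pool"   # e.g. GPUs handling prompt processing
DECODE_POOL = "decode_pool"     # e.g. accelerators handling memory-intensive decode


def pick_pool(req: Request) -> str:
    """Route by phase: the prompt pass goes to prefill, token generation to decode."""
    if req.generated_tokens == 0:
        return PREFILL_POOL
    return DECODE_POOL


req = Request("r1", prompt_tokens=512)
print(pick_pool(req))       # request has produced no tokens yet

req.generated_tokens = 1    # first token emitted; request is now in decode
print(pick_pool(req))
```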
The company declined to give peak FLOPS for AI250.
It also did not provide the physical bandwidth calculation behind the headline effective-bandwidth figures.
The figures it has disclosed indicate that ordinary LPDDR5x bandwidth alone would not account for the claimed totals.
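A back-of-envelope calculation gives a sense of the gap.
The sketch below asks how many memory data pins would have to run at full rate to deliver 133 TB/s as raw physical bandwidth.
The 8.533 Gb/s per-pin rate is an assumption chosen as a common LPDDR5X speed grade; Qualcomm has not said which speed grade or interface width the AI250 uses.

```python
CLAIMED_TBPS = 133.0    # Qualcomm's effective-bandwidth claim, in terabytes per second
PIN_RATE_GBPS = 8.533   # assumed per-pin data rate, in gigabits per second


def data_pins_needed(bandwidth_tbps: float, pin_rate_gbps: float) -> float:
    """Data pins required to supply the bandwidth as raw physical transfer."""
    bits_per_second = bandwidth_tbps * 1e12 * 8
    return bits_per_second / (pin_rate_gbps * 1e9)


pins = data_pins_needed(CLAIMED_TBPS, PIN_RATE_GBPS)
print(f"Roughly {pins:,.0f} data pins at full rate")
print(f"Equivalent to about {pins / 64:,.0f} 64-bit-wide interfaces")
```

Under that assumed pin rate, the count runs well past 100,000 data pins for a single card, which is why the effective figure is likely to reflect something other than a conventional off-chip memory interface.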

Modular Deal Targets The Software Gap
Qualcomm's investor-day push also included its planned acquisition of Modular, the AI software startup behind Mojo and the Max serving platform.
Mojo is positioned as a low-level programming interface that can run across different hardware, while Max targets LLM model serving.
AI accelerator buyers are comparing more than silicon specifications.
They need serving tools, developer support and deployment paths that do not lock every workload to one vendor stack.
Qualcomm is using Modular to address that software gap while Nvidia and AMD remain the main comparison points for AI infrastructure buyers.
The plan also assumes Qualcomm can make a heterogeneous inference model attractive.
Qualcomm describes a possible split where other chips handle prompt processing and AI250 systems focus on memory-intensive decode, but it has not identified production deployments using that design.
Qualcomm has not disclosed peak FLOPS for AI250, the detailed method behind its effective bandwidth calculation, named AI250 customers, production deployment dates beyond the 2027 target or whether regulators will clear the Modular acquisition this year.