KAIST Simulator Tests LLM Infrastructure Before AI Server Buildouts

Article summary

KAIST researchers developed LLMServingSim 2.0 to test LLM serving infrastructure before large deployments. The simulator models GPUs, NPUs, PIM devices, memory behavior, routing, power use and serving policies. The team plans to open-source the tool and validate it with real LLM serving frameworks.

Image source: KAIST / AI Times Korea

KAIST tests LLM infrastructure before deployment

KAIST researchers have developed LLMServingSim 2.0, a simulator for testing large language model serving infrastructure before operators build expensive server clusters.

AI Times Korea reported that the work by Professor Jongse Park's computer science team won a best paper award at ISPASS 2026.

The simulator works as a virtual testbed for AI infrastructure design.

Instead of deploying physical systems to compare accelerators, memory devices and serving policies, engineers can model how an LLM service may behave under different cluster configurations.

Why it matters

Large language model services can require very large server fleets.

The KAIST team said modern LLM serving is becoming more complex as operators combine GPUs with other accelerators, memory layers and software methods such as prefill-decode separation and prefix caching.

LLMServingSim 2.0 is designed to estimate throughput, latency, memory usage and power behavior.

It supports heterogeneous environments that can include GPUs, NPUs and processing-in-memory devices, giving cloud providers and semiconductor companies a way to test future AI hardware before it is widely available.

The system accepts workload, cluster configuration and hardware profile inputs.

From those inputs it builds a serving engine with request routing and model serving groups, then models compute execution, memory access, communication cost and power consumption to produce its runtime outputs.
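To make that input-to-output flow concrete, here is a minimal Python sketch. It is not LLMServingSim 2.0 code and does not reflect its interfaces: every class, function and parameter name below is hypothetical, and so are the numbers. It assumes a simple roofline-style cost model (prefill limited by compute, decode limited by reading the weights once per output token), round-robin routing, and constant power draw while a device is busy. It ignores batching, communication and caching, all of which the article says the real simulator models.

```python
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    """Hypothetical hardware profile; field names are illustrative only."""
    name: str
    peak_tflops: float   # peak compute throughput, TFLOP/s
    mem_bw_gbs: float    # memory bandwidth, GB/s
    busy_watts: float    # assumed constant power draw while serving


@dataclass
class Request:
    """One workload entry: prompt length and number of generated tokens."""
    prompt_tokens: int
    output_tokens: int


def estimate_request(req, hw, params_billion, bytes_per_param=2):
    """Roofline-style estimate for one request on one device (unbatched)."""
    # Assumption: about 2 FLOPs per parameter per token for a dense model.
    flops_per_token = 2 * params_billion * 1e9
    prefill_s = req.prompt_tokens * flops_per_token / (hw.peak_tflops * 1e12)
    # Assumption: each decode step streams all weights from memory once.
    weight_bytes = params_billion * 1e9 * bytes_per_param
    decode_s = req.output_tokens * weight_bytes / (hw.mem_bw_gbs * 1e9)
    latency_s = prefill_s + decode_s
    return {"latency_s": latency_s, "energy_j": latency_s * hw.busy_watts}


def simulate(requests, cluster, params_billion):
    """Route requests round-robin across devices and aggregate the estimates."""
    if not cluster:
        raise ValueError("cluster must contain at least one device")
    busy_s = [0.0] * len(cluster)
    energy_j = 0.0
    for i, req in enumerate(requests):
        d = i % len(cluster)  # round-robin routing policy
        est = estimate_request(req, cluster[d], params_billion)
        busy_s[d] += est["latency_s"]
        energy_j += est["energy_j"]
    makespan_s = max(busy_s)  # the slowest device determines completion time
    out_tokens = sum(r.output_tokens for r in requests)
    tokens_per_s = out_tokens / makespan_s if makespan_s else 0.0
    return {"makespan_s": makespan_s, "tokens_per_s": tokens_per_s,
            "energy_j": energy_j}


# Made-up example: two identical hypothetical accelerators, a 10B-parameter model.
accel = HardwareProfile("accel-a", peak_tflops=100.0, mem_bw_gbs=1000.0,
                        busy_watts=500.0)
workload = [Request(prompt_tokens=1000, output_tokens=100) for _ in range(4)]
print(simulate(workload, [accel, accel], params_billion=10))
```

Swapping in a different `HardwareProfile` or routing rule and rerunning the same workload is the kind of comparison the article describes, done here at toy scale.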

For mixture-of-experts models, the simulator can reflect expert routing, expert placement, loading and synchronization.

It can also analyze the impact of expert parallelism and expert offloading on serving performance.
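Why expert placement affects serving performance can be shown with a small Python sketch. This is an illustration only, not the simulator's method: the function names, the routing decisions and the placements are all hypothetical. It assumes each token has already been routed to its top-2 experts, and that devices synchronize after every MoE layer, so the most heavily loaded device sets the pace.

```python
def device_load(token_experts, placement, num_devices):
    """Count how many token-to-expert assignments land on each device."""
    load = [0] * num_devices
    for experts in token_experts:
        for expert in experts:
            load[placement[expert]] += 1
    return load


def imbalance(load):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(load) / len(load)
    return max(load) / mean if mean else 0.0


# Hypothetical routing result: four tokens, each sent to two of four experts.
token_experts = [(0, 1), (0, 1), (0, 2), (1, 3)]

# Two hypothetical placements of four experts on two devices.
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # both popular experts share device 0
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # popular experts are split up

for name, placement in [("contiguous", contiguous), ("interleaved", interleaved)]:
    load = device_load(token_experts, placement, num_devices=2)
    print(name, load, imbalance(load))
```

With this routing, the contiguous placement puts six of the eight assignments on one device, while the interleaved placement splits them four and four. Under the synchronization assumption, the first layout would take roughly 1.5 times as long per layer as the second.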

Next steps

The researchers plan to release the simulator as open source, connect it with real LLM serving frameworks and keep adding hardware profiles.

Professor Park said AI service competitiveness depends not only on the model, but also on reliable and efficient infrastructure.

For Korea's AI sector, the project highlights the growing importance of infrastructure research behind generative AI.

If broadly validated, it could help cloud operators, AI chip developers and enterprise AI teams lower the cost and risk of testing new LLM serving designs.

