KAIST Simulator Tests LLM Infrastructure Before AI Server Buildouts
KAIST researchers developed LLMServingSim 2.0 to test LLM serving infrastructure before large deployments. The simulator models GPUs, NPUs, PIM devices, memory behavior, routing, power use and serving policies. The team plans to open-source the tool and validate it with real LLM serving frameworks.
KAIST tests LLM infrastructure before deployment
KAIST researchers have developed LLMServingSim 2.0, a simulator for testing large language model serving infrastructure before operators build expensive server clusters.
AI Times Korea reported that the work by Professor Jongse Park's computer science team won a best paper award at ISPASS 2026.
The simulator works as a virtual testbed for AI infrastructure design.
Instead of deploying physical systems to compare accelerators, memory devices and serving policies, engineers can model how an LLM service may behave under different cluster configurations.
Why it matters
Serving large language models at scale can require very large server fleets, which makes design mistakes costly to discover after hardware is deployed.
The KAIST team said modern LLM serving is becoming more complex as operators combine GPUs with other accelerators, memory layers and software methods such as prefill-decode separation and prefix caching.
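To make one of those software methods concrete, here is a minimal Python sketch of block-level prefix caching. It is an illustration only and is not taken from LLMServingSim 2.0 or any serving framework; the function names `cached_prefix_len` and `serve`, the block size of 4 and the use of a plain set as the cache are all assumptions chosen for brevity.

```python
def cached_prefix_len(tokens, cache, block=4):
    """Return how many leading tokens are already cached, counted in whole blocks."""
    hit = 0
    for end in range(block, len(tokens) + 1, block):
        if tuple(tokens[:end]) in cache:
            hit = end          # this block-aligned prefix was seen before
        else:
            break              # first miss ends the reusable prefix
    return hit


def serve(tokens, cache, block=4):
    """Return the number of prompt tokens that still need prefill, then cache this prompt."""
    hit = cached_prefix_len(tokens, cache, block)
    # Record every block-aligned prefix of this prompt for later requests (hypothetical policy).
    for end in range(block, len(tokens) + 1, block):
        cache.add(tuple(tokens[:end]))
    return len(tokens) - hit


cache = set()
shared = [1, 2, 3, 4, 5, 6, 7, 8]          # e.g. a common system prompt
first = serve(shared + [9, 10], cache)      # cold cache: everything is prefilled
second = serve(shared + [11, 12, 13], cache)  # warm cache: only the new suffix is prefilled
```

In this toy run the second request skips prefill for the eight shared tokens, which is the kind of effect a serving simulator has to account for when it estimates latency.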
LLMServingSim 2.0 is designed to estimate throughput, latency, memory usage and power behavior.
It supports heterogeneous environments that can include GPUs, NPUs and processing-in-memory devices, giving cloud providers and semiconductor companies a way to test future AI hardware before it is widely available.
The system accepts workload, cluster configuration and hardware profile inputs.
It then builds a serving engine with request routing and model serving groups, models compute execution, memory access, communication cost and power consumption, and reports the resulting runtime outputs.
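The flow from inputs to estimates can be pictured with a small Python sketch. This is not LLMServingSim 2.0's interface, which the article does not describe. The class names, the round-robin router, the roofline-style timing rule (a step takes as long as the slower of compute and memory traffic), the figure of roughly 2 FLOPs per parameter per token and the fixed average power are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HardwareProfile:          # hypothetical hardware profile input
    name: str
    peak_tflops: float          # peak compute, TFLOP/s
    mem_bw_gbs: float           # memory bandwidth, GB/s
    power_w: float              # assumed average power draw, watts


@dataclass
class Request:                  # hypothetical workload entry
    prompt_tokens: int
    output_tokens: int


def step_time_s(flops, bytes_moved, hw):
    """Roofline-style estimate: the step is bound by compute or memory, whichever is slower."""
    compute_s = flops / (hw.peak_tflops * 1e12)
    memory_s = bytes_moved / (hw.mem_bw_gbs * 1e9)
    return max(compute_s, memory_s)


def simulate(workload, cluster, params_b, bytes_per_param=2):
    """Route requests round-robin; return per-device busy time (s) and energy (J)."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    flops_per_token = 2 * params_b * 1e9      # assumption: ~2 FLOPs per parameter per token
    busy = [0.0] * len(cluster)
    for i, req in enumerate(workload):
        d = i % len(cluster)                  # round-robin request routing
        hw = cluster[d]
        # Prefill: all prompt tokens in one pass, weights read once.
        prefill = step_time_s(flops_per_token * req.prompt_tokens, weight_bytes, hw)
        # Decode: one token per step, weights re-read on every step.
        decode = req.output_tokens * step_time_s(flops_per_token, weight_bytes, hw)
        busy[d] += prefill + decode
    energy = [b * hw.power_w for b, hw in zip(busy, cluster)]
    return busy, energy


gpu = HardwareProfile("toy-gpu", peak_tflops=100, mem_bw_gbs=1000, power_w=300)
busy, energy = simulate([Request(1000, 10)], [gpu], params_b=1)
```

Even this toy model shows why the two serving phases behave differently: under these assumed numbers prefill is limited by compute, while decode is limited by memory bandwidth. A full simulator would add communication, memory capacity and scheduling effects that the sketch ignores.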
For mixture-of-experts models, the simulator can reflect expert routing, expert placement, loading and synchronization.
It can also analyze the impact of expert parallelism and expert offloading on serving performance.
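Why expert placement matters can be shown with a short Python sketch. It assumes a hypothetical setup in which each token has already been routed to two experts and each expert lives on one device. The helper names `device_load` and `imbalance` are invented for this example and do not come from the simulator.

```python
from collections import Counter


def device_load(token_experts, placement):
    """Count how many token-to-expert assignments land on each device."""
    load = Counter()
    for experts in token_experts:
        for e in experts:
            load[placement[e]] += 1
    return load


def imbalance(load, num_devices):
    """Max device load divided by mean load; 1.0 means perfectly balanced."""
    total = sum(load.values())
    if total == 0:
        return 1.0
    return max(load.values()) / (total / num_devices)


# Four tokens, each routed to two of four experts; expert 0 is "hot".
routing = [(0, 1), (0, 2), (0, 3), (1, 0)]

naive = {0: 0, 1: 0, 2: 1, 3: 1}        # experts split evenly by count
aware = {0: 0, 1: 1, 2: 1, 3: 1}        # hot expert gets a device to itself

naive_ratio = imbalance(device_load(routing, naive), 2)
aware_ratio = imbalance(device_load(routing, aware), 2)
```

In this example, splitting experts evenly by count leaves one device with three times the work of the other, while isolating the popular expert balances the load. Because the busiest device sets the pace at each synchronization point, this is the sort of trade-off a simulator can explore before hardware is committed.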
Next steps
The researchers plan to release the simulator as open source, connect it with real LLM serving frameworks and keep adding hardware profiles.
Professor Park said AI service competitiveness depends not only on the model, but also on reliable and efficient infrastructure.
For Korea's AI sector, the project highlights the growing importance of infrastructure research behind generative AI.
If broadly validated, it could help cloud operators, AI chip developers and enterprise AI teams lower the cost and risk of testing new LLM serving designs.





