China’s AI Labs Turn Self-Improving Models Into A Chip-Efficiency Test
Chinese AI teams are tying recursive self-improvement claims to research automation and kernel optimisation, but the strongest evidence still sits in narrow engineering tasks rather than full autonomous AI research.

China’s AI Labs Move From Chatbots To Research Automation
Chinese AI developers are pushing recursive self-improvement from a speculative safety debate into concrete engineering claims.
The race is centred on systems that can help improve AI research itself, including model work, coding and chip-software optimisation.
The most important signal from China is not a single product launch.
It is the clustering of claims from Xiaomi-linked model work, MiniMax, Alibaba's Qwen team, ByteDance and Tsinghua University researchers around AI systems that automate parts of development.
Luo Fuli, who leads Xiaomi's MiMo AI model work, told the Zhongguancun Forum in March that self-evolution was becoming a near-term AI priority, and she shortened her earlier timeline of three to five years to one to two years.
That shifts the competitive frame.
US companies such as Anthropic are still treated as coding and research-automation leaders, but Chinese teams are using the same frontier to offset hardware limits, especially where software efficiency can improve the use of scarce AI chips.
Kernel Optimisation Becomes The Hard Test
The clearest technical evidence sits in kernel work, not general claims about autonomous intelligence.
ByteDance and Tsinghua University researchers described a February system that used an AI agent to optimise kernels for Nvidia's CUDA ecosystem.
Their benchmark put the resulting workflow at 100 per cent faster than existing automation methods, which read literally means about twice the speed.
MiniMax made a more product-facing claim around its M3 model.
The company said the model optimised a production-grade FP8 GEMM kernel on Nvidia GPUs in around 24 hours without information that would let it copy an existing answer.
The same task would otherwise have taken a human team up to two weeks, according to the company's description.
Alibaba's Qwen team pointed to a similar direction on its own hardware stack.
For Alibaba, the Qwen3.7-Max claim was tied to the company's in-house parallel processing unit platform: the model completed a kernel optimisation run in around 35 hours, which the company put at 10 times faster than the alternative process.
Those examples matter because kernel performance affects the cost and speed of large-model inference, where China faces continuing pressure from chip restrictions.
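The three kernel claims above use different units, so a rough conversion helps put them side by side. The Python sketch below is illustrative only and rests on assumptions that none of the companies state: "100 per cent faster" is read as a doubling of speed, "two weeks" is read as 14 calendar days of wall-clock time, and "10 times faster" is read as a ratio of run durations. The function names are hypothetical.

```python
def speedup_from_percent_faster(percent: float) -> float:
    """Read 'X per cent faster' as a speed multiple (assumption: 100% -> 2x)."""
    return 1.0 + percent / 100.0


def speedup_from_durations(baseline_hours: float, optimised_hours: float) -> float:
    """Ratio of how long the baseline took to how long the new run took."""
    return baseline_hours / optimised_hours


def implied_baseline_hours(optimised_hours: float, speedup: float) -> float:
    """Back out the baseline duration implied by a stated speed multiple."""
    return optimised_hours * speedup


HOURS_PER_DAY = 24  # assumption: wall-clock time, not working hours

# ByteDance and Tsinghua: "100 per cent faster" than existing automation.
bytedance_multiple = speedup_from_percent_faster(100)

# MiniMax: about 24 hours versus "up to two weeks" for a human team.
# Two weeks is an upper bound, so this multiple is an upper bound too.
minimax_multiple_upper = speedup_from_durations(14 * HOURS_PER_DAY, 24)

# Alibaba: about 35 hours, stated as 10 times faster than the alternative.
alibaba_baseline_hours = implied_baseline_hours(35, 10)

print(f"ByteDance/Tsinghua speed multiple: {bytedance_multiple:.1f}x")
print(f"MiniMax speed multiple (upper bound): {minimax_multiple_upper:.1f}x")
print(f"Alibaba implied baseline: {alibaba_baseline_hours:.0f} hours")
```

Under these readings the figures are not directly comparable: one is measured against other automation tools, one against a human team, and one against an unspecified alternative process.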
The Measurement Gap Remains Large
The caution is that task automation is not the same as proven recursive self-improvement.
Xu Weixian, an undergraduate AI researcher at Shanghai Jiao Tong University, warned that AI may be accelerating idea discovery without proving that the ideas are meaningful solutions.
Anthropic has made a similar distinction, noting a large gap between autonomous AI research tasks and autonomous goal-setting for research.
The market also lacks stable metrics.
London-based governance researcher Alan Chan said the field remains early and does not yet have concrete ways to measure progress.
Code-generation figures are useful but incomplete: Anthropic put Claude's contribution above 80 per cent of merged code as of May, and a poll of 130 employees estimated that Mythos delivered a fourfold productivity gain.
The next watchpoint is whether Chinese and US AI labs publish stronger evidence that automation is improving research outcomes, not just producing faster code.
Until then, recursive self-improvement remains a strategic target with real engineering milestones and unresolved proof standards.
















