Springboards Tests Qwen 3 Model Against Repetitive LLM Answers

By SendTech Times AI & Enterprise Desk | Newsroom-edited, source-reviewed coverage | Source: MIT Technology Review

Newsroom brief

Australian startup Springboards has built Flint on Alibaba’s Qwen 3 to produce more varied answers to open-ended prompts. MIT Technology Review’s article pairs the company’s claim with a NeurIPS-winning paper on LLM homogeneity and with cautions from users that the prototype can still fail when pushed too far.

Verified against source material | Edited by SendTech Times AI & Enterprise Desk

Springboards Builds Flint On Qwen 3

Australian startup Springboards has built an LLM called Flint to make open-ended chatbot answers less repetitive.

The company is pitching the model to advertising and marketing users who want more varied brainstorming output than mainstream systems often produce.

Springboards cofounder and CEO Pip Bingemann said most language models are designed to fight hallucinations, while Flint is built to invite more unusual suggestions.

In one demonstration described by MIT Technology Review, ChatGPT and Claude gave the same simple campaign tagline, while Flint returned a different line.

The company built Flint on Qwen 3, the open-source model from Alibaba.

Springboards cofounder and CTO Kieran Browne said training a foundation model was too expensive for the small team, so the company focused on changing where a model introduces variety in its output.

Research Paper Shows Repeated Answers

The startup is working on a problem that AI researchers have also measured.

A November paper titled "Artificial Hivemind" found that different LLMs often converge on similar answers to open-ended prompts.

The researchers asked 25 LLMs to write a metaphor about time 50 times each.

MIT Technology Review said most of the 1,250 responses were versions of "Time is a river" or "Time is a weaver." The paper won a best paper award at NeurIPS.
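The arithmetic behind that figure, and one simple way to tally how often answers land on the same theme, can be sketched as follows. This is an illustrative sketch only: the article does not describe how the paper grouped responses, so the keyword matching, the function names and the four toy responses are all assumptions made for this example, not the researchers' method or data.

```python
from collections import Counter

# Figures reported in the article: 25 models, 50 samples each.
NUM_MODELS = 25
SAMPLES_PER_MODEL = 50


def total_responses(num_models: int, samples_per_model: int) -> int:
    """Total number of responses collected across all models."""
    return num_models * samples_per_model


def theme_of(response: str, themes: list[str]) -> str:
    """Assign a response to the first theme keyword it contains.

    Keyword matching is an assumption for illustration; the paper's
    actual grouping method is not described in the article.
    """
    text = response.lower()
    for theme in themes:
        if theme in text:
            return theme
    return "other"


def theme_shares(responses: list[str], themes: list[str]) -> dict[str, float]:
    """Fraction of responses that fall under each theme."""
    counts = Counter(theme_of(r, themes) for r in responses)
    total = len(responses)
    return {theme: count / total for theme, count in counts.items()}


# Hypothetical toy responses, invented for this example.
toy_responses = [
    "Time is a river that never stops.",
    "Time is a weaver of moments.",
    "Time is a river, carrying us along.",
    "Time is a thief.",
]

shares = theme_shares(toy_responses, ["river", "weaver"])
print(total_responses(NUM_MODELS, SAMPLES_PER_MODEL))  # 1250
print(shares)
```

In this toy set, half of the responses match the "river" theme. A real analysis would need a more robust way to group paraphrases than substring matching.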

OpenAI told MIT Technology Review that training models to give reliable and coherent answers can make them converge on familiar, high-probability responses.

OpenAI also said pushing harder for novelty can make responses less reliable.

Prototype Users Still Need Human Judgement

Springboards is offering Flint as an optional model within its brainstorming tool, which lets creative teams combine text from multiple LLMs.

Zoe Scaman, founder of Bodacious and chief strategy officer at 77X, said Flint pushed her in different directions during tests.

Scaman called the premise powerful but noted that Flint remains a prototype and can fail when users push it too far.

The evidence in MIT Technology Review's article therefore amounts to a test of creative variety, not proof of an enterprise deployment.

Maximilian Weigl, cofounder and chief strategy officer at Uncommon, said his team uses Flint with ChatGPT, Claude and Gemini.

He also said average answers are often good enough and warned against teams copying AI output without human thinking.

Springboards did not disclose Flint pricing, a general launch date, customer numbers, enterprise deployment commitments or independent benchmark results for the prototype.
