Xiaomi’s Big Model team has reported a new top score in audio reasoning. According to the company, its latest model has surpassed models from industry giants OpenAI and Google on a key audio understanding benchmark.
The team announced via the official Xiaomi Technology account that its model now tops the leaderboard of MMAU (Massive Multi-Task Audio Understanding and Reasoning), a widely cited audio benchmark. The model achieved a record 64.5% accuracy, putting it 7.2 percentage points ahead of OpenAI’s GPT-4o (57.3%) and 8.9 points ahead of Google’s Gemini 2.0 Flash (55.6%).
Revolutionary Reinforcement Learning Approach
What is particularly noteworthy about this achievement is how quickly it came together. Following the lead of DeepSeek-R1, researchers at Xiaomi extended reinforcement learning algorithms to multimodal audio understanding tasks and reached this result within one week.
The researchers applied the Group Relative Policy Optimization (GRPO) method, which lets an AI model learn on its own through a cycle of trial, error, and reward. This mechanism lets the model develop reasoning capabilities that resemble human reflection and multi-step verification.
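To make the trial, error, and reward idea more concrete, here is a minimal Python sketch of the group-relative scoring step that gives GRPO its name. It is not Xiaomi’s training code: the function names, the exact-match reward rule, and the sample answers are all assumptions made for illustration. The general idea, as GRPO is usually described, is that the model samples several answers to the same question, each answer is scored, and each score is then compared with the average of its own group.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted: str, correct: str) -> float:
    """Hypothetical rule-based reward: 1.0 for the right option, else 0.0."""
    return 1.0 if predicted.strip().lower() == correct.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to its own group.

    Each reward is shifted by the group mean and scaled by the group
    standard deviation (population std is an assumption here), so
    answers that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one multiple-choice
# audio question whose correct option is "B".
sampled_answers = ["B", "A", "B", "C"]
rewards = [accuracy_reward(a, "B") for a in sampled_answers]
advantages = group_relative_advantages(rewards)

for answer, adv in zip(sampled_answers, advantages):
    print(f"answer={answer} advantage={adv:+.2f}")
```

In a full training loop these advantages would weight the policy update, so that answers scoring above their group average become more likely. That update step is omitted from this sketch.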
Dr. Zhang Wei, head researcher on the project, adds: “Reinforcement learning is especially good at handling a big gap between the generation and the checking of results. Audio reasoning is precisely such a task, where active thinking creates more efficient results than memorizing patterns.”
More Than Just Recognizing Sound
AI applications today require more than mere sound recognition. Xiaomi’s breakthrough enables AI to:
- Determine potential faults in a vehicle by analyzing cockpit recordings
- Infer a composer’s mood by listening to musical performances
- Anticipate collision risks in crowded places like subway stations
The MMAU test set contains 10,000 audio clips spanning speech, ambient sound, and music, paired with human-annotated questions and answers, and tests models on 27 skills.
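As a rough illustration of how accuracy on a question-answer benchmark of this kind can be tallied, the Python sketch below computes overall and per-category accuracy. The record layout (`category`, `answer`, `prediction`) and the sample records are hypothetical and do not reflect MMAU’s actual file format or official scoring script.

```python
from collections import defaultdict


def accuracy_by_category(items):
    """Return (per-category accuracy, overall accuracy) for scored items.

    Each item is a dict with hypothetical keys: "category", "answer"
    (the human-annotated label) and "prediction" (the model's choice).
    """
    if not items:
        raise ValueError("items must not be empty")

    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        totals[item["category"]] += 1
        if item["prediction"] == item["answer"]:
            hits[item["category"]] += 1

    per_category = {cat: hits[cat] / totals[cat] for cat in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_category, overall


# Made-up records, one or two per audio domain named in the article.
items = [
    {"category": "speech", "answer": "A", "prediction": "A"},
    {"category": "speech", "answer": "C", "prediction": "B"},
    {"category": "sound", "answer": "D", "prediction": "D"},
    {"category": "music", "answer": "B", "prediction": "A"},
]

per_category, overall = accuracy_by_category(items)
print(per_category, overall)
```

A headline figure such as 64.5% corresponds to the overall value here, while the per-category breakdown shows where a model is strong or weak.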
Disrupting Traditional AI Approaches
Xiaomi’s experiments produced some surprising findings that challenge conventional AI development wisdom:
- Reinforcement learning significantly outperformed supervised learning on a dataset of only 38,000 items
- Their 7B-parameter model demonstrated superior reasoning ability despite being much smaller than competing models with more than 100B parameters
- Forcing the model to generate an explicit reasoning process actually reduced performance by 3.4%
While 64.5% accuracy is high, it still falls short of the 82.23% achieved by human experts, which leaves considerable room for improvement.
Open-Source Commitment
True to Xiaomi’s philosophy of innovation for everyone, the company has open-sourced both the training code and the model parameters. This allows developers and researchers around the world to build on its work.
“By opening up our efforts to the global AI community, we aim to accelerate the process towards true intelligent audio understanding,” Xiaomi founder and CEO Lei Jun said. “This is a further step in our mission to make innovative technology accessible to everyone.”
This breakthrough comes as Xiaomi introduces AI features across its product lineup, from smartphones to IoT smart home products, and it positions the company as a serious contender in the global AI research arena.
Source: IT Home

