Xiaomi tops the world in AI audio reasoning - KooWorldFix

Contents

Revolutionary Reinforcement Learning Approach
More Than Purely Recognizing Sound
Disrupting Traditional AI Approaches
Open-Source Commitment

Xiaomi’s Big Model team has achieved a breakthrough in audio reasoning. Its latest model has surpassed models from industry giants OpenAI and Google on a key audio understanding benchmark, a result that showcases the company’s growing strength in frontier AI research.

The team announced via the official Xiaomi Technology account that its model has topped the leaderboard of the well-known MMAU (Massive Multi-Task Audio Understanding and Reasoning) benchmark. The model achieved a record 64.5% accuracy, well ahead of OpenAI’s GPT-4o (57.3%) and Google’s Gemini 2.0 Flash (55.6%).

Revolutionary Reinforcement Learning Approach

What is particularly noteworthy about this achievement is its pace. Taking inspiration from DeepSeek-R1, Xiaomi’s researchers extended reinforcement learning algorithms to multimodal audio understanding tasks and reached this result within one week.

The researchers applied the Group Relative Policy Optimization (GRPO) method, which lets an AI model learn on its own through a “trial, error, and reward” mechanism. This mechanism encourages reasoning behaviors that resemble human reflection and multi-step verification.
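The core idea behind GRPO's "group relative" scoring can be illustrated with a small sketch. For each question the model samples several answers, each answer receives a reward, and each reward is then normalized against the mean and standard deviation of its own group. The article does not describe Xiaomi's reward function, so the binary correct/incorrect reward, the function name, and the sample values below are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group.

    Answers that beat the group average get a positive advantage,
    answers below it get a negative one. `eps` avoids division by
    zero when every answer in the group earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four answers sampled for one audio question,
# rewarded 1.0 when the answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every answer is correct carries no learning signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

In this toy group the two correct answers end up with advantages close to +1 and the two incorrect ones close to -1, while the uniform group yields all zeros. The training signal comes from comparing answers with one another, so no separately trained value model is needed.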

Dr. Zhang Wei, head researcher on the project, adds: “Reinforcement learning is especially good at handling a big gap between the generation and the checking of results. Audio reasoning is precisely such a task, where active thinking creates more efficient results than memorizing patterns.”

More Than Purely Recognizing Sound

AI applications today require more than mere sound recognition. Xiaomi’s breakthrough enables AI to:

Determine potential faults in a vehicle by analyzing cockpit recordings
Infer a composer’s mood by listening to musical performances
Anticipate collision risks in crowded places like subway stations

The MMAU test set comprises 10,000 audio clips spanning speech, ambient sound, and music, each paired with human-annotated questions and answers, and evaluates models on 27 skills.
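A benchmark of this kind is typically scored by comparing each predicted answer with the annotated one. The sketch below shows that scoring step under the assumption that every question has a single reference answer checked by exact match. The record fields and the toy data are hypothetical and do not reflect MMAU's actual file format.

```python
from collections import defaultdict


def accuracy_by_domain(items):
    """Return accuracy per domain plus an 'overall' entry, as fractions in [0, 1]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        hit = int(item["predicted"] == item["answer"])
        # Count every item once for its own domain and once overall.
        for key in (item["domain"], "overall"):
            total[key] += 1
            correct[key] += hit
    return {key: correct[key] / total[key] for key in total}


# Made-up records; the field names are assumptions, not MMAU's schema.
items = [
    {"domain": "speech", "answer": "B", "predicted": "B"},
    {"domain": "sound", "answer": "A", "predicted": "C"},
    {"domain": "music", "answer": "D", "predicted": "D"},
    {"domain": "music", "answer": "A", "predicted": "B"},
]
scores = accuracy_by_domain(items)
```

Reporting per-domain accuracy alongside the overall figure makes it visible when a model is strong on, say, speech but weak on music.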

Disrupting Traditional AI Approaches

Xiaomi’s experiments produced some surprising findings that challenge conventional AI development wisdom:

Reinforcement learning significantly outperformed supervised learning on a dataset of a mere 38,000 items
Their 7B-parameter model demonstrated superior reasoning ability despite being much smaller than competing models with 100B+ parameters
Forcing the model to generate explicit reasoning processes actually reduced performance by 3.4%

While 64.5% accuracy is high, it still falls short of the 82.23% achieved by human experts, which indicates that there is considerable room for improvement.
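To put the reported figures side by side, the short calculation below works out each gap relative to Xiaomi's score in percentage points. It uses only the accuracy numbers quoted in this article; the variable names are illustrative.

```python
# Accuracy figures (in percent) as quoted in the article.
scores = {
    "Xiaomi": 64.5,
    "GPT-4o": 57.3,
    "Gemini 2.0 Flash": 55.6,
    "Human experts": 82.23,
}

xiaomi = scores["Xiaomi"]

# Positive values mean the entry scores above Xiaomi's model,
# negative values mean it scores below.
gaps = {
    name: round(value - xiaomi, 2)
    for name, value in scores.items()
    if name != "Xiaomi"
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} percentage points relative to Xiaomi")
```

On these numbers, the distance still separating the model from human experts is roughly twice its lead over either competing model.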

Open-Source Commitment

True to Xiaomi’s philosophy of innovation for everyone, the company has open-sourced both the training code and the model parameters, allowing developers and researchers around the world to build on its work.

“By opening up our efforts to the global AI community, we aim to accelerate the process towards true intelligent audio understanding,” Xiaomi founder and CEO Lei Jun said. “This is a further step in our mission to make innovative technology accessible to everyone.”

For those interested in experimenting with this technology:

This breakthrough comes as Xiaomi introduces AI features across its product lineup, from smartphones to IoT smart home products, and it positions the company as a serious contender in the global AI research arena.

Source: IT Home
