
Baidu Unveils ERNIE-4.5-21B A3B Model

At its annual WAVE SUMMIT 2025, Chinese tech giant Baidu introduced a new artificial intelligence model that challenges the idea that “bigger is always better.”

The new system, called ERNIE-4.5-21B A3B Thinking, shows how smart design can sometimes beat raw size.

Key Features 

Long Context Understanding: It can handle up to 128,000 tokens at once, enough to process entire books or lengthy documents.

Baidu achieved this by gradually scaling the training context window, from 8,000 tokens in early pre-training up to the full 128,000, while avoiding hardware overload.

Built-in Tool Use: Unlike other models that need add-ons, ERNIE-4.5-21B A3B can naturally call external APIs, run calculations, and connect to databases.

This makes it especially useful for real-world business tasks (a minimal tool-calling sketch follows this list).

Open-Source License: Baidu released the model under the Apache 2.0 license.

That means companies and developers can use it freely for commercial purposes—without paying costly licensing fees.
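
To make the tool-use feature concrete, here is a minimal sketch of a tool-calling loop: the model emits a structured call, the host program executes it, and the result is fed back for the final answer. The JSON shape and the get_exchange_rate helper are illustrative assumptions, not Baidu's documented interface.

```python
import json

def get_exchange_rate(base: str, quote: str) -> float:
    """Stand-in for a real external API the model might call."""
    return {"USD/CNY": 7.1}.get(f"{base}/{quote}", 1.0)

TOOLS = {"get_exchange_rate": get_exchange_rate}

# Pretend the model emitted this structured call instead of plain text:
model_output = '{"tool": "get_exchange_rate", "arguments": {"base": "USD", "quote": "CNY"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(f"Tool result fed back to the model: {result}")
```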

What Makes This Model Special

The ERNIE-4.5-21B A3B has 21 billion total parameters, yet it activates only about 3 billion of them for each token (a unit of text) it processes.

This clever setup, known as a Mixture of Experts (MoE) design, makes the model faster and more efficient without sacrificing accuracy.

How It Works

Imagine a team of specialists. If you ask the model a math problem, only the “math experts” get activated.

If you ask about writing, the “creative experts” step in.

That targeted use of resources saves computing power and speeds up responses.
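
As a rough sketch of that routing idea, the toy PyTorch snippet below scores all experts with a small router and runs only the top-k for each token. The sizes are illustrative, not ERNIE's real dimensions (the article lists 64 experts with 6 active plus 2 shared).

```python
import torch
import torch.nn as nn

n_experts, top_k, d_model = 8, 2, 16
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
router = nn.Linear(d_model, n_experts)

x = torch.randn(4, d_model)                        # a batch of 4 token embeddings
scores = router(x).softmax(dim=-1)                 # router assigns a weight to every expert
weights, idx = scores.topk(top_k, dim=-1)          # keep only the top-k experts per token
weights = weights / weights.sum(-1, keepdim=True)  # renormalize the kept weights

out = torch.zeros_like(x)
for t in range(x.shape[0]):                        # dispatch each token to its chosen experts
    for w, e in zip(weights[t], idx[t]):
        out[t] += w * experts[int(e)](x[t])
print(out.shape)  # torch.Size([4, 16]): same output shape, but only 2 of 8 experts ran per token
```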

How It Was Trained

Baidu built the model using a three-stage process:

Text Pre-Training: The system learned from 2.3 trillion tokens of text. During this stage, its context window grew from 8,000 to 128,000 tokens.

Interestingly, Baidu skipped vision training, keeping the model focused purely on text and reasoning.

Supervised Fine-Tuning: It was trained with 2.4 million examples covering math, coding, science, and logical reasoning.

This step taught the model to work through problems step by step; an illustrative training record appears after this list.

Reinforcement Learning: Using a method called Unified Preference Optimization (UPO), Baidu fine-tuned the model’s answers by rewarding logical, clear, and useful responses.
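
To make the fine-tuning stage concrete, here is a hypothetical training record in the step-by-step style the article describes; the field names and content are illustrative, not drawn from Baidu's actual dataset.

```python
# A hypothetical supervised fine-tuning record with step-by-step reasoning.
sample = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "response": (
        "Step 1: Average speed = distance / time.\n"
        "Step 2: 120 km / 1.5 h = 80 km/h.\n"
        "Answer: 80 km/h."
    ),
}
print(sample["response"])
```

The article does not spell out UPO's exact objective. As rough intuition only, the sketch below shows a generic DPO-style pairwise preference loss, which similarly rewards the response judged more logical, clear, and useful; treat it as a stand-in, not Baidu's method.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Inputs are summed log-probabilities of whole responses under the policy / reference model."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()  # low loss when the chosen answer is preferred

# Toy numbers: the policy already prefers the "chosen" answer slightly more than the reference does.
loss = preference_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                       torch.tensor([-13.0]), torch.tensor([-14.0]))
print(round(loss.item(), 4))
```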

Strong Performance 

Despite being smaller than many competitors, the model’s results are impressive:

Math: Handles multi-step reasoning problems with strong accuracy.

Coding: Performs well on programming benchmarks and can generate and debug code.

Science: Excels at answering technical and scientific questions.

Long-form Writing: Produces accurate, detailed responses even in long outputs.

By delivering this level of performance with fewer parameters, Baidu has proven that architecture and design matter as much as raw size.

Technical Details

28 layers with 20 query heads and 4 key-value heads.

64 total “experts” with 6 active per token.

2 shared experts always in use.

131,072 token context length (~128K tokens).

Supports FP16 precision and INT8 quantization for efficient deployment.
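
Collected into one place, those numbers look roughly like the config sketch below. The field names follow common Transformers conventions and are assumptions, not the model's actual configuration keys.

```python
from dataclasses import dataclass

@dataclass
class Ernie45A3BConfig:
    num_hidden_layers: int = 28
    num_attention_heads: int = 20           # query heads
    num_key_value_heads: int = 4            # grouped-query attention
    num_experts: int = 64
    num_experts_per_token: int = 6          # routed experts active per token
    num_shared_experts: int = 2             # always active
    max_position_embeddings: int = 131_072  # ~128K-token context

print(Ernie45A3BConfig())
```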

Accessibility

The model is available right now:

On Hugging Face and Baidu AI Studio for developers.

Through Transformers 4.54+, vLLM, and FastDeploy for deployment (see the loading sketch after this list).

On ERNIE Bot’s website and mobile app for everyday users.

Via Baidu Cloud services for enterprise customers.
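
For developers, a minimal loading sketch with Hugging Face Transformers (4.54 or later) might look like the snippet below. The repo id baidu/ERNIE-4.5-21B-A3B-Thinking is assumed from Baidu's Hugging Face page; check the model card for the exact id and any extra arguments.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-21B-A3B-Thinking"  # assumed repo id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```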

Because it’s under an Apache 2.0 license, anyone—from startups to researchers—can build on top of it without legal or financial barriers.

News Gist

Baidu unveiled ERNIE-4.5-21B A3B, a 21-billion parameter AI model using a Mixture of Experts design.

Despite its smaller size, it delivers powerful reasoning, long-context support, and built-in tool use, and is open-source under the Apache 2.0 license.

FAQs

Q1. What is Baidu’s ERNIE-4.5-21B A3B model?

It’s a new AI model designed for efficiency, activating only about 3 billion parameters per token despite having 21 billion in total.

Q2. When was it announced?

It was announced on September 9, 2025, at Baidu’s WAVE SUMMIT 2025 conference.

Q3. What makes this model different from others?

It uses a Mixture of Experts (MoE) design, enabling faster responses, less computing power, and high accuracy without massive size.

Q4. How much text can it handle?

It supports up to 128,000 tokens per conversation—enough for full books or long documents.

Q5. Is it open-source?

Yes. It’s released under the Apache 2.0 license, making it free for commercial use.

Q6. Where can it be accessed?

It’s available on Hugging Face, Baidu AI Studio, ERNIE Bot, and Baidu Cloud services.
