Alibaba AI Unveils Qwen3-Max Preview
Alibaba’s Qwen team has unveiled the highly anticipated Qwen3-Max Preview (Instruct), a large language model with over 1 trillion parameters, making it the largest in the Qwen family to date.
Qwen3-Max Preview is designed for a wide variety of use cases including advanced research, complex coding assistance, long document analysis, multilingual natural language processing, and deployment within AI-augmented agent workflows.
Key Features
Parameters and Architecture: With more than 1 trillion parameters, Qwen3-Max Preview ranks among the largest publicly known language models worldwide.
This extensive size enables the model to capture complex linguistic patterns and provide superior performance across a wide range of tasks.
Unlike some reasoning-dedicated models, Qwen3-Max employs a non-reasoning architectural approach but delivers substantial improvements in mathematical, programming, and scientific reasoning tasks through architectural optimizations and extensive training.
Context Window: A standout technical feature of Qwen3-Max is its large context window of up to 262,144 tokens in total, with caps of 258,048 tokens for input and 32,768 tokens for output.
This exceptionally long context allows the model to handle long documents, codebases, and sustained conversations or agent runs without losing coherence.
The model further enhances efficiency through context caching, which speeds up repeated interactions and reduces computational costs in multi-turn sessions.
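As a rough illustration of how these limits interact, input and output share the overall context window, so a long prompt shrinks the room left for the response. The sketch below uses the published figures above; the helper function name is our own:

```python
# Published limits for Qwen3-Max Preview (figures from the text above):
MAX_CONTEXT = 262_144   # total context window, in tokens
MAX_INPUT = 258_048     # cap on input tokens
MAX_OUTPUT = 32_768     # cap on output tokens

def output_budget(input_tokens: int) -> int:
    """Largest output the model could produce for a given input size.

    Output is bounded both by the per-response cap and by whatever room
    remains in the shared context window.
    """
    if input_tokens > MAX_INPUT:
        raise ValueError(f"input exceeds the {MAX_INPUT}-token cap")
    return min(MAX_OUTPUT, MAX_CONTEXT - input_tokens)

# A short prompt leaves the full output budget; a near-maximal input
# squeezes the response into the remaining window.
print(output_budget(10_000))    # 32768
print(output_budget(250_000))   # 12144
```

This is a budgeting sketch only; the exact accounting (e.g. tokens reserved for system prompts or caching) depends on Alibaba Cloud's API behavior.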
Multilingual and Functional Capabilities: With support for over 100 languages, Qwen3-Max shines in multilingual understanding, particularly excelling in Chinese-English language tasks.
Its application scope includes general knowledge queries (as benchmarked on SuperGPQA), mathematical problems (AIME25), coding challenges (LiveCodeBench v6), reasoning alignment (Arena-Hard v2), and all-around capabilities (LiveBench).
In internal benchmarks, Qwen3-Max outperforms previous Alibaba models like Qwen3-235B-A22B-2507 and competes strongly against renowned models such as Claude Opus 4 and DeepSeek-V3.1.
Performance Highlights:
- Excels in accuracy and reasoning across math, coding, logic, and science domains.
- Demonstrates improved instruction following, resulting in a better conversational experience.
- Optimized for retrieval-augmented generation (RAG) and tool calling without requiring explicit “thinking” modes.
- Employs a Mixture of Experts (MoE) design, activating only a fraction of parameters per token, enabling high capacity without linear increases in compute demand.
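To make the MoE point concrete, here is a toy top-k router in plain Python. This is purely illustrative, not Qwen's actual architecture: the expert count, scores, and k are invented. The idea it demonstrates is that only the k highest-scoring experts run for each token, so per-token compute scales with k rather than with the total number of experts:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_scores, k=2):
    """Pick the top-k experts for one token and renormalize their weights.

    In a real MoE layer, gate_scores come from a learned gating network;
    here they are just given numbers.
    """
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))

# 8 hypothetical experts, but only 2 are activated for this token:
scores = [0.1, 2.3, -0.5, 1.7, 0.0, -1.2, 0.4, 0.9]
print(route_token(scores, k=2))  # experts 1 and 3 win the gate
```

The dense alternative would run all eight experts for every token; top-k routing is what lets a trillion-parameter model keep inference costs closer to those of a much smaller dense model.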
Benchmarking Edge: Consistently surpasses Alibaba’s earlier Qwen3-235B-A22B-2507 and holds competitive parity with Claude Opus 4, Kimi K2, and DeepSeek-V3.1 across demanding benchmark suites such as SuperGPQA, AIME25 (math), LiveCodeBench v6, Arena-Hard v2, and LiveBench.
Accessibility
The Qwen3-Max Preview made its debut on Alibaba Cloud’s Bailian platform and is also directly accessible via the Qwen Chat interface, Alibaba Cloud API, OpenRouter, and Hugging Face’s AnyCoder service.
This widespread availability allows developers, enterprises, and AI enthusiasts to explore and integrate the model’s cutting-edge capabilities instantly.
Notably, Qwen Chat supports free usage of this preview model, broadening access to advanced AI technology.
Pricing and Commercial Strategy
Alibaba Cloud has structured a tiered pricing model for Qwen3-Max Preview based on the number of input tokens used, promoting cost-efficiency for smaller workloads while scaling for extensive tasks:
- 0 to 32K tokens: $0.861 per million input tokens, $3.441 per million output tokens.
- 32K to 128K tokens: $1.434 per million input tokens, $5.735 per million output tokens.
- 128K to 252K tokens: $2.151 per million input tokens, $8.602 per million output tokens.
This tiered approach encourages use across diverse applications, from lightweight queries to complex document processing, while context caching further reduces costs when repeated input prefixes are encountered.
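The tier table above can be turned into a simple cost estimator. Two assumptions in this sketch: the "K" boundaries are read as thousands of tokens, and the output rate of the tier selected by the input-token count applies to the response as well; actual Alibaba Cloud billing rules may differ:

```python
# Tiered pricing from the table above:
# (input-token ceiling, USD per 1M input tokens, USD per 1M output tokens)
TIERS = [
    (32_000,  0.861, 3.441),
    (128_000, 1.434, 5.735),
    (252_000, 2.151, 8.602),
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a single request's cost in USD.

    The tier is chosen by the input-token count; by assumption here,
    the same tier's output rate is used for the response.
    """
    for ceiling, in_rate, out_rate in TIERS:
        if input_tokens <= ceiling:
            return (input_tokens * in_rate
                    + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 252K-token pricing ceiling")

# e.g. a 20K-token prompt with a 2K-token reply falls in the first tier:
print(round(estimate_cost(20_000, 2_000), 4))  # 0.0241
```

A full 252K-token input with a long response lands in the top tier and costs well under a dollar, which is the point of the tiering: light workloads stay cheap while very long contexts remain affordable.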
Future Outlook
While Qwen3-Max Preview currently offers a glimpse into Alibaba’s most powerful AI to date, the company is actively working on the official commercial release and additional enhancements.
Its rollout reflects Alibaba’s ongoing investment in large-scale AI research and development, signaling that despite industry trends, the quest for ever-larger, more capable models remains alive and competitive.