ByteDance Launches BAGEL: A Powerful Open-Source Multimodal AI Model

May 24, 2025 Ai Binger News Desk

ByteDance has introduced BAGEL, an open-source multimodal foundation model designed to handle and generate text, images, and videos.

With 7 billion active parameters (14 billion total), BAGEL demonstrates advanced capabilities in image generation, editing, and complex reasoning tasks.

Key Features

Unified Multimodal Processing: BAGEL can understand and generate text, images, and videos within a single model.

This makes it useful for tasks such as multi-turn conversations, image generation, and video understanding.

Creates and Edits with High Quality: It can produce sharp, realistic images and video frames, and also handle advanced editing like changing styles or creating 3D effects.

World Modeling and Navigation: Trained on large-scale video and web data, BAGEL exhibits capabilities in multi-view synthesis and world navigation tasks, extending beyond traditional image-editing models.

Chain-of-Thought Reasoning: The model enables multi-turn multimodal dialogue and features Chain-of-Thought reasoning, allowing it to generate detailed and logically consistent outputs from short prompts.

Advanced Architecture: BAGEL uses a Mixture-of-Transformer-Experts (MoT) design with two separate visual encoders, one capturing pixel-level detail and the other capturing semantic-level image features.

It is trained on large-scale interleaved multimodal data using a “Next Group of Token Prediction” objective, which strengthens its ability to understand and generate text, images, and video.
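To make the "7 billion active, 14 billion total" figure concrete, here is a toy Python sketch of how a mixture of experts can store more parameters than it uses for any single token. It is not BAGEL's code. The two expert labels, the equal expert sizes, and the rule that each token is handled by exactly one expert are assumptions chosen only because they are consistent with the article's numbers; the sketch also ignores any components that a real model might share between experts.

```python
from dataclasses import dataclass

# Hypothetical size, chosen only to mirror the article's 7B active / 14B total.
PARAMS_PER_EXPERT = 7_000_000_000

# Two hypothetical experts keyed by the kind of token they handle.
# In a real model these would be transformer weights, not integers.
EXPERTS = {
    "understanding": PARAMS_PER_EXPERT,
    "generation": PARAMS_PER_EXPERT,
}


@dataclass(frozen=True)
class Token:
    kind: str   # "understanding" or "generation" (assumed labels)
    value: str  # placeholder payload


def route(token: Token) -> str:
    """Return the name of the single expert that processes this token."""
    if token.kind not in EXPERTS:
        raise ValueError(f"unknown token kind: {token.kind}")
    return token.kind


def active_params(token: Token) -> int:
    """Parameters used for one token: only the routed expert counts."""
    return EXPERTS[route(token)]


def total_params() -> int:
    """Parameters stored in the model: every expert counts."""
    return sum(EXPERTS.values())


if __name__ == "__main__":
    tok = Token(kind="generation", value="<image-patch>")
    print(active_params(tok), total_params())
```

Under these assumptions, every token touches one expert's worth of parameters while the model as a whole holds both, which is why the active count can be half the total.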

Benchmark Performance

BAGEL outperforms current top-tier open-source vision-language models like Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding benchmarks.

Its text-to-image generation quality is competitive with specialized generators such as Stable Diffusion 3.

Additionally, BAGEL demonstrates superior qualitative results in classical image-editing scenarios compared to leading open-source models.

Availability

The BAGEL model is open-sourced under the Apache 2.0 license and is available on GitHub and Hugging Face.

Developers can also experiment with the model via the Replicate platform.

Each run on Replicate costs approximately $0.091, which works out to about 10 runs per $1, though the actual cost varies with your inputs.
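The arithmetic behind that estimate can be sketched in a few lines of Python. The per-run figure is the approximate one quoted above; the function names are illustrative, and a flat per-run price is an assumption, since real costs depend on inputs.

```python
import math

COST_PER_RUN_USD = 0.091  # approximate per-run figure quoted in the article


def whole_runs(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Number of complete runs a budget covers at a flat per-run cost."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return math.floor(budget_usd / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Estimated spend for a given number of runs, rounded to 3 decimals."""
    return round(runs * cost_per_run, 3)


if __name__ == "__main__":
    print(whole_runs(1.0))      # complete runs covered by $1
    print(estimated_cost(10))   # estimated spend for 10 runs
```

One dollar divided by $0.091 is just under 11, so a $1 budget covers 10 complete runs with a little left over.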

News Gist

ByteDance has launched BAGEL, a powerful open-source multimodal AI model capable of processing and generating text, images, and videos.

With 7 billion active parameters, it excels in editing, dialogue, and reasoning tasks.

BAGEL outperforms top models like Qwen2.5-VL and is freely available on GitHub, Hugging Face, and Replicate.
