
ByteDance Launches BAGEL: A Powerful Open-Source Multimodal AI Model

ByteDance has introduced BAGEL, an open-source multimodal foundation model designed to understand and generate text, images, and video.

With 7 billion active parameters (14 billion total), BAGEL demonstrates advanced capabilities in image generation, editing, and complex reasoning tasks.

Key Features

Unified Multimodal Processing: BAGEL can understand and generate text, images, and video within a single model.

This makes it suited to tasks such as multi-turn conversation, image generation, and video understanding.

High-Quality Generation and Editing: It can produce sharp, realistic images and video frames, and it also handles advanced edits such as changing styles or creating 3D effects.

World Modeling and Navigation: Trained on large-scale video and web data, BAGEL exhibits capabilities in multi-view synthesis and world navigation tasks, extending beyond traditional image-editing models.

Chain-of-Thought Reasoning: The model enables multi-turn multimodal dialogue and features Chain-of-Thought reasoning, allowing it to generate detailed and logically consistent outputs from short prompts.

Advanced Architecture: BAGEL uses a Mixture-of-Transformer-Experts (MoT) design with two visual encoders to capture detailed and meaningful image features.

It is trained on large-scale mixed-modality data using a “next token group” prediction objective, which strengthens its ability to understand and generate text, images, and video.

Benchmark Performance

BAGEL outperforms current top-tier open-source vision-language models like Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding benchmarks.

Its text-to-image generation quality is competitive with specialized generators such as Stable Diffusion 3.

Additionally, BAGEL demonstrates superior qualitative results in classical image-editing scenarios compared to leading open-source models.

Availability

The BAGEL model is open-sourced under the Apache 2.0 license and is available on GitHub and Hugging Face.

Developers can also experiment with the model via the Replicate platform.

A single run on Replicate costs approximately $0.091, or about 10 runs per $1, though the actual cost varies with your inputs.
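
For budgeting, the arithmetic behind that figure can be sketched as follows. This is a rough illustration only: it assumes the approximate per-run price quoted above holds for every run, which it will not in practice, and the function names are hypothetical helpers, not part of any Replicate API.

```python
from decimal import Decimal

# Approximate per-run price quoted in the article (USD).
# Real costs vary with inputs, so treat this as a rough estimate.
COST_PER_RUN = Decimal("0.091")


def runs_per_budget(budget_usd: str) -> int:
    """Return how many whole runs fit in the given budget."""
    return int(Decimal(budget_usd) // COST_PER_RUN)


def estimated_cost(num_runs: int) -> Decimal:
    """Return the estimated total cost in USD for a number of runs."""
    return COST_PER_RUN * num_runs


if __name__ == "__main__":
    print(runs_per_budget("1"))   # whole runs that fit in $1
    print(estimated_cost(100))    # estimated cost of 100 runs
```

Decimal is used instead of float so that the division is not thrown off by binary rounding. Since $1 divided by $0.091 is just under 11, only 10 whole runs fit in a dollar, which matches the "10 runs per $1" figure.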

News Gist

ByteDance has launched BAGEL, a powerful open-source multimodal AI model capable of processing and generating text, images, and videos.

With 7 billion active parameters, it excels in editing, dialogue, and reasoning tasks.

BAGEL outperforms top open-source models like Qwen2.5-VL on multimodal understanding benchmarks and is available under the Apache 2.0 license on GitHub and Hugging Face, with hosted runs on Replicate.
