Alibaba Launches AI Model Qwen-VLo
Alibaba has unveiled Qwen-VLo, its latest multimodal AI model, which can both understand and generate images, much like OpenAI’s GPT-4o.
What is Qwen-VLo?
Qwen-VLo is a multimodal AI model, meaning it can understand and work with both text and images.
You can ask it to generate an image from a text prompt, edit an existing picture, or describe a complex visual, all in one place.
It is designed for a range of uses: e-commerce, design, education, and even customer support, where visuals and language need to work together seamlessly.
Key features
Step-by-step image creation: Qwen-VLo generates pictures gradually (left to right, top to bottom), allowing better control and fewer errors during edits.
Multiple image inputs: You can combine and manipulate several images at once, which is useful for designers and e-commerce visuals.
Multilingual support: It handles languages beyond English and Chinese, making it a versatile tool for users worldwide.
Flexible formats: Trained with dynamic resolutions, it can produce images in aspect ratios such as 1:1, 16:9, or 4:3, which suits content creators working across devices.
What’s New
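Unlike earlier versions, Qwen-VLo lets users adjust individual parts of a picture, such as changing someone’s shirt color, without accidentally altering the rest of the image.
This addresses a common problem with earlier AI image tools, where a small, targeted edit could unintentionally change unrelated parts of the picture.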
The model is a clear effort by Alibaba to compete with OpenAI’s GPT-4o and Google’s Gemini.
While GPT-4o excels in speed and voice features, Qwen-VLo focuses more on image quality, fine-grained edits, and reasoning across multiple images.
Availability
Qwen-VLo is available now as a preview in the Qwen Chat interface. Although it is still being tested and improved, it is already drawing attention for its accuracy, multilingual support, and creative flexibility.
News Gist
Alibaba has launched Qwen-VLo, a powerful multimodal AI model that understands and generates images using text prompts.
It supports editing, multilingual input, and flexible formats, challenging GPT-4o with advanced features like progressive image rendering and multi-image input capabilities.