AI News: Microsoft Introduces Magma, a Multimodal AI That Takes Action
Microsoft has unveiled Magma, a new AI model that integrates visual and language processing to control software interfaces and robotic systems.
Unlike traditional AI models that specialize in a single data type, Magma processes text, images, and video simultaneously.
Key Points
Microsoft states that Magma can formulate plans and execute the actions needed to carry them out.
Microsoft claims Magma is the first AI model that not only processes multimodal data (text, images, and video) but also takes actions, such as navigating user interfaces or manipulating physical objects.
According to Microsoft, Magma-8B performs competitively across benchmarks, scoring 80.0 on the VQAv2 benchmark and leading with an 87.4 POPE score. It also outperforms OpenVLA in multiple robot manipulation tasks.
Microsoft Magma researcher Jianwei Yang explained that “Magma” stands for “Multimodal Agentic Model at Microsoft Research.”
Next week, Microsoft will release Magma’s training and inference code on GitHub, enabling external researchers to build upon it.
Magma vs. Traditional AI: How Microsoft’s Model Stands Out
Microsoft’s Magma integrates perception and control into a single foundation model, setting it apart from previous multimodal systems such as Google’s PaLM-E and RT-2 or Microsoft’s ChatGPT for Robotics.
Built on Transformer-based LLM technology, Magma extends beyond “verbal intelligence” to include “spatial intelligence” for planning and action execution.
It is trained on images, videos, robotics data, and UI interactions, making it a true multimodal agent.
Key features include Set-of-Mark, which labels interactive objects in an image with numbered markers, and Trace-of-Mark, which learns movement patterns by tracking those marks across video frames; the sketch below illustrates the Set-of-Mark idea.
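Microsoft has not yet published a public API for these techniques, but the core Set-of-Mark idea is straightforward: overlay numbered markers on candidate interactive regions so a model can refer to actions by mark index rather than raw pixel coordinates. Below is a minimal, hypothetical Python sketch; the bounding boxes and file names are illustrative assumptions, not Magma's actual interface.

```python
# Minimal Set-of-Mark-style sketch: overlay numbered marks on candidate
# interactive regions so a model can cite actions by mark index.
# The boxes below are hypothetical detections, not output from Magma.
from PIL import Image, ImageDraw

# Hypothetical UI screenshot (a blank canvas stands in for a real capture).
screenshot = Image.new("RGB", (640, 480), "white")
draw = ImageDraw.Draw(screenshot)

# Assumed bounding boxes for interactive elements: (left, top, right, bottom).
candidate_boxes = [
    (40, 40, 200, 80),    # e.g., a "Submit" button
    (40, 120, 300, 160),  # e.g., a text input field
    (40, 200, 140, 240),  # e.g., a checkbox row
]

# Set-of-Mark: draw each box and stamp a numeric mark the model can cite.
for mark_id, box in enumerate(candidate_boxes, start=1):
    draw.rectangle(box, outline="red", width=3)
    draw.text((box[0] + 4, box[1] + 4), str(mark_id), fill="red")

screenshot.save("som_annotated.png")

# The annotated image plus a mark legend becomes part of the model prompt;
# the model can then answer with a grounded action such as "click mark 1".
legend = "\n".join(f"Mark {i}: interactive region {box}"
                   for i, box in enumerate(candidate_boxes, start=1))
print(legend)
```

Trace-of-Mark, as described, extends the same idea over time: the marks are tracked across video frames so the model can learn trajectories and movement patterns rather than single clicks.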
News Gist
Microsoft introduced Magma, an AI model that integrates visual and language processing to control software and robotics.
It formulates plans, executes actions, and processes multimodal data simultaneously. Magma-8B performs strongly on benchmarks, outperforming OpenVLA in robot manipulation tasks.
Microsoft will release its training and inference code on GitHub next week for researchers.