AI News: Microsoft Introduces Magma, a Multimodal AI That Takes Action

Microsoft has unveiled Magma, a new AI model that integrates visual and language processing to control software interfaces and robotic systems.

Unlike traditional AI models, Magma processes multiple types of data, such as text, images, and video, at the same time.

Key Points

Microsoft states that Magma can formulate plans and execute the actions needed to achieve a goal.

Microsoft claims Magma is the first AI model that not only processes multimodal data (text, images, and video) but also takes actions, such as navigating user interfaces or manipulating physical objects.

According to Microsoft, Magma-8B performs competitively across benchmarks: it scored 80.0 on VQAv2 and leads with a POPE score of 87.4. It also outperforms OpenVLA on multiple robot manipulation tasks.

Jianwei Yang, a Microsoft researcher on the Magma project, explained that “Magma” stands for “Multimodal Agentic Model at Microsoft Research.”

Next week, Microsoft will release Magma’s training and inference code on GitHub, enabling external researchers to build upon it.

Magma vs. Traditional AI: How Microsoft’s Model Stands Out

Magma integrates perception and control into a single foundation model, unlike earlier multimodal systems such as Google’s PaLM-E and RT-2 or Microsoft’s ChatGPT for Robotics.

Built on Transformer-based LLM technology, Magma extends beyond “verbal intelligence” to include “spatial intelligence” for planning and action execution.

It is trained on images, videos, robotics data, and UI interactions, which lets it act as an agent in both digital and physical environments.

Key features include Set-of-Mark, which identifies objects that can be interacted with (such as clickable UI elements or graspable items), and Trace-of-Mark, which learns movement patterns from video data.

News Gist

Microsoft introduced Magma, an AI model that integrates visual and language processing to control software and robotics.

It can formulate plans, execute actions, and process multimodal data simultaneously. According to Microsoft, Magma-8B performs strongly on benchmarks and outperforms OpenVLA on robot manipulation tasks.

Microsoft will release Magma’s training and inference code on GitHub next week so that researchers can build on it.

AI Binger