Google Introduces Chirp 3, AI Model for Voice Generation
Google has announced the addition of “Chirp 3”, its speech-to-text and HD text-to-speech model, to the Vertex AI platform.
Chirp 3 was announced on March 17, 2025, at the “Gemini for the United Kingdom” event in London.
Key Points
- Chirp 3 is an audio generation model that adds a range of custom voices with human-like inflections and intonations.
- Chirp 3 will be available on Vertex AI starting the week after the announcement, joining other frontier models on the platform such as Gemini, Imagen, and Veo.
- Chirp 3’s HD Voices feature supports 31 languages and provides 248 unique voices across eight speaker options.
- Chirp 3 can be used for tasks like voice annotation, meeting transcription, audiobooks, podcasts, and building AI voice assistants.
- Google will implement usage restrictions to prevent misuse, with its safety team currently reviewing guidelines, according to Google Cloud CEO Thomas Kurian.
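The voice count in the list above is consistent with each of the eight speaker options being offered in every one of the 31 languages (31 × 8 = 248). That reading is an assumption, since the announcement gives only the totals. A minimal Python sketch of the arithmetic, using hypothetical placeholder labels rather than real Chirp 3 language codes or voice names:

```python
from itertools import product

# Counts taken from the article; the labels generated below are
# hypothetical placeholders, not actual Chirp 3 identifiers.
NUM_LANGUAGES = 31
NUM_SPEAKERS = 8


def enumerate_voices(num_languages: int, num_speakers: int) -> list[tuple[str, str]]:
    """Return every (language, speaker) pair as placeholder labels."""
    languages = [f"lang-{i:02d}" for i in range(1, num_languages + 1)]
    speakers = [f"speaker-{j}" for j in range(1, num_speakers + 1)]
    # Assumption: every speaker option exists in every language.
    return list(product(languages, speakers))


voices = enumerate_voices(NUM_LANGUAGES, NUM_SPEAKERS)
print(len(voices))  # 248
```

If some speaker options were missing in some languages, the total would fall below 248, so the published figures suggest full coverage.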
Chirp 3 Vs. Whisper and Polly: A Battle of Speech AI Models
Google’s Chirp 3 competes with leading speech AI models like OpenAI’s Whisper, Microsoft’s Azure Speech, and Amazon Polly.
While Whisper focuses on accurate multilingual speech recognition, Azure Speech offers real-time transcription and translation.
Amazon Polly is known for generating lifelike voice outputs.
Chirp 3 stands out with its HD Voices feature, supporting 31 languages and 248 unique voices.
Its human-like intonations and versatile applications, from voice assistants to podcasts, give it an edge in realistic audio generation.
News Gist
Google announced Chirp 3, its speech-to-text and HD text-to-speech model, for Vertex AI.
Supporting 31 languages and 248 voices, it enables tasks like transcription, podcasts, and voice assistants, with safety measures in place to prevent misuse.