Chinese tech giant ByteDance unveils advanced AI model amid battle over TikTok
ByteDance, the parent company of TikTok, has unveiled OmniHuman-1, an advanced AI model capable of generating highly realistic human videos from minimal input, such as a single image and an audio track.
This technology represents a significant advancement in AI-driven human animation.
Key Points
- OmniHuman-1 is a diffusion transformer-based framework that integrates audio, video, and pose information to produce lifelike full-body animations (see the conceptual sketch after this list).
- OmniHuman-1 has been trained on more than 18,700 hours of human video, allowing it to produce highly accurate, lifelike motion and speech synchronization, as mentioned in a report by ABC.
- AI expert Henry Ajder cautioned that OmniHuman-1 represents a significant leap forward in deepfake technology.
- Unlike earlier models, which required hundreds or even thousands of images to generate convincing videos, ByteDance’s latest model can achieve astonishingly realistic results from just one image.
- The model’s sophisticated rendering of facial expressions and movements allows for highly convincing impersonations, posing serious risks in areas such as disinformation, identity theft, and cyber fraud.
- Although OmniHuman-1 is not yet publicly accessible, its potential applications are vast, ranging from virtual influencers and digital avatars to gaming and AI-assisted filmmaking.
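Since OmniHuman-1 itself is not public, the following is only a minimal conceptual sketch, in PyTorch, of how a diffusion-transformer block might fuse audio and pose conditioning with video tokens via cross-attention. Every class name, dimension, and tensor shape here is a hypothetical illustration, not ByteDance’s actual architecture.

```python
# Conceptual sketch only: OmniHuman-1's code is not public. This shows a
# generic way a diffusion-transformer block could condition noisy video
# tokens on audio and pose features. All names and shapes are hypothetical.
import torch
import torch.nn as nn

class MultimodalDiTBlock(nn.Module):
    """One transformer block where video tokens attend to themselves,
    then cross-attend to concatenated audio/pose conditioning tokens."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, video_tokens, cond_tokens):
        # Self-attention over the (noisy) video token sequence.
        h = self.norm1(video_tokens)
        x = video_tokens + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention: video tokens query the audio/pose tokens.
        h = self.norm2(x)
        x = x + self.cross_attn(h, cond_tokens, cond_tokens,
                                need_weights=False)[0]
        return x + self.mlp(self.norm3(x))

# Toy usage: 1 sample, 64 video tokens, 16 audio + 16 pose tokens.
video = torch.randn(1, 64, 256)         # noisy video latents (hypothetical)
audio = torch.randn(1, 16, 256)         # audio features (hypothetical)
pose = torch.randn(1, 16, 256)          # pose features (hypothetical)
cond = torch.cat([audio, pose], dim=1)  # fused conditioning sequence
block = MultimodalDiTBlock()
out = block(video, cond)                # one denoising transformer step
print(out.shape)                        # torch.Size([1, 64, 256])
```

Cross-attention is one common way diffusion transformers inject conditioning signals; ByteDance has not disclosed the actual fusion strategy OmniHuman-1 uses.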
Concerns Over OmniHuman-1: Deepfakes, Privacy, and Ethical Risks
OmniHuman-1, ByteDance’s AI model, raises concerns regarding deepfakes, privacy, and copyright issues.
It could be misused for creating fake videos, committing identity theft, or spreading misinformation.
There are fears about job losses in creative industries and AI bias affecting representation.
Without proper regulation, this technology could be exploited unethically.
It could also weaken trust in online content, making it harder to distinguish the real from the fake.
To prevent harm, strong safeguards, ethical guidelines, and responsible AI policies are necessary.
News Gist
ByteDance’s OmniHuman-1 is an advanced AI model that generates lifelike human videos from just one image and an audio track.
Using a diffusion transformer-based framework trained on 18,700+ hours of video, it enables realistic speech synchronization. Experts warn of deepfake risks, including identity theft and disinformation.
Potential applications include virtual influencers, gaming, and AI filmmaking.