Google presents Genie: Text to Video Game AI
Google has introduced Genie, the first generative interactive environment trained in an unsupervised manner from unlabeled internet videos. The model can generate a wide array of controllable virtual worlds from text, synthetic images, photographs, and even sketches. With 11 billion parameters, Genie serves as a foundation world model consisting of a spatio-temporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Despite being trained without the ground-truth action labels or other domain-specific requirements typically found in the world-model literature, Genie lets users interact with the generated environments on a frame-by-frame basis. Moreover, the learned latent action space enables training agents to imitate behaviors from unseen videos, paving the way for generalist agents in the future.
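To make the three-component design easier to picture, here is a minimal, purely illustrative Python sketch of the frame-by-frame loop described above. The class and method names are assumptions made for this example and do not reflect Google's actual code or API; the tokenizer and dynamics model are stubs standing in for the learned components.

```python
# Hypothetical sketch of Genie's three-component loop (illustrative names only):
# a video tokenizer, a latent action supplied by the user each frame, and an
# autoregressive dynamics model predicting the next frame's tokens.

from dataclasses import dataclass, field
from typing import List


@dataclass
class GenieSketch:
    """Illustrative frame-by-frame generation loop (not Google's implementation)."""
    history_tokens: List[List[int]] = field(default_factory=list)

    def tokenize(self, frame) -> List[int]:
        # Spatio-temporal video tokenizer: compress a frame into discrete tokens.
        # (Stub: the real tokenizer is a learned model.)
        return [hash(str(frame)) % 1024]

    def dynamics(self, tokens_so_far: List[List[int]], latent_action: int) -> List[int]:
        # Autoregressive dynamics model: predict next-frame tokens from the
        # token history and a discrete latent action.
        # (Stub: the real model is a large sequence model.)
        last = tokens_so_far[-1] if tokens_so_far else [0]
        return [(t + latent_action) % 1024 for t in last]

    def step(self, latent_action: int) -> List[int]:
        # One interactive step: the user picks a latent action, and the
        # dynamics model produces the next frame's tokens.
        next_tokens = self.dynamics(self.history_tokens, latent_action)
        self.history_tokens.append(next_tokens)
        return next_tokens


# Usage: seed with a prompt frame (a text prompt or sketch would be encoded
# similarly), then play frame by frame with actions from a small discrete set.
world = GenieSketch()
world.history_tokens.append(world.tokenize("prompt image"))
for action in [0, 1, 1, 2]:  # latent actions are learned without ground-truth labels
    frame_tokens = world.step(action)
```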
Keywords:
- Genie
- Generative interactive environments
- Unsupervised training
- Virtual Worlds
- Latent action model
- Behavior imitation
- Generalist agents
FAQ:
What is Genie? Genie is a generative interactive environment developed by Google and trained in an unsupervised manner on unlabeled internet videos. It can create controllable virtual worlds from inputs such as text, images, and sketches.
What sets Genie apart from other models? Genie generates interactive environments without the ground-truth action labels or domain-specific requirements typically required by similar world models, and it lets users engage with these environments interactively on a frame-by-frame basis.
How does Genie facilitate agent training? The latent action space learned by Genie allows agents to be trained to imitate behaviors from videos that were not part of the training data, opening up possibilities for developing generalist agents with a wide range of skills. A rough sketch of this idea follows below.
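As a rough illustration of that idea, the sketch below (hypothetical function names, not Genie's actual interface) labels an unlabeled video with inferred latent actions and fits a toy policy by imitation; the inference step is a stand-in for the learned latent action model.

```python
# Hypothetical sketch of behavior imitation with latent actions (illustrative
# names only): label each frame transition in an unseen video with an inferred
# latent action, then fit a policy on the resulting (observation, action) pairs.

from collections import Counter


def infer_latent_action(frame_t, frame_t1) -> int:
    # Stand-in for the latent action model, which infers a discrete latent
    # action explaining the change between consecutive frames.
    return (hash(str(frame_t1)) - hash(str(frame_t))) % 8


def label_video(frames):
    # Turn an unlabeled video into (observation, latent_action) pairs.
    return [(frames[i], infer_latent_action(frames[i], frames[i + 1]))
            for i in range(len(frames) - 1)]


def train_policy(dataset):
    # Toy "policy": the most common latent action per observation bucket.
    # A real agent would fit a neural network by behavior cloning.
    buckets = {}
    for obs, action in dataset:
        buckets.setdefault(hash(str(obs)) % 16, []).append(action)
    return {k: Counter(v).most_common(1)[0][0] for k, v in buckets.items()}


demo_video = ["frame0", "frame1", "frame2", "frame3"]  # unseen, unlabeled video
policy = train_policy(label_video(demo_video))
```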