Nvidia's new text-to-3D model shows how quickly generative AI is advancing

Nvidia is on a roll. After unveiling its Blackwell superchip, which is designed to train more powerful AI models like GPT, Claude, and Gemini, it has teased its own AI tool for converting text to 3D (see our guide to the best graphics cards for consumer options).

The graphics card giant closed out GTC week by showcasing LATTE3D, a text-to-3D generative AI model that it describes as a “virtual 3D printer.” It can convert text prompts into 3D representations of objects and animals within one second.

Nvidia says 3D shapes created by LATTE3D “can be easily rendered into virtual environments for video game development, advertising campaigns, design projects, or virtual training spaces for robots.” We've seen text-to-3D conversion tools before, and the online reaction suggests that some people aren't too impressed with the quality of LATTE3D's results. But the new model represents a major advance, especially in terms of speed.

Nvidia says it produces 3D shapes almost instantly when running inference on a single GPU, such as the NVIDIA RTX A6000 used in the research demonstration. This means that a creator starting a design from scratch or searching through a library of 3D assets can use LATTE3D to create detailed objects as quickly as they come up with ideas.

The model generates several 3D shape options for each text prompt. The chosen object can then be optimized for higher quality and exported to graphics software applications or platforms such as Nvidia Omniverse, which enables Universal Scene Description (OpenUSD)-based 3D workflows and applications.

“A year ago, it would take AI models an hour to create 3D images of this quality, and the current state of the art is now about 10 to 12 seconds,” said Sanja Fidler, vice president of AI research at Nvidia. She added that LATTE3D produces results much faster, making near-real-time text-to-3D generation accessible to creatives across industries.

3D dogs created by Nvidia LATTE3D AI model (Image credit: Nvidia)

LATTE3D was developed by Nvidia's Toronto-based AI Lab team and trained using text prompts generated using ChatGPT to improve the model's ability to handle different phrases a user might come up with to describe a given 3D object. While the researchers trained LATTE3D on two specific data sets, animals and everyday objects, the same architecture can be used to train AI on other data types. It remains a research project only and is not available for public use.

AI creator Bilawal Sidhu wrote on X: “This is a huge jump. DreamFusion circa 2022 was slow and low-quality, but it launched this generative 3D revolution. Efforts like ATT3D (Amortized Text-to-3D Object Synthesis) chased speed at the expense of quality. Now with LATTE3D, it's high quality and generates in less than a second! Which means you can quickly iterate on a 3D world and fill it using text-to-3D or image-to-3D.”

Along with video, 3D is the next frontier for AI image generation. This week, Adobe also announced the integration of its first Firefly AI-based tools into Substance 3D.

Roger Griffith

