Stable Diffusion 3.5 Is Out with Better Performance & the Right Number of Limbs

Tuesday, 22 October 2024

Stability AI, the startup behind the popular image generator Stable Diffusion, has announced a new generation of models designed to be more customizable and versatile while outputting more "diverse" and representative content.

The new Stable Diffusion 3.5 is being made available in three variations: Large, Large Turbo, and Medium. Stable Diffusion 3.5 Large clocks in at 8 billion parameters and is capable of generating images up to a megapixel in resolution. Stable Diffusion 3.5 Large Turbo has the same capabilities as Large, though it generates images more quickly at the cost of some quality.

Stable Diffusion 3.5 Medium, on the other hand, is designed to run on edge devices, like laptops, tablets and smartphones. It can generate images in a variety of sizes from a quarter MP up to 2 MP. The Large and Large Turbo versions are available today, while Medium will arrive at the end of the month, on October 29.
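To make the quarter-megapixel to 2-megapixel range concrete, here is a rough sketch that converts a megapixel budget into pixel dimensions. It rests on assumptions that are not from Stability AI: that one megapixel means 1024 x 1024 pixels, and that each side should be rounded down to a multiple of 64. The function name `dims_for_megapixels` is hypothetical.

```python
import math


def dims_for_megapixels(megapixels, aspect_w=1, aspect_h=1, multiple=64):
    """Return (width, height) that fits within a megapixel budget.

    Assumptions (not from the article): 1 MP = 1024 * 1024 pixels, and
    each side is rounded DOWN to a multiple of `multiple`, so the result
    never exceeds the requested budget.
    """
    target_pixels = megapixels * 1024 * 1024
    # Solve width * height = target_pixels with width / height = aspect_w / aspect_h.
    width = math.sqrt(target_pixels * aspect_w / aspect_h)
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    # Snap each side down to the nearest multiple.
    snap = lambda side: int(side) // multiple * multiple
    return snap(width), snap(height)


if __name__ == "__main__":
    for mp in (0.25, 1.0, 2.0):
        print(mp, "MP square:", dims_for_megapixels(mp))
    print("1 MP at 16:9:", dims_for_megapixels(1.0, 16, 9))
```

Under these assumptions, a quarter megapixel works out to a 512 x 512 square and one megapixel to 1024 x 1024; the actual sizes each model supports may differ.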

SD 3.5 promises significantly improved performance over its predecessor, Stable Diffusion 3.0, which made its debut in June. SD3's release was met with outrage and disappointment by its user base due to the model's inability to accurately render limbs and fingers, especially for subjects depicted lying on grass, as well as rampant artifacts and its difficulty following the given prompt.

The company concedes that SD3 did not "fully meet our standards or our communities’ expectations" but insists that "Stable Diffusion 3.5 reflects our commitment to empower builders and creators with tools that are widely accessible, cutting-edge, and free for most use cases."

[Image: Examples of SD 3.5's more diverse image generation capabilities. Credit: Stability AI]

The new 3.5 model has also been trained to generate more diverse images of people, creating "images representative of the world, not just one type of person, with different skin tones and features," and without the need for extensive prompts.

“During training, each image is captioned with multiple versions of prompts, with shorter prompts prioritized,” Hanno Basse, CTO of Stability, told TechCrunch. “This ensures a broader and more diverse distribution of image concepts for any given text description. Like most generative AI companies, we train on a wide variety of data, including filtered publicly available datasets and synthetic data.”

Whether this leads to the same sorts of issues that Google faced in February, when Gemini's image generator started spitting out "historically inaccurate" images (read: black Nazi soldiers), remains to be seen. “Gemini’s AI image generation does generate a wide range of people,” reads a Google statement posted to Twitter in February. “That’s generally a good thing because people around the world use it. But it’s missing the mark here.”

If you'd like to try Stable Diffusion 3.5 for yourself, both Large and Large Turbo are currently available for download from Hugging Face. You can find the inference code on GitHub.
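For readers who want a starting point, the following is a minimal sketch of loading the Hugging Face weights with the `diffusers` library rather than Stability AI's own inference code. Everything specific here is an assumption and not taken from the announcement: the repository IDs, the step counts and guidance values, the availability of a CUDA GPU with enough memory, and that you have already accepted the model license on Hugging Face and logged in. The `Variant` class and `generate` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    repo_id: str      # Hugging Face repository (assumed, check the model page)
    steps: int        # number of denoising steps (assumed starting value)
    guidance: float   # classifier-free guidance scale (assumed starting value)


# Assumed settings: Turbo trades quality for speed by using far fewer steps.
VARIANTS = {
    "large": Variant("stabilityai/stable-diffusion-3.5-large", 28, 3.5),
    "large-turbo": Variant("stabilityai/stable-diffusion-3.5-large-turbo", 4, 0.0),
}


def generate(prompt, variant="large", out_path="sd35.png"):
    """Generate one image and save it; requires torch, diffusers, and a GPU."""
    # Heavy imports are deferred so the settings above can be inspected
    # on a machine without the libraries installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    cfg = VARIANTS[variant]
    pipe = StableDiffusion3Pipeline.from_pretrained(
        cfg.repo_id, torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    image = pipe(
        prompt,
        num_inference_steps=cfg.steps,
        guidance_scale=cfg.guidance,
    ).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    print(generate("a person lying on grass, photo", variant="large-turbo"))
```

The first run downloads several gigabytes of weights, so the Turbo variant is probably the quicker way to check that a setup works.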

Source: Stability AI, TechCrunch

(Originally posted by Andrew Tarantola)