New Stable Diffusion 2.0 brings jaw-dropping capabilities for generating AI images
London and San Francisco–based Stability AI, the company behind Stable Diffusion, an open-source AI image-generation tool, has just announced the release of Stable Diffusion 2.0, according to a press statement on the company's website.
The company's new open-source offering adds features and improvements over the V1 release, including new text-to-image models trained with a new text encoder called OpenCLIP, which improves the quality of the generated images.
What is Stable Diffusion?
Stability AI runs a cluster of more than 4,000 Nvidia A100 GPUs in Amazon Web Services (AWS). It uses these to train AI systems, such as Stable Diffusion, to generate impressive AI art from nothing but text prompts. Stable Diffusion works similarly to DALL-E 2 by OpenAI and Designer by Microsoft.
The company's servers require a massive amount of power. According to Business Insider, Stability AI's operations and cloud expenditures exceed $50 million. Still, the firm's CEO, Emad Mostaque, has said the company will keep improving the efficiency of these models to bring that figure down. The company recently raised $101 million in a seed funding round.
At the time of that funding round, Mostaque said, "AI promises to solve some of humanity's biggest challenges. But we will only realize this potential if the technology is open and accessible to all."
However, the company has faced some backlash as its open-source release has led to the propagation of AI-generated graphic content, including violent and pornographic imagery, sometimes involving real people. Stable Diffusion's new system aims to shift the focus with its "brand new possibilities for creative applications."
What's new in Stable Diffusion 2.0?
Stable Diffusion 2.0 uses a new text encoder called OpenCLIP, which was developed in collaboration with the non-profit machine learning organization LAION. This improves the quality of Stable Diffusion's generated images, allowing for default resolutions of 512×512 pixels and 768×768 pixels. Take a look at a few examples of these images below. Meanwhile, the release notes for Stable Diffusion 2.0 can be viewed on GitHub.
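Stability AI's announcement doesn't prescribe any particular toolchain, but a common way to run the published checkpoints is Hugging Face's diffusers library. The sketch below assumes diffusers, PyTorch, and a CUDA GPU are available; "stabilityai/stable-diffusion-2" is the 768×768 checkpoint name on the Hugging Face Hub, and the helper names here are illustrative, not part of the release.

```python
# The two native output resolutions mentioned in the Stable Diffusion 2.0
# announcement: 512x512 (base model) and 768x768 (768-v model).
DEFAULT_RESOLUTIONS = [(512, 512), (768, 768)]


def generate(prompt: str, width: int = 768, height: int = 768):
    """Sketch: generate one image from a text prompt with Stable Diffusion 2.0.

    Heavy dependencies are imported lazily so this module stays importable
    on machines without torch/diffusers installed.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    # "stabilityai/stable-diffusion-2" is the 768x768 checkpoint; swap in
    # "stabilityai/stable-diffusion-2-base" for the 512x512 variant.
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt, width=width, height=height).images[0]
```

As a usage example, `generate("an astronaut riding a horse")` would return a single PIL image at the model's native 768×768 resolution.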
The 2.0 version also includes an Upscaler Diffusion model that enhances image resolution by a factor of four. In the image below, this model has been used on an image with a 128×128 resolution to upscale it to a higher resolution of 512×512. Stability AI said this model will allow its system to generate images with resolutions of 2048×2048 and higher.
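The factor-of-four geometry is simple to check, and the upscaler itself can be run through diffusers. In this sketch, the checkpoint name "stabilityai/stable-diffusion-x4-upscaler" is the one published on the Hugging Face Hub; the `upscaled_size` helper is purely illustrative and just mirrors the resolution math in the paragraph above.

```python
UPSCALE_FACTOR = 4  # the Upscaler Diffusion model quadruples each dimension


def upscaled_size(width: int, height: int, factor: int = UPSCALE_FACTOR):
    """Return the output resolution of the x4 upscaler for a given input."""
    return (width * factor, height * factor)


def upscale(low_res_image, prompt: str = ""):
    """Sketch: run the Stable Diffusion x4 upscaler on a PIL image.

    Imports are lazy so the resolution helper above works without
    torch/diffusers installed.
    """
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")
    # The upscaler is itself a diffusion model conditioned on the low-res
    # image (and optionally a text prompt), not a classical interpolator.
    return pipe(prompt=prompt, image=low_res_image).images[0]
```

This matches the figures in the announcement: a 128×128 input comes out at 512×512, and a 512×512 input at 2048×2048.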
Stability AI also explains in its press release that its new depth-guided model, depth2img, "extends the previous image-to-image feature from V1 with new possibilities for creative applications." The depth2img model "infers the depth of an input image ... and then generates new images using both the text and depth information."
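The depth-conditioning workflow described above can also be sketched with diffusers. The press release doesn't specify an API; the pipeline class and the "stabilityai/stable-diffusion-2-depth" checkpoint name below come from the Hugging Face ecosystem, and the function name and default `strength` are illustrative assumptions.

```python
def depth_to_image(init_image, prompt: str, strength: float = 0.7):
    """Sketch: depth-guided image-to-image with Stable Diffusion 2.0.

    `init_image` is a PIL image; lazy imports keep the module importable
    without torch/diffusers installed.
    """
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
    ).to("cuda")
    # The pipeline estimates a depth map from init_image internally and then
    # conditions generation on both the text prompt and that depth map, so
    # the output keeps the input's spatial structure while changing content.
    return pipe(prompt=prompt, image=init_image, strength=strength).images[0]
```

A higher `strength` lets the model depart further from the input image while the inferred depth map still anchors the overall composition.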
The original Stable Diffusion V1 was a game-changer for open-source generative AI art. As Stability AI points out, the previous version had one of the fastest climbs to 10,000 GitHub stars of any open-source software, earning 33,000 stars in under two months. The new version also adds an NSFW filter, a sign that Stability AI aims to reduce the controversy surrounding its system and, hopefully, let the creative possibilities of its machine learning systems flourish.