What is DALL-E? How it works and how the system generates AI art
As artificial intelligence technology quickly advances, DALL-E is keeping up with the times.
What is DALL-E?
DALL-E, written as DALL·E on the company website, is a machine-learning model created by OpenAI to produce images from language descriptions. These text descriptions are known as prompts. The system can generate realistic images from nothing more than a description of the scene. DALL-E is a neural network that creates pictures from short phrases provided by the user. It learns to relate language to imagery from the information in its training datasets.
The system uses a transformer-based neural network, a type of machine learning that understands context and processes sequences, to create new images that accurately represent each text prompt. The model is trained on these datasets so that the transformer can correctly predict images from text prompts.
How does it work?
DALL-E can imaginatively generate images based on words provided by the creators and artists, even in the case of the most unique and unusual descriptions.
How does it produce the art? It takes the words of the prompt and encodes them as a series of vectors, or text-to-image embeddings. Then, the AI creates an original image from the general representations it learned from its datasets, guided by the text entered by the user creating the art. DALL-E can “take any text and make an image out of it,” said Ilya Sutskever, co-founder and chief scientist at OpenAI.
The advanced datasets, combined with deep learning, a type of machine learning, allow DALL-E to create new art. It takes the image embeddings and generates an actual image. The AI can also add subtle details, like shadows and reflections, to give images an even more realistic look.
Background information on the history of OpenAI
Before the company was creating innovative text-to-image machine learning concepts through DALL-E, it worked on text generation, more specifically on language models. In 2019, OpenAI created a model called GPT-2 that could predict the next word within a text. It had 1.5 billion parameters and was trained on a dataset of 8 million web pages. “On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data,” OpenAI stated. Its successor, GPT-3, would become the basis for DALL-E, altered to generate images instead of additional text.
How the name was formed
How did the creators at this company come up with the name DALL-E? The name is a combination of the artist Salvador Dalí and Pixar’s robot WALL-E. Combining both art and digital animation using artificial intelligence, the company’s system DALL-E is leaving its mark in the world of AI.
Safety features on DALL-E
The company continues to work on safety and security features within its system. “We’ve enhanced our safety system, improving the text filters and tuning the automated detection & response system for content policy violations,” OpenAI said. The improvements also help prevent people from creating images that are violent or harmful by removing such content from the machine learning datasets. “We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts,” the company stated. “We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.”
OpenAI also created a tool called the Moderation endpoint that allows developers to protect their applications against misuse. It protects users by assessing whether content is harmful. “The endpoint has been trained to be quick, accurate, and to perform robustly across a range of applications,” the company mentioned. OpenAI provided this endpoint to all OpenAI API (application programming interface) account holders to allow for a “safer AI ecosystem.”
To ensure the AI is not being misused, DALL-E will not generate images if the filter identifies text prompts or image uploads as violating OpenAI’s policies.
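For developers curious how a check against the Moderation endpoint might look in practice, here is a minimal sketch in Python. It assumes the endpoint lives at https://api.openai.com/v1/moderations, accepts a JSON body with an "input" field, and returns a "results" list whose entries carry a "flagged" value; those details are assumptions here and should be confirmed against OpenAI's current API documentation. The function names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm against OpenAI's API documentation.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text, api_key):
    """Build (but do not send) a POST request asking whether `text` is harmful."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def is_flagged(response_json):
    """Return True if any result in the (assumed) response shape is flagged."""
    results = response_json.get("results", [])
    return any(result.get("flagged", False) for result in results)


def check_text(text, api_key):
    """Send the request over the network and report whether the text was flagged."""
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        return is_flagged(json.load(response))
```

An application could call check_text on a prompt before passing it along and decline to continue when the result is True. Building the request and reading the response are kept in separate functions so that each piece can be checked without a network connection.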
No more waitlist
In July 2022, DALL-E entered a beta phase, or initial development and testing stage, sending invitations to one million people on its waitlist. In Sept. 2022, OpenAI removed the waitlist for DALL-E, allowing users to sign up for the software immediately. Prior to the announcement, the company already had many users generating art. “More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.” With so many users already creating images, along with having a safety approach, DALL-E was ready to be deployed to anyone who was interested.
DALL-E and CLIP
DALL-E was revealed around the same time as another OpenAI neural network, Contrastive Language-Image Pretraining (CLIP). This model is separate from DALL-E and was trained on 400 million pairs of images and text captions collected from the internet. Its connection to DALL-E was to comprehend and rank DALL-E’s output by judging which caption, selected from among thousands, best fits a given image. In this way, CLIP scores how well the images generated by DALL-E match their text descriptions. The method behind DALL-E 2 is called inverted CLIP, or unCLIP, because it does the opposite of what CLIP does, generating images from text instead of matching images to text.
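The ranking idea can be illustrated with a small Python sketch. It assumes that an image and each candidate caption have already been turned into embedding vectors, and it ranks the captions by cosine similarity to the image. The three-number vectors below are made up for illustration, the function names are hypothetical, and real CLIP embeddings are much larger and come from trained neural networks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return captions ordered from best to worst match for the image."""
    scored = [
        (cosine_similarity(image_embedding, embedding), caption)
        for caption, embedding in caption_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [caption for _, caption in scored]


# Made-up embeddings, for illustration only.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "an armchair shaped like an avocado": [0.8, 0.2, 0.1],
    "a city skyline at night": [0.1, 0.9, 0.3],
    "a bowl of soup": [0.2, 0.3, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
print(ranking[0])  # the caption whose vector points closest to the image's
```

The same scoring can be turned around: given one caption and many generated images, the image with the highest similarity is the one that best matches the text.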
Difference between DALL-E and DALL-E 2
Although both DALL-E, announced in January 2021, and DALL-E 2, revealed in April 2022, are models created by OpenAI, one difference lies in the number of parameters. DALL-E uses 12 billion parameters, while DALL-E 2 works on 3.5 billion parameters, with an additional 1.5 billion parameters to enhance the resolution. Despite its smaller size, DALL-E 2 creates better images than DALL-E by generating them at higher resolution.
DALL-E 2 has also learned the relationship between pictures and the text used to describe them, and it generates images through a process known as diffusion. In this method, the model starts with a pattern of random dots and gradually alters it toward an image as it recognizes aspects of that image. DALL-E 2 can expand images beyond what’s in the original photo, called outpainting, creating new compositions from old images. It has four times greater resolution than DALL-E. Overall, DALL-E 2 is more versatile and produces more realistic and accurate images than its precursor.
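To give a feel for the "pattern of dots" idea, the toy Python sketch below mixes a tiny list of pixel values with random noise and then removes that noise again. It follows a noise formula commonly used in published diffusion research, which is an assumption here and not a description of OpenAI's actual code. In a real system, a trained neural network guided by the text prompt has to estimate the noise over many small steps; this sketch simply hands back the true noise to show the arithmetic. All names and numbers are hypothetical.

```python
import math
import random


def noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (always shrinking)."""
    remaining = []
    running = 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        remaining.append(running)
    return remaining


def add_noise(pixels, signal_left, rng):
    """Blend the pixels with Gaussian noise; return the noisy pixels and the noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    keep, spread = math.sqrt(signal_left), math.sqrt(1.0 - signal_left)
    noisy = [keep * p + spread * n for p, n in zip(pixels, noise)]
    return noisy, noise


def remove_noise(noisy, estimated_noise, signal_left):
    """Undo add_noise, given an estimate of the noise that was mixed in."""
    keep, spread = math.sqrt(signal_left), math.sqrt(1.0 - signal_left)
    return [(x - spread * n) / keep for x, n in zip(noisy, estimated_noise)]


rng = random.Random(0)
schedule = noise_schedule(1000)
pixels = [0.0, 0.25, 0.5, 1.0]           # a four-pixel stand-in for an image
noisy, noise = add_noise(pixels, schedule[-1], rng)
restored = remove_noise(noisy, noise, schedule[-1])  # true noise used as a stand-in
```

After the final step almost none of the original picture is left in the noisy version, which is why generation can begin from pure random dots. How good the final image looks depends on how well the network estimates the noise at each step.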
Outpainting as a novel feature
In August 2022, OpenAI introduced a new feature called outpainting to DALL-E 2. This allows users to continue an image beyond its original borders, taking visual elements in a new direction, using only a natural language description. The added feature complements OpenAI’s previous edit feature in DALL-E called inpainting, which allows users to change parts of a generated image. The new feature lets creators make large-scale images by extending the canvas. With the new process, developers at OpenAI get a better understanding of DALL-E’s different strengths and capabilities.
Creative and commercial use of DALL-E
According to the company website, images can be used creatively and commercially. It mentioned that people can make images in the software and use them for commercial projects, like book illustrations or company websites. This allows creators to get full usage rights to the images they generate, according to OpenAI. Some developers think that there should be regulations for AI-generated art. "Such regulations could take many forms like including watermarks in the images generated by DALL-E, making DALL-E a paid software, developing government policies around enforcing fair use of such content, or some combination of all these things," said Rishabh Misra, a senior machine learning (ML) engineer at Twitter and an independent ML researcher.
Although that raises many questions regarding copyright and stock image credit, some companies, such as Shutterstock, are incorporating AI-generated imagery and see it as a step in the right direction for the ever-evolving future of AI and content creation.
The future possibilities are endless
There are numerous opportunities and possibilities for using AI-generated art, such as DALL-E, in content creation. One idea is to use AI-generated images for concepts not yet created or too costly to photograph. "People could use DALL-E to generate an image of a product that doesn't exist yet, or create an image that is too expensive or difficult to photograph," suggested Ed Shway, co-owner of AI company ByteXD.
Some people think that there will be multiple artificial intelligence tools combined to create moving, talking, fully animated art. "What is really exciting is that as the creative reality space progresses, we’re seeing that people are layering different AI tools to produce even more creative content. An image of a person created on DALL-E can be animated and given voice using D-ID (AI-generated text-to-video). A landscape created in Dreamstudio can turn into an opening shot of a movie, accompanied by music composed on Jukebox," said Gil Perry, CEO and co-founder of D-ID, a creative AI and patented video reenactment technology company.