The machine learning firm OpenAI is building new models that can both improve computer vision and generate original images from text prompts, according to a recent post on the company's official blog.
While this is a far cry from displacing visual artists, there's no doubt that AI's slow creep into creative work is gaining momentum.
OpenAI's AI model can draw images from text prompts
OpenAI's new models represent the industry's cutting edge: machine learning systems that exhibit recognizable elements of general intelligence while performing ostensibly simple tasks with direct real-world value, without breaking the bank on computing power, Axios reports.
OpenAI announced two new systems that attempt to do for images what the firm's GPT-3 model did for text generation in 2020.
OpenAI's DALL-E similar to GPT-3
The first, called DALL-E, is a neural network the company says can "take any text and make an image out of it," according to OpenAI's chief scientist and co-founder, Ilya Sutskever. That includes concepts it may never have encountered during training, like the X-ray images of a cougar sitting in a field shown above.
DALL-E works much like GPT-3, the giant transformer model capable of generating novel prose from short prompts.
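At a high level, DALL-E is reported to treat the text prompt and the image as a single stream of tokens, generating the image piece by piece the way GPT-3 generates words. The toy sketch below illustrates only that autoregressive idea; the vocabulary and the "model" are made-up stand-ins, not OpenAI's actual architecture:

```python
# Toy sketch of the idea behind DALL-E: one model sees text tokens and
# discrete image codes as a single sequence, then appends image codes
# one at a time, each conditioned on everything generated so far.

TEXT_VOCAB = ["a", "can", "of", "soda"]
IMAGE_VOCAB = list(range(16))  # stand-in for a codebook of image patches

def toy_next_token(sequence):
    """Stand-in for the transformer: deterministically picks an image
    code based on the tokens seen so far."""
    seed = sum(hash(t) for t in sequence)
    return IMAGE_VOCAB[seed % len(IMAGE_VOCAB)]

def generate_image_tokens(prompt_tokens, n_image_tokens=8):
    """Autoregressive loop: start from the prompt, append image codes."""
    sequence = list(prompt_tokens)
    for _ in range(n_image_tokens):
        sequence.append(toy_next_token(sequence))
    return sequence[len(prompt_tokens):]

codes = generate_image_tokens(["a", "can", "of", "soda"])
print(codes)  # 8 discrete codes a decoder would turn into pixels
```

The key point is that nothing distinguishes "text mode" from "image mode" inside the loop; the prompt simply conditions which image codes come out.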
DALL-E AI lets users tell computers what to do
The other new neural network from OpenAI, CLIP, "can take any set of visual categories and instantly create very strong and reliable visually classifiable text descriptions," said Sutskever. It aims to improve existing computer vision techniques while reducing the need for training data and pricey computational power.
"Last year, we were able to make substantial progress on text with GPT-3, but the thing is that the world isn't just built on text," Sutskever added. "This is a step towards the grander goal of building a neural network that can work in both images and text."
OpenAI chose DALL-E's name as a portmanteau of the surrealist artist Salvador Dalí and Pixar's palpably cute robot WALL-E. The model is sure to fulfill some sci-fi dreams because it lets users simply tell a computer, in ordinary language, what to create.
OpenAI's CLIP AI pushes efficiency envelope
For example, in the demo on OpenAI's blog post, entering the prompt "a can of soda that has the word 'Skynet' written on it" three times, followed by "skynet brand soda," brings the cluster of imaginary products shown above to life.
"It can take unrelated concepts that are nothing alike and put them together into a functional object," said Aditya Ramesh, who leads the DALL-E team.
CLIP can recognize images with relatively little training, enabling it to caption images on first encounter. But the model's crucial advantage is its efficiency, which is taking on a central role as the computational cost of training new machine learning models continues to spike.
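The classification trick behind CLIP can be pictured with a toy example: embed an image and each candidate caption into the same vector space, then pick the caption closest to the image. The hand-made three-number "embeddings" below are placeholders for what CLIP's learned encoders would actually produce:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Pretend embeddings (in reality produced by CLIP's image and text encoders).
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}

# Zero-shot classification: the best caption is the nearest text embedding.
best_label = max(label_embeddings,
                 key=lambda lbl: cosine(image_embedding, label_embeddings[lbl]))
print(best_label)  # → "a photo of a dog"
```

Because the labels are ordinary text, swapping in an entirely new set of categories requires no retraining, which is where the efficiency gains come from.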
AI models have far to go before outright replacing humans
And like GPT-3, the new AI models are far from perfect. For example, DALL-E depends on the phrasing of its text prompts, which means the wrong grammar can disrupt its ability to form a coherent image.
While OpenAI's new models still have a long way to go before they or their likenesses flood our social media timelines, there's no denying that generally intelligent AI is creeping toward the kind of creative work typically reserved for visual artists, painters, cartoonists, and yes, even writers.