ImageBind: Meta pushes AI boundaries, new tool may enable machines to sense like humans

The new AI tool integrates six different types of data to produce multisensory content: photos, text, audio, depth, thermal, and IMU data.
Sejal Sharma
Stock photo: Metaverse, artificial intelligence concept. Uladzimir Zuyeu/iStock

Meta has introduced ImageBind, a new open-source AI tool that combines six types of data, namely images, text, audio, depth, thermal, and inertial measurement unit (IMU) data, to create multisensory content.

The thermal and IMU streams mean the model can also account for heat signatures as well as motion and position.

The research team's goal was to learn a single joint embedding space for multiple streams of data, using images to bind the modalities together. Crucially, the approach does not require datasets in which all modalities co-occur with each other.
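To make the idea of a joint embedding space concrete, the sketch below shows how inputs from different modalities can be compared once they are mapped into the same vector space. The encoder functions here are hypothetical placeholders standing in for ImageBind's modality-specific encoders; this is a minimal illustration of the concept, not the model's actual API.

```python
import torch
import torch.nn.functional as F

# Hypothetical modality encoders standing in for ImageBind's encoders.
# Each maps its input to the same d-dimensional joint embedding space.
EMBED_DIM = 1024

def encode_image(image: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real model would run a vision encoder here.
    return torch.randn(EMBED_DIM)

def encode_audio(audio: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real model would run an audio spectrogram encoder here.
    return torch.randn(EMBED_DIM)

def encode_text(text: str) -> torch.Tensor:
    # Placeholder: a real model would run a text encoder here.
    return torch.randn(EMBED_DIM)

# Because every modality is projected into one space, cross-modal
# comparison is just cosine similarity between the embeddings.
def similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

image_emb = encode_image(torch.zeros(3, 224, 224))   # e.g. a photo of a dog
audio_emb = encode_audio(torch.zeros(1, 16000))      # e.g. a barking sound
text_emb = encode_text("a dog barking")

print("image vs. audio:", similarity(image_emb, audio_emb))
print("image vs. text: ", similarity(image_emb, text_emb))
```

In the trained model, inputs that describe the same scene, such as a photo of a dog and a recording of barking, end up close together in this shared space, which is what makes retrieval across modalities possible.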

AI learning across six modalities.

The release continues the company's strategy of keeping its technology open-source. In February, it released a "state-of-the-art" large language model called LLaMA, which was also open-source.

ImageBind mimics human perception

Meta, the parent company of Facebook, Instagram, and WhatsApp, says the new machine learning tool brings us one step closer to training AI systems to learn from their environments the way humans do, through their senses.

ImageBind equips machines with a holistic understanding that connects objects in a photo with how they will sound, their 3D shape, how warm or cold they are, and how they move, said Meta in a statement.

The model works by detecting objects in a photo and inferring information about them. For example, ImageBind can suggest how hot or cold an object in an image is likely to be, what sound it would generate, what shape it has, and how it would move.

It will also allow content creators to make videos with sound and movement starting from only one or two kinds of data, such as text, image, or audio. For example, a creator could take an image of an alarm clock and a rooster crowing, use a crowing audio prompt to segment the rooster or the sound of an alarm to segment the clock, and animate both into a video sequence, the company said in a statement.
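One way this kind of composition can work is by combining embeddings from different modalities into a single query, which Meta demonstrates as embedding-space arithmetic. The sketch below illustrates the principle under invented assumptions: the "gallery" of candidate clips, the noise model, and the embeddings are all made-up stand-ins, reusing the hypothetical encoders from the earlier sketch rather than ImageBind's real outputs.

```python
import torch
import torch.nn.functional as F

def normalize(v: torch.Tensor) -> torch.Tensor:
    return F.normalize(v, dim=-1)

torch.manual_seed(0)
EMBED_DIM = 1024

# Pretend these came from ImageBind-style encoders in the joint space.
clock_image_emb = normalize(torch.randn(EMBED_DIM))     # photo of an alarm clock
rooster_audio_emb = normalize(torch.randn(EMBED_DIM))   # recording of a rooster crowing

# A toy gallery of candidate clips, each already embedded in the same space.
gallery = {
    "clock_only_clip": normalize(clock_image_emb + 0.1 * torch.randn(EMBED_DIM)),
    "rooster_only_clip": normalize(rooster_audio_emb + 0.1 * torch.randn(EMBED_DIM)),
    "clock_and_rooster_clip": normalize(
        clock_image_emb + rooster_audio_emb + 0.1 * torch.randn(EMBED_DIM)
    ),
}

# Combine the two query embeddings by summing and re-normalizing, then rank
# gallery items by cosine similarity to the combined query.
query = normalize(clock_image_emb + rooster_audio_emb)
scores = {name: torch.dot(query, emb).item() for name, emb in gallery.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup, the clip that matches both the image and the audio scores highest, which is the behavior a creator would rely on when prompting with a mix of modalities.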

Meta also announced that it will soon introduce more streams of data linking as many senses as possible, such as touch, speech, smell, and brain fMRI signals. "This will enable richer human-centric AI models," the company said.

The AI model is still a research prototype, and the team says it is not yet ready for real-world applications.

Companies ranging from Microsoft to Google are working on new AI models and tools to woo users. Meanwhile, Meta's chatbot BlenderBot 3 failed to impress as a rival to OpenAI's ChatGPT, Microsoft's Bing, and Google's Bard AI.
