Researchers Develop an AI System that Provides Textual and Visual Responses

A team of scientists from Cornell University has developed a multimodal AI that offers a broader and more dynamic range of responses.
Mario L. Major

A team of artificial intelligence (AI) researchers at Cornell University has developed a deep learning model that can not only answer questions but also provide a visual explanation for its answers.

In their research paper titled “Multimodal Explanations: Justifying Decisions and Pointing to the Evidence” (which is still pending approval), the team outlines the experiments done with the Pointing and Justification Explanation (PJ-X) model they developed.

By using a multimodal approach, the team surpassed the limitations of unimodal models, which they describe as “offering either image-based visualization of attention weights or text-based generation of post-hoc justifications.”

The team used identical side-by-side images, and after introducing a controlled amount of data to PJ-X, the model could provide explanations that satisfy both activity recognition tasks (ACT-X) and visual question answering tasks (VQA-X).
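To make the idea of a multimodal explanation concrete, the following is a minimal sketch of how a text justification might be paired with attention weights over image regions. It is not the PJ-X implementation: the class name, the function names, the 3x3 grid of scores, and the answer and justification strings are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MultimodalExplanation:
    """Hypothetical container pairing a textual and a visual explanation."""
    answer: str
    justification: str   # text-based explanation
    attention: list      # image-based explanation: one weight per image region


def softmax_attention(scores):
    """Normalize a 2D grid of relevance scores into weights that sum to 1."""
    peak = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    return [[e / total for e in row] for row in exps]


def strongest_region(attention):
    """Return the (row, col) of the region with the highest attention weight."""
    best = max(
        (weight, r, c)
        for r, row in enumerate(attention)
        for c, weight in enumerate(row)
    )
    return best[1], best[2]


# Made-up relevance scores for a 3x3 grid of image regions (assumption).
scores = [
    [0.1, 0.2, 0.1],
    [0.3, 2.5, 0.4],
    [0.1, 0.2, 0.1],
]

explanation = MultimodalExplanation(
    answer="yes",
    justification="placeholder text justification",
    attention=softmax_attention(scores),
)

row, col = strongest_region(explanation.attention)
print(f"Answer: {explanation.answer}")
print(f"Because: {explanation.justification}")
print(f"Evidence is concentrated in region ({row}, {col})")
```

A unimodal system would return only one of the two explanation fields; the point of the multimodal approach described above is that both are produced together for the same answer.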


Unlocking the Mysteries of the Black Box

Though the experiment is rather small in scale, it could have far-reaching implications. AI research and development (R&D) is one of the most exciting areas of scientific research to emerge in the last decade, and the healthy competition among countries, all in pursuit of the golden prize of “World Leader in AI Technologies,” is giving rise to projects and experiments that have produced one breakthrough after another.

By taking a more dynamic approach to deep learning, the research carried out by the team at Cornell University offers real value to the scientific community.

The more ways an AI system has of explaining its answers, the fewer skeptics there will be pointing to the vague nature of explanations given about how AI works. For example, the cryptic black box, the AI data repository through which data is processed and from which answers emerge, has been shrouded in mystery, with one MIT Technology Review source charging:

“No one really knows how the most advanced algorithms do what they do. That could be a problem.”

Given this complicated issue, as well as the staggering range of applications for AI technology, the area of medicine and health care has come under the most scrutiny, in large part due to the possibility of using deep learning algorithms and deep neural networks to classify and diagnose everything from heart disease to skin cancer. However, the topic of disclosure, arguably one of the most central themes in modern medicine, has evolved in parallel with medical science.

Thanks to landmark legislation like the Health Insurance Portability and Accountability Act (HIPAA) of 1996, regulations exist to ensure that patients and doctors remain mutually informed, regardless of the medical technology in use at any given time. Still, we may need to update disclosure laws in the next few years to reflect the unprecedented influence of technology in medicine. We must not give in to our suspicions to the point that we stand in the way of vital and much-needed research.

Via: NY Times
