AI and Humans: Differing Perceptions of Objects
While humans concentrate on the meaning of objects, artificial intelligence (AI) focuses on their visual characteristics. This 'visual bias' in AI models can affect how much we trust AI systems and how we use them.
Understanding the Differences
Florian Mahner from the Max Planck Institute for Human Cognitive and Brain Sciences explains, "These dimensions represent various properties of objects, ranging from purely visual aspects, like 'round' or 'white,' to more semantic properties, like 'animal-related' or 'fire-related,' with many dimensions containing both visual and semantic elements."
Human Focus on Meaning
"Our results revealed an important difference: While humans primarily focus on dimensions related to meaning—what an object is and what we know about it—AI models rely more heavily on dimensions capturing visual properties, such as the object's shape or color," Mahner adds. This means that even when AI appears to recognize objects similarly to humans, it often uses fundamentally different strategies.
Impact of Visual Bias
Martin Hebart, senior author of the paper, explains, "When we first looked at the dimensions we discovered in the deep neural networks, we thought that they actually looked very similar to those found in humans. But when we started to look closer and compared them to humans, we noticed important differences."
Methodology and Findings
To measure human behavior, the scientists used about 5 million publicly available odd-one-out judgments collected over 1,854 different object images. In each trial, a participant was shown three images, for example a guitar, an elephant, and a chair, and asked which object does not fit with the other two. The scientists then treated multiple image-recognizing deep neural networks analogously to human participants, collecting similarity judgments from the networks for the same object images.
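As a rough sketch of how such triplet judgments can be read out from any representation, human or network, consider each object as a feature vector. The vectors below are invented for illustration; the study used real behavioral data and network activations.

```python
import numpy as np

# Toy embeddings for three objects (hypothetical values, not the study's data).
embeddings = {
    "guitar": np.array([0.9, 0.1, 0.0]),
    "elephant": np.array([0.1, 0.9, 0.2]),
    "chair": np.array([0.8, 0.0, 0.1]),
}

def odd_one_out(names, vecs):
    """Pick the object least similar to the other two.

    The odd one out is the object NOT in the most similar pair:
    find the pair (i, j) with the highest dot-product similarity,
    then return the remaining object.
    """
    best_pair, best_sim = None, -np.inf
    for i in range(3):
        for j in range(i + 1, 3):
            sim = vecs[i] @ vecs[j]
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
    k = ({0, 1, 2} - set(best_pair)).pop()
    return names[k]

names = list(embeddings)
vecs = [embeddings[n] for n in names]
print(odd_one_out(names, vecs))  # guitar and chair are most alike -> "elephant"
```

The same read-out can be applied to human similarity data or to a network's internal activations, which is what makes the two sources of judgments comparable in the first place.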
Direct Comparability
They applied the same algorithm to both datasets to identify the key characteristics of the images, termed "dimensions" by the scientists, that underlie the odd-one-out decisions. Treating the neural networks in the same way as the human participants ensured direct comparability between the two.
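The shared-pipeline idea can be sketched as follows. This is a heavily simplified stand-in for the study's actual embedding method, not a reimplementation: a toy softmax choice model over triplets with a non-negativity constraint, with all names and numbers invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_dimensions(triplets, n_objects, n_dims=4, lr=0.05, epochs=200):
    """Fit a non-negative embedding so that, for each triplet (i, j, odd),
    the chosen pair (i, j) ends up more similar than either pair
    involving the odd item. Toy stand-in for the study's embedding model.
    """
    X = rng.random((n_objects, n_dims)) * 0.1
    for _ in range(epochs):
        for i, j, odd in triplets:
            sims = np.array([X[i] @ X[j], X[i] @ X[odd], X[j] @ X[odd]])
            p = np.exp(sims) / np.exp(sims).sum()  # softmax choice model
            # gradient of -log p[0] w.r.t. each embedding (chosen pair is (i, j))
            gi = (p[0] - 1) * X[j] + p[1] * X[odd]
            gj = (p[0] - 1) * X[i] + p[2] * X[odd]
            go = p[1] * X[i] + p[2] * X[j]
            X[i] -= lr * gi
            X[j] -= lr * gj
            X[odd] -= lr * go
            X = np.clip(X, 0, None)  # keep dimensions non-negative
    return X

# The key point: the SAME fitting function runs on human triplets and on
# network triplets, which makes the resulting dimensions directly comparable.
human_triplets = [(0, 1, 2), (0, 1, 3), (2, 3, 0), (2, 3, 1)]
X_human = fit_dimensions(human_triplets, n_objects=4)
```

Because nothing in the pipeline knows whether the triplets came from people or from a network, any difference in the recovered dimensions reflects the representations themselves, not the analysis.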
Interpreting Dimensions
Beyond identifying this visual bias, the scientists used interpretability techniques common in the analysis of neural networks to judge whether the dimensions they found actually made sense. For example, a dimension that features many animals might be labeled 'animal-related.'
To see if the dimension really responded to animals, the scientists ran multiple tests: They looked at what parts of the images were used by the neural network, they generated new images that best matched individual dimensions, and they even manipulated the images to remove certain dimensions. "All of these strict tests indicated very interpretable dimensions," adds Mahner.
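One of those tests, removing a candidate dimension and checking whether the model's judgments change, can be illustrated with toy vectors. The study manipulated the images themselves; the three-dimensional embeddings below are invented, with dimension index 1 standing in for 'animal-related'.

```python
import numpy as np

# Hypothetical embeddings; say dimension 1 is the 'animal-related' one.
objects = {
    "elephant": np.array([0.2, 0.9, 0.1]),
    "dog":      np.array([0.1, 0.8, 0.3]),
    "chair":    np.array([0.7, 0.0, 0.2]),
}

def odd_one_out(vecs, names):
    """Return the object outside the most similar pair."""
    sims = {(a, b): vecs[a] @ vecs[b] for a in names for b in names if a < b}
    a, b = max(sims, key=sims.get)  # most similar pair
    return next(n for n in names if n not in (a, b))

names = sorted(objects)
print(odd_one_out(objects, names))  # animals group together -> "chair"

# Ablate the animal-related dimension and repeat the judgment.
ablated = {n: v.copy() for n, v in objects.items()}
for v in ablated.values():
    v[1] = 0.0
print(odd_one_out(ablated, names))  # judgment flips -> "dog"
```

If removing the dimension changes the decision, the dimension was genuinely driving the behavior, which is the logic behind the manipulation tests described above.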
Comparing Human and AI Dimensions
"But when we directly compared matching dimensions between humans and deep neural networks, we found that the network only really approximated these dimensions. For an animal-related dimension, many images of animals were not included, and likewise, many images were included that were not animals at all. This is something we would have missed with standard techniques," Mahner explains.
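The gap Mahner describes, dimensions that correlate overall yet disagree about which objects they actually contain, can be simulated with synthetic scores. All numbers below are invented for illustration; the point is only that a summary correlation can hide disagreement at the level of individual objects.

```python
import numpy as np

rng = np.random.default_rng(1)
n_objects = 100
is_animal = rng.random(n_objects) < 0.3  # hypothetical category labels

# Hypothetical dimension scores: the human dimension tracks 'animal'
# cleanly, the network dimension only approximates it (much noisier).
human_dim = is_animal * 1.0 + rng.normal(0, 0.05, n_objects)
dnn_dim = is_animal * 0.6 + rng.normal(0, 0.3, n_objects)

def top_k(scores, k=20):
    """Indices of the k highest-scoring objects on a dimension."""
    return set(np.argsort(scores)[-k:])

# The two dimensions correlate, so a standard analysis would call them a match...
r = np.corrcoef(human_dim, dnn_dim)[0, 1]
print(f"correlation: {r:.2f}")

# ...but the objects that actually drive each dimension differ:
overlap = len(top_k(human_dim) & top_k(dnn_dim)) / 20
animals_in_dnn_top = np.mean([is_animal[i] for i in top_k(dnn_dim)])
print(f"top-20 overlap: {overlap:.0%}, animals in DNN top-20: {animals_in_dnn_top:.0%}")
```

Checking which objects score highest on each dimension, rather than relying on a single correlation number, is what reveals that the network's 'animal-related' dimension admits non-animals and misses real ones.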
Future Implications
The scientists hope that future research will use similar approaches that directly compare humans with AI to better understand how AI makes sense of the world. "Our research provides a clear and interpretable method to study these differences, which helps us better understand how AI processes information compared to humans. This knowledge can not only help us improve AI technology but also provides valuable insights into human cognition," says Hebart.
Frequently Asked Questions
What is the main difference between how humans and AI perceive objects?
While humans focus on the meaning of objects, AI relies more on visual characteristics, leading to a 'visual bias' in AI models.
What is 'visual bias' in AI models?
Visual bias refers to the tendency of AI models to rely more heavily on visual properties of objects, such as shape and color, rather than their meaning.
How did scientists study the differences between human and AI perception?
Scientists used about 5 million publicly available odd-one-out judgments over 1,854 different object images for humans and treated multiple deep neural networks analogously to human participants to collect similarity judgments.
What techniques did scientists use to interpret dimensions in AI models?
Scientists used interpretability techniques common in the analysis of neural networks, including tests to see what parts of images were used by the neural network and generating new images that best matched individual dimensions.
What are the implications of these findings for AI technology and human cognition?
These findings can help improve AI technology by better understanding how AI processes information and provide valuable insights into human cognition.