AI and Humans: Differing Perceptions of Objects
While humans concentrate on the meaning of objects, artificial intelligence (AI) focuses on their visual characteristics. This 'visual bias' in AI models can affect how much we trust AI systems and how we use them.
Understanding the Differences
Florian Mahner from the Max Planck Institute for Human Cognitive and Brain Sciences explains, "These dimensions represent various properties of objects, ranging from purely visual aspects, like 'round' or 'white,' to more semantic properties, like 'animal-related' or 'fire-related,' with many dimensions containing both visual and semantic elements."
Human Focus on Meaning
"Our results revealed an important difference: While humans primarily focus on dimensions related to meaning—what an object is and what we know about it—AI models rely more heavily on dimensions capturing visual properties, such as the object's shape or color," Mahner adds. This means that even when AI appears to recognize objects similarly to humans, it often uses fundamentally different strategies.
Impact of Visual Bias
Martin Hebart, senior author of the paper, explains, "When we first looked at the dimensions we discovered in the deep neural networks, we thought that they actually looked very similar to those found in humans. But when we started to look closer and compared them to humans, we noticed important differences."
Methodology and Findings
To measure human behavior, the scientists used about 5 million publicly available odd-one-out judgments collected over 1,854 different object images. In each trial, a participant was shown three images, for example a guitar, an elephant, and a chair, and asked which object does not fit with the other two. The scientists then treated multiple image-recognizing deep neural networks analogously to human participants, collecting similarity judgments from the networks for the same object images.
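As a rough sketch of how such triplet judgments can be read out from any representation, human or network, consider each object as a feature vector. The vectors below are invented for illustration; the study used real behavioral data and network activations.

```python
import numpy as np

# Toy embeddings for three objects (hypothetical values, not the study's data).
embeddings = {
    "guitar": np.array([0.9, 0.1, 0.0]),
    "elephant": np.array([0.1, 0.9, 0.2]),
    "chair": np.array([0.8, 0.0, 0.1]),
}

def odd_one_out(names, vecs):
    """Pick the object least similar to the other two.

    The odd one out is the object NOT in the most similar pair:
    find the pair (i, j) with the highest dot-product similarity,
    then return the remaining object.
    """
    best_pair, best_sim = None, -np.inf
    for i in range(3):
        for j in range(i + 1, 3):
            sim = vecs[i] @ vecs[j]
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
    k = ({0, 1, 2} - set(best_pair)).pop()
    return names[k]

names = list(embeddings)
vecs = [embeddings[n] for n in names]
print(odd_one_out(names, vecs))  # guitar and chair are most alike -> "elephant"
```

The same read-out can be applied to human similarity data or to a network's internal activations, which is what makes the two sources of judgments comparable in the first place.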
Direct Comparability
They applied the same algorithm to both datasets to identify the key characteristics of the images, termed "dimensions" by the scientists, that underlie the odd-one-out decisions. Treating the neural networks in the same way as the human participants ensured direct comparability between the two.
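The shared-pipeline idea can be sketched as follows. This is a heavily simplified stand-in for the study's actual embedding method, not a reimplementation: a toy softmax choice model over triplets with a non-negativity constraint, with all names and numbers invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_dimensions(triplets, n_objects, n_dims=4, lr=0.05, epochs=200):
    """Fit a non-negative embedding so that, for each triplet (i, j, odd),
    the chosen pair (i, j) ends up more similar than either pair
    involving the odd item. Toy stand-in for the study's embedding model.
    """
    X = rng.random((n_objects, n_dims)) * 0.1
    for _ in range(epochs):
        for i, j, odd in triplets:
            sims = np.array([X[i] @ X[j], X[i] @ X[odd], X[j] @ X[odd]])
            p = np.exp(sims) / np.exp(sims).sum()  # softmax choice model
            # gradient of -log p[0] w.r.t. each embedding (chosen pair is (i, j))
            gi = (p[0] - 1) * X[j] + p[1] * X[odd]
            gj = (p[0] - 1) * X[i] + p[2] * X[odd]
            go = p[1] * X[i] + p[2] * X[j]
            X[i] -= lr * gi
            X[j] -= lr * gj
            X[odd] -= lr * go
            X = np.clip(X, 0, None)  # keep dimensions non-negative
    return X

# The key point: the SAME fitting function runs on human triplets and on
# network triplets, which makes the resulting dimensions directly comparable.
human_triplets = [(0, 1, 2), (0, 1, 3), (2, 3, 0), (2, 3, 1)]
X_human = fit_dimensions(human_triplets, n_objects=4)
```

Because nothing in the pipeline knows whether the triplets came from people or from a network, any difference in the recovered dimensions reflects the representations themselves, not the analysis.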
Interpreting Dimensions
Beyond identifying this visual bias, the scientists used interpretability techniques common in the analysis of neural networks to judge whether the dimensions they found actually made sense. For example, a dimension that features many animals might be labeled 'animal-related.'
To see if the dimension really responded to animals, the scientists ran multiple tests: They looked at what parts of the images were used by the neural network, they generated new images that best matched individual dimensions, and they even manipulated the images to remove certain dimensions. "All of these strict tests indicated very interpretable dimensions," adds Mahner.
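One of those tests, removing a candidate dimension and checking whether the model's judgments change, can be illustrated with toy vectors. The study manipulated the images themselves; the three-dimensional embeddings below are invented, with dimension index 1 standing in for 'animal-related'.

```python
import numpy as np

# Hypothetical embeddings; say dimension 1 is the 'animal-related' one.
objects = {
    "elephant": np.array([0.2, 0.9, 0.1]),
    "dog":      np.array([0.1, 0.8, 0.3]),
    "chair":    np.array([0.7, 0.0, 0.2]),
}

def odd_one_out(vecs, names):
    """Return the object outside the most similar pair."""
    sims = {(a, b): vecs[a] @ vecs[b] for a in names for b in names if a < b}
    a, b = max(sims, key=sims.get)  # most similar pair
    return next(n for n in names if n not in (a, b))

names = sorted(objects)
print(odd_one_out(objects, names))  # animals group together -> "chair"

# Ablate the animal-related dimension and repeat the judgment.
ablated = {n: v.copy() for n, v in objects.items()}
for v in ablated.values():
    v[1] = 0.0
print(odd_one_out(ablated, names))  # judgment flips -> "dog"
```

If removing the dimension changes the decision, the dimension was genuinely driving the behavior, which is the logic behind the manipulation tests described above.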
Comparing Human and AI Dimensions
"But when we directly compared matching dimensions between humans and deep neural networks, we found that the network only really approximated these dimensions. For an animal-related dimension, many images of animals were not included, and likewise, many images were included that were not animals at all. This is something we would have missed with standard techniques," Mahner explains.
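The gap Mahner describes, dimensions that correlate overall yet disagree about which objects they actually contain, can be simulated with synthetic scores. All numbers below are invented for illustration; the point is only that a summary correlation can hide disagreement at the level of individual objects.

```python
import numpy as np

rng = np.random.default_rng(1)
n_objects = 100
is_animal = rng.random(n_objects) < 0.3  # hypothetical category labels

# Hypothetical dimension scores: the human dimension tracks 'animal'
# cleanly, the network dimension only approximates it (much noisier).
human_dim = is_animal * 1.0 + rng.normal(0, 0.05, n_objects)
dnn_dim = is_animal * 0.6 + rng.normal(0, 0.3, n_objects)

def top_k(scores, k=20):
    """Indices of the k highest-scoring objects on a dimension."""
    return set(np.argsort(scores)[-k:])

# The two dimensions correlate, so a standard analysis would call them a match...
r = np.corrcoef(human_dim, dnn_dim)[0, 1]
print(f"correlation: {r:.2f}")

# ...but the objects that actually drive each dimension differ:
overlap = len(top_k(human_dim) & top_k(dnn_dim)) / 20
animals_in_dnn_top = np.mean([is_animal[i] for i in top_k(dnn_dim)])
print(f"top-20 overlap: {overlap:.0%}, animals in DNN top-20: {animals_in_dnn_top:.0%}")
```

Checking which objects score highest on each dimension, rather than relying on a single correlation number, is what reveals that the network's 'animal-related' dimension admits non-animals and misses real ones.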
Future Implications
The scientists hope that future research will use similar approaches that directly compare humans with AI to better understand how AI makes sense of the world. "Our research provides a clear and interpretable method to study these differences, which helps us better understand how AI processes information compared to humans. This knowledge can not only help us improve AI technology but also provides valuable insights into human cognition," says Hebart.
Frequently Asked Questions
What is the main difference between how humans and AI perceive objects?
While humans focus on the meaning of objects, AI relies more on visual characteristics, leading to a 'visual bias' in AI models.
What is 'visual bias' in AI models?
Visual bias refers to the tendency of AI models to rely more heavily on visual properties of objects, such as shape and color, rather than their meaning.
How did scientists study the differences between human and AI perception?
Scientists used about 5 million publicly available odd-one-out judgments over 1,854 different object images for humans and treated multiple deep neural networks analogously to human participants to collect similarity judgments.
What techniques did scientists use to interpret dimensions in AI models?
Scientists used interpretability techniques common in the analysis of neural networks, including tests to see what parts of images were used by the neural network and generating new images that best matched individual dimensions.
What are the implications of these findings for AI technology and human cognition?
These findings can help improve AI technology by better understanding how AI processes information and provide valuable insights into human cognition.