AI models can outperform humans in tests to identify mental states

2 months ago admin

Humans are complicated beings. The ways we communicate are multilayered, and psychologists have devised many kinds of tests to measure our ability to infer meaning and understanding from interactions with each other. AI models are getting better at these tests. New research published today in Nature Human Behaviour found that some large language models (LLMs)…

Theory of mind is a hallmark of emotional and social intelligence that allows us to infer people’s intentions and engage and empathize with one another. Most children pick up these kinds of skills between three and five years of age.

The researchers tested two families of large language models: OpenAI’s GPT-3.5 and GPT-4, and three versions of Meta’s Llama 2. The models were given tasks designed to test theory of mind in humans, including identifying false beliefs, recognizing faux pas, and understanding what is being implied rather than said directly. The team also tested 1,907 human participants in order to compare the sets of scores.

The team conducted five types of tests. The first, the hinting task, is designed to measure someone’s ability to infer someone else’s real intentions through indirect comments. The second, the false-belief task, assesses whether someone can infer that someone else might reasonably be expected to believe something they happen to know isn’t the case. Another test measured the ability to recognize when someone is making a faux pas, while a fourth test consisted of telling strange stories, in which a protagonist does something unusual, in order to assess whether someone can explain the contrast between what was said and what was meant. They also included a test of whether people can comprehend irony.

The AI models were given each test 15 times in separate chats, so that they would treat each request independently, and their responses were scored in the same manner used for humans. The researchers then tested the human volunteers, and the two sets of scores were compared.
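The repeated-trial procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the `ask` and `score` callables, the function names, and the use of a simple mean to compare the two groups are all hypothetical. The only detail taken from the article is that each test is posed 15 times in separate chats.

```python
from statistics import mean
from typing import Callable, List

RUNS_PER_TEST = 15  # from the article: each test was given 15 times


def run_independent_sessions(
    ask: Callable[[List[str], str], str],   # hypothetical model interface
    prompt: str,
    score: Callable[[str], float],          # hypothetical scoring rubric
    runs: int = RUNS_PER_TEST,
) -> List[float]:
    """Pose the same prompt in `runs` fresh chats and score every reply."""
    scores = []
    for _ in range(runs):
        history: List[str] = []  # empty context, so no chat sees an earlier one
        reply = ask(history, prompt)
        scores.append(score(reply))
    return scores


def gap_to_humans(model_scores: List[float], human_scores: List[float]) -> float:
    """Assumed comparison: difference of mean scores (positive = model higher)."""
    return mean(model_scores) - mean(human_scores)


if __name__ == "__main__":
    # Stand-in model that always gives the same answer, for demonstration only.
    def stub_model(history: List[str], prompt: str) -> str:
        return "She wants him to close the window."

    def stub_score(reply: str) -> float:
        return 1.0 if "window" in reply else 0.0

    results = run_independent_sessions(stub_model, "What does she mean?", stub_score)
    print(len(results), gap_to_humans(results, [1.0, 0.0, 1.0, 1.0]))
```

Starting every trial from an empty history is what keeps the requests independent: a model cannot carry over an answer, or a correction, from one attempt to the next.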

Both versions of GPT performed at, or sometimes above, human averages in tasks that involved indirect requests, misdirection, and false beliefs, while GPT-4 outperformed humans in the irony, hinting, and strange stories tests. Llama 2’s three models performed below the human average.

However, the largest of the three Llama 2 models tested outperformed humans when it came to recognizing faux pas scenarios, whereas GPT consistently provided incorrect responses. The authors believe this is due to GPT’s general aversion to generating conclusions about opinions, because the models largely responded that there wasn’t enough information for them to answer one way or another.
