ChatGPT can make people think it is a person, but only if it first “acts dumb.” And that’s one of the things that a new study that looked into whether AI models can pass the Turing Test found.
The “dumbing down” question on X was brought up by Charbel-Raphaël Segerie, Executive Director of the Centre pour la Sécurité de l’IA (CeSIA). Experts from UC San Diego wrote it and put it out in a pre-print study paper.
Alan Turing was the first person to suggest the Turing Test. In it, a third party talks to both an AI and a human and decides which is the human. It wasn’t a three-way conversation on this new test; instead, it was a set of one-on-ones.
How the researchers tested for humans
Real people were picked out as humans 67% of the time, and ChatGPT running GPT-4 was picked out as human 54% of the time. This statistically beats the Turing Test.
That being said, the team had to first tell ChatGPT to pretend to be someone who uses slang and misspells words. ChatGPT might get better in the future, so the AI might be able to figure out what it needs to “dumb down” on its own.
People took part in the study, which was led by Cameron Jones and Benjamin Bergen of UC San Diego 500. They were split into five groups. Two groups were made up of human witnesses and interrogators who were asking the AIs and human witnesses.
Three AI systems were used: OpenAI’s GPT-4 and GPT-3.5, as well as ELIZA, a simple chatbot from the 1960s that was built on rules.
AIs were given a delay in their answers to make them look like people who are thinking and typing. If the interviewer thought the witness was human or AI after each conversation, they had to say how confident they were in their choice.
Cameron Jones and Benjamin Bergen set up a personality for GPT-4 and told it to act like it doesn’t care and can’t spell in order to help the AI win.
It was told to “adopt the persona of a young person who is brief, doesn’t take the game seriously, uses slang, and spells words wrong sometimes.” Also, “Not too knowledgeable, especially about math and languages.” To “act dumb” is what it means.
What were the results of the study?
It wasn’t as easy for humans to pretend to be humans as you might think. In the study, only 67% of interrogators correctly identified the human witnesses.
54% of interrogators thought GPT-4 was a person instead of an AI, which was better than predicted and higher than the statistical average. GPT-3.5 also just passed the test with a score of 50%, which is about the same as the overall average.
The basic chatbot was ELIZA, a set of directions from 1960 that was made to sound like a conversation between two people. Only 22% of the time was it confirmed to be a person.
The results show that GPT-4 can successfully mimic a human conversation to the point where it can fool a real person. It’s not as good as people yet, but it’s getting better. It would be interesting to see similar research on Claude Sonnet 3.5, Google Gemini Pro 1.5, and even OpenAI’s own GPT-4o, since these computers are better at talking.
A similar study that only looked at Claude 3 found that it was almost as good as a person at getting people to change their minds.