Chatbots and the Turing Test: Are AI Chatbots Getting Smarter Than Us?

A Google study showed its medical LLM pilot “AMIE” outperforming primary care doctors on 28 out of 32 characteristics in a test somewhat akin to the Turing Test. 

Following Google’s publication, Ethan Mollick, an Associate Professor at Wharton, shared a post with the details, beginning: “a provocative study from Google where LLMs passed a Turing Test, of a sort, for doctors.”

Let’s find out exactly what the Turing Test is and what Google’s study of AMIE discovered. 

What is the Turing Test?

The Turing test, proposed by Alan Turing in 1950, assesses a machine’s ability to behave indistinguishably from a human. Originally called the imitation game, it involves an interrogator trying to determine which of two players is a computer and which is a human by comparing the players’ responses to questions.

Turing published the first paper focusing entirely on machine intelligence, “Computing Machinery and Intelligence” (1950).

The Turing test has been both influential and widely criticized, but it’s still a pivotal concept for AI. “ELIZA,” a program built by Joseph Weizenbaum in 1966, has been argued by some to have passed the Turing test. In 2014, a chatbot called “Eugene Goostman” was reported to have convinced a third of judges in a Turing test competition that it was a 13-year-old boy; some also consider this a pass. Despite recent advances in generative AI, there’s not yet a universal consensus that the Turing test has been passed.

Did “AMIE” Pass the Turing Test?

Mollick shared his post on LinkedIn following Google’s study published on the Google Research blog on January 12, 2024. 

“149 actors playing patients texted live with one of 20 primary care doctors or else Google’s new medical LLM, AMIE. Specialist human doctors & the “patients” rated the quality of care. AMIE beat the primary care doctors on 28 out of 32 characteristics, and tied on the other four, as rated by human doctors. From the perspective of the “patients,” the AI won on 24 of 26 scales.” (Ethan Mollick, Associate Professor at The Wharton School).

(Image Source: Ethan Mollick/Google Research) 

In its study, Google describes AMIE as “A research AI system for diagnostic medical reasoning and conversations.” It says:

“Recent progress in large language models (LLMs) outside the medical domain has shown that they can plan, reason, and use relevant context to hold rich conversations. However, there are many aspects of good diagnostic dialogue that are unique to the medical domain.”

Clinicians take a complete clinical history, ask “intelligent questions,” and “wield considerable skill” to make diagnoses, foster patient relationships, and make decisions with the patient. The tech giant says that AMIE (Articulate Medical Intelligence Explorer) was developed because there’s “been little work specifically aimed towards developing these kinds of conversational diagnostic capabilities.”

To test AMIE, Google developed a pilot evaluation rubric and designed a “randomized, double-blind crossover study of text-based consultations with validated patient actors interacting either with board-certified primary care physicians (PCPs) or the AI system optimized for diagnostic dialogue.”

AMIE reportedly performed “at least as well as PCPs when both were evaluated along multiple clinically-meaningful axes of consultation quality.” But Google stresses the study’s limitations, calling the work a “first exploratory step” and saying its evaluation technique likely underestimates the real-world value of human conversations.

The tech giant isn’t boasting that AMIE passed the Turing test, but the results are certainly interesting, further illustrating the rapid progress of generative AI and LLMs. 

When Will the Turing Test AI Milestone be Passed?

Mustafa Suleyman is a co-founder of DeepMind, now a division of Google, and also the founder of Inflection.ai. Of the Turing test, he says:

“It’s totally unclear whether this is a meaningful milestone or not. It doesn’t tell us anything about what the system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence.”

Suleyman argues the 70-year-old Turing test should be replaced. He suggests a test for “artificial capable intelligence” (ACI), meaning programs that can set goals and achieve complex tasks with minimal intervention. He expects AI to pass this threshold within the next two years, with “seismic” consequences for the world economy.

DeepMind co-founder and chief AGI scientist for Google, Shane Legg, predicts there’s a 50% chance that AGI (Artificial General Intelligence) will be developed by 2028, per a Time article discussing “When Might AI Outsmart Us?”

Anthropic co-founder and CEO Dario Amodei expects “human-level” AI in two to three years. OpenAI CEO Sam Altman says AGI is achievable in the next four to five years. 

An AI Impacts survey of 1,712 AI experts asked when they thought AI would be able to accomplish tasks better and more cheaply than humans, and the results were less optimistic than the predictions of some AI leaders.

https://time.com/6556168/when-ai-outsmart-humans/

Of course, it’s not clear when the Turing test will be universally considered passed or when ACI or AGI will be achieved. However, with AI spending surpassing hundreds of billions, the race between developers is certainly underway, and it is highly likely that LLMs will evolve at least as quickly in 2024 as they did in 2022 and 2023.



