
📚 Why AI Can Now Talk Like Us

How OpenAI's new tech brings human-like dialogue to life

Reading Time: 4 minutes

Hello AI Enthusiast,

OpenAI has recently released the highly anticipated Advanced Voice Mode. Today, we'll explain, in simple language, why it's such an ingenious advancement from a technological point of view.

Important Note: As of now, this feature is not available in Europe unless you're on a Team plan. If you don't have access yet, we'll get you prepared for when it rolls out more widely!


The Problem

Current AI voice interactions fall short in two crucial ways. First, they sound robotic, failing to capture the nuances of human speech like tone and emotion.

Second, there's often a delay between user input and AI response, breaking the natural flow of conversation.

These limitations make AI interactions feel unnatural, especially in applications like customer service where fluid dialogue is essential.

Old vs. New

Imagine the old AI voice system as a person who learned to communicate solely through reading, without the ability to hear or speak. They can understand words and form sentences, but they've never heard the richness of human speech, the ups and downs, the excitement, the sarcasm.

The old voice mode worked in three steps, each handled by a separate model:

  1. Whisper: This model acted like ears, converting your voice into text.

  2. ChatGPT: Think of this as the brain, processing the text and formulating a response.

  3. A text-to-speech (TTS) model: This was the mouth, turning the text response back into speech.

Just like our silent reader who never heard a voice, this system couldn’t recognize emotions in our speech. When you said, ‘You’re kidding, right?’ with excitement, it interpreted it the same way as if you had said it with concern. All the richness of human expression was lost in translation.
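For the curious, here's what that cascade looks like in practice. This is a minimal sketch using OpenAI's public Python SDK; the model names (whisper-1, gpt-4o, tts-1) and the file name question.mp3 are illustrative stand-ins for the three roles described above, not the exact models behind the old voice mode.

```python
# Sketch of the old three-model cascade: ears -> brain -> mouth.
# Requires: pip install openai, and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Step 1 -- "ears": speech-to-text. Only the words survive this step;
# tone, pacing, and emotion are discarded.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2 -- "brain": a chat model formulates a text response from the
# flat transcript.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3 -- "mouth": text-to-speech reads the response aloud.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```

Each hop adds latency, and the handoff from step 1 to step 2 is exactly where "You're kidding, right?" loses its excitement.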

The New Way

[Image: the new Advanced Voice Mode feature in the ChatGPT app]

Now, let's flip our metaphor. Instead of our silent reader, imagine someone who grew up listening to and engaging in countless conversations. They've heard laughter, anger, sarcasm, excitement; the whole spectrum of human emotion expressed through voice. That's what Advanced Voice Mode is like.

OpenAI has created a single, powerful model that's been "raised" (trained) on audio data rich with emotional content and nuances of human speech. It doesn't need to convert your voice to text and back again. Instead, it takes in your speech, emotions and all, and responds in kind.

The result is fast, tone-adapting AI conversation that feels closer to chatting with a real person.
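Advanced Voice Mode itself lives inside the ChatGPT app, but if you want to see the single-model idea in code, here's a minimal sketch assuming OpenAI's audio-capable chat model (gpt-4o-audio-preview) and the Python SDK; the file names are illustrative. Note what's missing: there is no transcription step at all.

```python
# Sketch of the new single-model approach: audio in, audio out.
import base64

from openai import OpenAI

client = OpenAI()

# Base64-encode the user's spoken question for the API.
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# One model hears the audio directly -- tone and emotion included --
# and answers with audio of its own.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                }
            ],
        }
    ],
)

# Save the spoken reply.
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(completion.choices[0].message.audio.data))
```

Because nothing is flattened into an intermediate transcript, the model can pick up on how you said something, not just what you said.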

If you've read this far, you're keen on AI. On October 23rd, we're sharing a new direction for AI Academy focused on continuous learning, peer connections, and practical solutions. Join Gianluca Mauro's free webinar for details.

How Businesses Can Use This Technology

The possibilities for leveraging this new AI voice technology are vast and varied across industries. Here are a few examples to spark your imagination:

1) Customer service agents that can understand and respond to customers' emotions, providing more empathetic support, even in different languages.

2) Health assistants that can guide patients through symptom checks and appointment scheduling, or deliver medication reminders.

3) Language tutors that help learners with pronunciation and intonation.

Want to get even more practical? Explore hands-on AI learning with AI Academy.

We'll be back with more AI tips soon!