🤖 OpenAI and Google Battle Over Smartest AI Assistant
Plus, Claude’s European availability and IBM’s open-sourced model
Hello AI Enthusiast,
Wow, what a week for AI! OpenAI and Google just made some big announcements that are really shaking things up. It seems like they timed their news to one-up each other, and it's not the first time OpenAI has tried to outshine Google's big moments.
First up, OpenAI rolled out its new GPT-4o model. A day later, Google took the stage at its I/O 2024 event, showing an upgraded Gemini integrated across its products.
Both companies are racing to build an AI that can see, talk, translate, and even help with your homework or guide you through a new city. They're transforming how we interact with technology on a daily basis. It's exciting (and maybe a bit scary) to think about.
The details of these updates are big news, and there's a lot to unpack. Let’s dive in!
The Big Picture 🔊
OpenAI Launches GPT-4o Model
OpenAI has introduced GPT-4o, a new model that can handle text, audio, image, and video inputs at the same time. GPT-4o is faster and 50% cheaper than the previous version, GPT-4 Turbo, and supports around 50 languages. This upgrade makes ChatGPT respond quicker, recognize voice tones, and interact more naturally. During the launch and in the days after, OpenAI demonstrated several use cases showing what this new way of interacting makes possible. Starting today, text and image features are available in ChatGPT for both Plus subscribers and free users, with Plus users getting up to 5 times more messages. Developers can now use GPT-4o via the API, and audio and video features will soon be available to trusted partners. Here’s a video showing one of the many uses of the new model.
Interview prep with GPT-4o
— OpenAI (@OpenAI)
6:39 PM • May 13, 2024
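For developers who want to try the new model right away, here's a minimal sketch of what a GPT-4o call looks like through OpenAI's Python SDK. It assumes you have an API key set in your environment; the prompt is just an illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new model is available under this name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the GPT-4o launch in one sentence."},
    ],
)

print(response.choices[0].message.content)
```

Because GPT-4o slots into the same chat completions endpoint as GPT-4 Turbo, switching over usually just means changing the model name (and enjoying the lower price).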
Google I/O 2024 Highlights
At Google I/O 2024, Google CEO Sundar Pichai walked through big improvements to Gemini. The AI model is now part of key Google services like Search, Photos, and Workspace. The new version, Gemini 1.5 Pro, can now handle up to two million tokens of context, which Google says is the longest context window of any foundation model on the market. Google also introduced Gemini 1.5 Flash, a faster and more cost-effective version, and Gemini Nano, a smaller multimodal model that runs directly on Android devices, keeping your data on the device.
Another announcement worth mentioning is Project Astra, a responsive agent that can see and talk, showcasing real-time conversational abilities and a deep understanding of different data types. Here's a video of Project Astra in action. Notice any similarities?
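If you'd rather experiment on the Google side, here's a minimal sketch of calling Gemini 1.5 Flash with the google-generativeai Python package; the API key and prompt are placeholders for illustration.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder key for illustration

# Gemini 1.5 Flash: the faster, cheaper sibling of 1.5 Pro
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain in two sentences why a long context window is useful."
)
print(response.text)
```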
💡Our Take: OpenAI's launch of GPT-4o is a big step forward in AI. The "o" in GPT-4o stands for "omni," meaning it can handle text, audio, image, and video at the same time, responding faster and in a very natural way, almost like talking to a human. This makes interactions smoother and opens up many new possibilities, especially through its API, which will expand further once video and audio features become available. The new desktop app is also exciting, offering a virtual assistant that can see your screen and talk with you. Interestingly, OpenAI is making these features available to free users, aiming for mass adoption rather than quick profits.
Just one day later, Google responded with major news at its I/O 2024 event. Google's reach, with billions of devices running its systems and apps, gives it a big advantage, even if its AI isn't always the best. Google also showed off Project Astra, an AI assistant that can see, talk, and understand different types of data in real time, much like what OpenAI has been demonstrating.
The competition between OpenAI and Google is intense, and the battle will come down to who nails the best user experience and integrates AI into products in the smartest way. Google has the products and OpenAI doesn't, but we wonder if the rumored partnership between OpenAI and Apple is aiming to change that. 👀
How often do you use ChatGPT or similar AI tools? After voting, share with us whether the latest enhancements will make you use it even more.
Bits and Bobs 🗞️
A new Microsoft and LinkedIn study reveals that 2024 marks a pivotal year for AI in the workplace, with three out of four knowledge workers now using AI tools.
OpenAI's program for preferred publishers aims to partner with media companies through deals and perks outlined in a leaked document.
OpenAI is enhancing digital content authenticity by joining the C2PA Steering Committee and adding metadata to its AI-generated images and videos.
OpenAI is developing a tool called Media Manager to help content creators control how their work is used in AI training.
OpenAI has released the first draft of the "Model Spec," a document outlining how they want AI models to behave in the ChatGPT and API.
Google DeepMind's new AlphaFold 3 achieves unprecedented accuracy in predicting molecular interactions, aiding drug discovery and understanding of biological processes.
Anthropic's AI assistant, Claude, is now available in Europe via the web version and the Claude iOS app.
Meta introduces AI-generated ad features, allowing advertisers to create image variations and text prompts for different product presentations.
Stability AI has launched Stable Artisan, a user-friendly bot for generating media directly on Discord using Stable Diffusion models.
Cohere has introduced fine-tuning for its Command R model, allowing enterprises to customize AI for specific needs.
Alibaba Cloud has released its latest large language model, Qwen2.5, showcasing significant improvements.
SoundHound AI has partnered with Perplexity AI to enhance its Chat AI voice assistant using Perplexity’s real-time web search capabilities.
IBM is open-sourcing a family of Granite code models aimed at making software development easier.
Microsoft announced its largest-ever investment in France to boost AI and cloud technology adoption, aligning with the French National Strategy for AI.
Educational Pill 💊
Understanding GPT-4o
GPT-4o, or "omni," seamlessly handles text, audio, images, and video, responding in any of these formats. This model mimics human interaction speeds, answering audio queries almost as quickly as we respond in conversations. Imagine chatting with AI that understands not just your words, but the full context of your questions, including tone and visuals.
Previously, interacting with ChatGPT involved slower responses because it converted spoken words to text and back to speech, losing nuances like tone and background sounds. GPT-4o changes this by using a single model that processes everything together, capturing subtle details like emotions and distinguishing between multiple voices. This makes the AI interaction feel much more fluid and intuitive.
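To make the "omni" idea more concrete, here's a rough sketch of how a single GPT-4o request can mix text and an image through the API, so the model reasons over both at once. The image URL is a placeholder and the prompt is just an example.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this photo, and what mood does it convey?"},
                # Placeholder URL: point this at any publicly accessible image
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Audio and video inputs are expected to follow this same single-model pattern once those API features roll out more broadly.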
From Our Channels 🤳
Check out this TikTok video of Gianluca getting ready to teach a Harvard class how to add your own data to an AI model. It's probably one of the simplest RAG explanations out there.
@gianluca.mauro How to add your own data to an AI model? Quick explanation of retrieval augmented generation from my Harvard class. #ai #learnontiktok #arti...
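If you can't watch the video right now, here's a minimal sketch of the RAG idea in Python: embed your own documents, retrieve the one most relevant to a question, and hand it to the model as context. The documents, question, and model choices are all made up for illustration.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set

# A toy "knowledge base" of your own data
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 6pm CET.",
    "Premium plans include priority support and API access.",
]

def embed(texts):
    """Turn a list of texts into embedding vectors."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

doc_vectors = embed(documents)

def answer(question):
    # 1. Retrieve: find the document closest to the question (cosine similarity)
    q_vec = embed([question])[0]
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = documents[int(np.argmax(scores))]
    # 2. Augment and generate: give the retrieved context to the model
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("When can I get my money back?"))
```

Real projects usually swap the toy list for a vector database and retrieve several chunks, but the retrieve-then-generate loop stays the same.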
As we've seen, AI models are getting smarter fast. However, simply subscribing to the latest technology isn't enough to ensure your teams can use AI effectively. Access alone doesn't guarantee efficiency; understanding and skill do.
That’s where our corporate training programs come in. If you want your team to get the best out of AI, chat with our partnerships lead, Helin Yontar. She can fill you in on how we can help!
From the Tribe 🫂
Just when you think you've got all your automations perfected, OpenAI rolls out a new model! This week, our students couldn't help but laugh (and maybe groan a little) about having to update their automations yet again. While this means going back to the drawing board, it's a reminder of the fast-paced world of AI where staying updated is just part of the journey.
LOLgorithms 😂
Sundar did the heavy lifting for us.
Of course Google used Gemini to count AI mentions during today's AI-filled #GoogleIO. And there was even one more after this.
— TechCrunch (@TechCrunch)
7:02 PM • May 14, 2024
That's a wrap on our newsletter! Here’s a quick recap before you go:
Generative AI Project Bootcamp: Accelerate your processes and prototype AI business ideas with your own automated AI project.
Startup and SME offers: If your team has 4 or more members, contact us to receive a group offer on our AI courses.
Customized Corporate Training: Equip your team with tailored sessions designed for companies diving into AI.
Catch you next week! 👋