Google has unveiled Gemini, which it claims is its largest and most powerful AI model to date. According to Google, Gemini’s capabilities surpass those of ChatGPT and other competing AI models.
Gemini is a multimodal AI that can incorporate information from different sources. According to Google DeepMind co-founder and CEO Demis Hassabis, this means that it can “generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video”.
According to Google, Gemini will be available to everyone. The first release, which Google calls Gemini 1.0, will ship in three sizes:
- Gemini Ultra: the “most capable” model, for “highly complex tasks”.
- Gemini Pro: the “best model for scaling across a wide range of tasks”.
- Gemini Nano: the model for on-device tasks.
Gemini is designed to work on all device types, from smartphones and PCs to data centers.
Gemini is superior, says Google
Google claims that Gemini’s performance surpasses that of its main competitor, GPT-4. According to the company, Gemini Ultra beat GPT-4 in 30 of 32 “widely-used academic benchmarks”.
The differences are marginal in many of the listed benchmarks. Gemini scored 53.2% on the MATH benchmark, just 0.3 percentage points ahead of GPT-4. The clearest lead is in the HumanEval Python code-generation benchmark, where, according to Google, Gemini scored 74.4% compared with 67.0% for GPT-4.
Gemini also breaks with the status quo of multimodal models. Until now, such models were typically built by training separate components for different modalities and then stitching them together. Google says that it trained Gemini to be natively multimodal by pre-training it on different modalities from the start.
According to Hassabis, this lets Gemini understand and reason about its inputs “far better than existing multimodal models”.
Google trained Gemini to “recognize and understand text, images, audio and more at the same time”. According to the company, this makes it better at understanding information and answering questions, including complex ones.
Gemini is also well-equipped when it comes to coding. According to Google, it “can understand, explain and generate high-quality code in the world’s most popular programming languages”.
Google promises that Gemini is “built with responsibility and safety at the core”.
The company plans to integrate Gemini Pro into Google products. Google Bard will use a “fine-tuned version of Gemini Pro” starting today. According to Google, the upgrade is available only in English for now, but in more than 170 countries and territories.
Gemini Nano is coming to the Google Pixel 8 Pro, where it will power new features such as Summarize in the Recorder app and Smart Reply in Gboard.
Gemini will also be integrated into other Google products, including the Chrome web browser, Search, Ads and Duet AI.
Developers and enterprise customers will be able to access Gemini Pro via the Gemini API in Google AI Studio starting December 13.
Closing Words
After the rather disappointing launch of Google Bard, Google hopes that Gemini will persuade the public, developers and enterprise customers that it is a leading force in AI. It remains to be seen how well the different Gemini models perform once they become available.