Gemini breaks new ground with a faster model, longer context, AI agents and more

The Gemini family of models is getting updates, including 1.5 Flash, our new lightweight model built for speed and efficiency, and progress on Project Astra, our vision for the future of AI assistants.

Gemini 1.0, our first natively multimodal model, debuted in December in Ultra, Pro, and Nano sizes. Just months later, we introduced 1.5 Pro with improved performance and a groundbreaking 1 million token context window.

Developers and enterprise customers are putting 1.5 Pro's long context window, multimodal reasoning, and strong overall performance to use in impressive ways.

User feedback showed that some applications need lower latency and lower serving cost. That inspired us to keep refining our models, so today we're introducing Gemini 1.5 Flash: a model lighter-weight than 1.5 Pro, designed to be fast and efficient to serve at scale.

Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI. 1.5 Pro is also available with a 2 million token context window via waitlist to developers using the API and to Google Cloud customers.
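For developers, access is through the Gemini API. As a rough sketch of what that looks like in practice, the snippet below uses the Google AI Python SDK (the google-generativeai package); the model name and call signatures reflect the public SDK at the time of writing and may differ in your version.

```python
import google.generativeai as genai

# Authenticate with an API key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Pick the lightweight model; "gemini-1.5-pro" works the same way.
model = genai.GenerativeModel("gemini-1.5-flash")

# The long context window means very large prompts (for example, whole
# codebases or book-length documents) can be passed in a single request.
response = model.generate_content(
    "Summarize the key ideas of this document in three bullet points: ..."
)
print(response.text)
```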

Beyond the Gemini model updates, Gemma 2, our next generation of open models, is on the way, and Project Astra shows our progress toward building universal AI agents.

Gemini family of model updates

The new 1.5 Flash, optimized for speed and efficiency


1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It's optimized for high-volume, high-frequency tasks at scale, is more cost-efficient to serve, and features our breakthrough long context window.

While it's a lighter-weight model than 1.5 Pro, it's highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. That's because it was trained by 1.5 Pro through a process called "distillation," where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.
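Google hasn't published the details of how 1.5 Flash was distilled, but the general idea is easy to illustrate. The sketch below shows one common form of distillation, matching the student's output distribution to the teacher's with a temperature-softened KL divergence, written in PyTorch; the tensor shapes and temperature here are arbitrary placeholders, not the actual training setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then push the student
    # toward the teacher with KL divergence (classic logit distillation).
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: random logits stand in for the outputs of a teacher (large)
# and a student (small) model on the same batch of tokens.
teacher_logits = torch.randn(4, 32000)                      # (batch, vocab_size)
student_logits = torch.randn(4, 32000, requires_grad=True)  # student is trainable
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # would update the student if these were real model outputs
```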

Significantly improving 1.5 Pro

1.5 Pro, our best model for general performance across a wide range of tasks, has been significantly improved in recent months.

Beyond extending its context window to 2 million tokens, we've enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format, and style. We've improved control over the model's responses for specific use cases, like crafting the persona and response style of a chat agent, or automating workflows through multiple function calls. And we've enabled users to steer model behavior by setting system instructions.
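As a small illustration of what steering via system instructions can look like, the sketch below sets a persistent persona and response style through the Google AI Python SDK. The product name and instruction text are made up for the example, and SDK parameter names may vary between versions.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# The system instruction applies to every turn of the conversation,
# shaping the agent's role, format, and style.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are a support agent for a hypothetical product called AcmeSync. "
        "Answer in two short paragraphs and always end with a follow-up question."
    ),
)

chat = model.start_chat()
reply = chat.send_message("My files stopped syncing this morning. What should I check?")
print(reply.text)
```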


We've also added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded to Google AI Studio. And we're integrating 1.5 Pro into Google products, including Gemini Advanced and Workspace apps.
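In API terms, audio can be passed alongside a text prompt in a single request. Here is a minimal sketch using the SDK's file upload helper; the filename is hypothetical and the exact upload flow may differ depending on your SDK version.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Upload a local recording, then ask the model to reason over it with text.
audio = genai.upload_file(path="meeting_recording.mp3")  # hypothetical file
response = model.generate_content(
    [audio, "List the action items discussed in this meeting, with owners."]
)
print(response.text)
```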

Gemini Nano understands multimodal inputs

Gemini Nano is expanding beyond text-only inputs to include images. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do, not just through text but also through sight, sound, and spoken language.

Next generation of open models

Today we're also announcing a series of updates to Gemma, our family of open models built from the same research and technology used to create the Gemini models.

That includes Gemma 2, our next generation of open models for responsible AI innovation. Gemma 2 will be available in new sizes and has a new architecture designed for breakthrough performance and efficiency.

The Gemma family is also expanding with PaliGemma, our first vision-language model, inspired by PaLI-3. And we've added LLM Comparator, a tool for evaluating the quality of model responses, to our Responsible Generative AI Toolkit.

Progress developing universal AI agents

Google DeepMind's mission is to build AI responsibly to benefit humanity, and part of that ambition has always been developing universal AI agents that can be helpful in everyday life. With Project Astra, an advanced seeing and talking responsive agent, we're sharing our vision for the future of AI assistants.

To be truly useful, an agent needs to understand and respond to the complex and dynamic world the way people do, taking in and remembering what it sees and hears so it can understand context and take action. It also needs to be proactive, teachable, and personal, so people can talk to it naturally and without lag or delay.

While we've made great strides developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Over the past few years, we've been working to improve how our models perceive, reason, and converse so that the pace and quality of interaction feel more natural.

A two-part demo of Project Astra, our vision for the future of AI assistants. Each part was captured in a single take, in real time.

Building on Gemini, we developed prototype agents that continuously encode video frames, combine the video and speech input into a timeline of events, and cache this information for efficient recall.
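Google hasn't shared the internals of these prototypes, but the overall shape of such a pipeline can be sketched generically: encode each incoming frame and utterance, append both to a shared timeline, and keep a bounded cache so recent context can be recalled quickly. The Python below is purely illustrative; the encoders are stand-ins, and none of the names come from Project Astra itself.

```python
from dataclasses import dataclass, field
from collections import deque
import time

@dataclass
class TimelineEvent:
    timestamp: float
    modality: str      # "video" or "speech"
    embedding: list    # placeholder for an encoded representation

@dataclass
class AgentMemory:
    # Bounded cache of recent events, so recall stays fast as the stream grows.
    events: deque = field(default_factory=lambda: deque(maxlen=1000))

    def add(self, event: TimelineEvent) -> None:
        self.events.append(event)

    def recall(self, since_seconds: float) -> list:
        cutoff = time.time() - since_seconds
        return [e for e in self.events if e.timestamp >= cutoff]

def encode_frame(frame) -> list:
    return [0.0]  # stand-in for a real vision encoder

def encode_utterance(text: str) -> list:
    return [0.0]  # stand-in for a real speech/text encoder

# Video frames and speech land on the same timeline as they arrive.
memory = AgentMemory()
memory.add(TimelineEvent(time.time(), "video", encode_frame(frame=None)))
memory.add(TimelineEvent(time.time(), "speech", encode_utterance("Where did I leave my glasses?")))

# The last minute of context could then be fed to the model for a response.
recent_context = memory.recall(since_seconds=60)
```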

We also enhanced how they sound, using our leading speech models to give the agents a wider range of intonations. These agents can better understand the context they're being used in and respond quickly in conversation.


With technology like this, it's easy to imagine a future where people have an expert AI assistant by their side, through a phone or glasses. Some of these capabilities are coming to Google products, like the Gemini app and web experience, later this year.

Continued exploration

We've made great progress with our family of Gemini models, and we're always working to push the state of the art even further. By investing in a relentless pipeline of innovation, we're able to explore new ideas at the frontier while unlocking exciting new use cases for Gemini.

Learn more about Gemini and its capabilities.
