How much Gemini is really inside Siri AI?

June 11, 2026

Apple this week announced a dramatically improved version of Siri, aptly named Siri AI. But instead of accolades, the verdict among Apple enthusiasts on X and Reddit is already in: Siri AI is just a slightly older version of Google’s Gemini, with its own interface and voice.

You’d be forgiven for believing this. After months of rumors that Apple was turning to Google’s Gemini technology to bring Siri up to speed, and a purposefully vague joint statement to that effect this January, it certainly seemed like the new Siri would be exactly that.

But then the big WWDC keynote came and went, and Gemini was barely mentioned at all. After the keynote, Apple held a private “technical deep dive” for journalists (which was not officially recorded or streamed), where Craig Federighi and three Apple VPs in charge of Siri and AI explained Siri’s relationship with Google in greater detail. As always seems to be the case with AI, the truth is complicated, and every company involved is using precise yet opaque language that is more about what they don’t say than what they do.

But there’s a lot of information out there that can help us get a clearer picture of what Apple’s new Siri AI actually is, how it works, and how Google’s Gemini is involved.

Apple’s new Foundation Models

Let’s start at the bottom. Apple used the term “Foundation Model” a lot during WWDC. In a nutshell, it’s a big AI model trained on a huge amount of data, which is then used in whole or in part to deliver specific AI experiences in apps. Foundation models can be language models, vision models, image generation models, or audio processing models, though modern ones are multimodal, which means they can understand and produce text, images, and audio together.

Most companies scale their big foundation models to different sizes. The most advanced version of the model is so big it can only fit in and run well on huge AI servers with hundreds of gigabytes of RAM and massive, expensive, high-powered processors. So companies produce smaller versions with fewer “parameters” that can run on smaller servers, desktop computers, and laptops, and even little models that run directly on a smartphone.
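
To see why size matters, a rough rule of thumb helps: the memory needed just to hold a model’s weights is the parameter count multiplied by the bytes used per parameter. The sketch below applies that rule to three hypothetical model sizes at two precisions. The sizes and precisions are illustrative assumptions, not figures from Apple or Google, and the estimate ignores the extra working memory a model needs while running.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Model sizes and precisions below are illustrative assumptions only.

def weight_memory_gb(parameters: float, bits_per_parameter: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = parameters * bits_per_parameter / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    hypothetical_models = {
        "phone-sized": 3e9,
        "laptop-sized": 30e9,
        "server-sized": 500e9,
    }
    for name, params in hypothetical_models.items():
        at_16_bit = weight_memory_gb(params, 16)
        at_4_bit = weight_memory_gb(params, 4)
        print(f"{name}: {at_16_bit:.1f} GB at 16-bit, {at_4_bit:.1f} GB at 4-bit")
```

Under these assumptions, a model with a few billion parameters fits in a phone’s memory once its weights are compressed, while one with hundreds of billions of parameters needs server-class hardware.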

Apple has five new third-generation Foundation Models, as explained in a post on Apple’s Machine Learning research site. The first two are the small models made to run directly on device:

AFM 3 Core: The next generation of our 3-billion-parameter dense model that delivers a step up in quality.
AFM 3 Core Advanced: Apple’s most powerful on-device model. It’s natively multimodal, enabling helpful features like expressive voices and higher-accuracy dictation. Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request. This model only runs on the latest Apple devices.

Both run directly on-device on supported hardware, but the AFM 3 Core Advanced model requires an iPhone 17 Pro or iPhone Air, a Mac with an M3 and at least 12GB of RAM, or an iPad with an M4. You’ll notice that Apple says it has a “sparse architecture,” which means that it is broken up into chunks that specialize in different areas, and only the pieces that are needed are activated when you make a request. For example, a piece devoted to math wouldn’t be activated if you ask how tall the Burj Khalifa is, but would be when you follow up to ask how many Burj Khalifas fit between the Earth and the moon.
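
The idea of activating only the needed pieces can be sketched with a toy router. This is not how Apple’s model works internally: real sparse models choose their specialized pieces with learned gates inside the network, and the expert names and keyword cues here are hypothetical stand-ins. The second function simply works out what share of a 20-billion-parameter model is active when 1 to 4 billion parameters are used, the range Apple quotes.

```python
import re

# Hypothetical experts and the words that trigger them (illustration only).
EXPERTS = {
    "facts": {"tall", "who", "where", "when"},
    "math": {"many", "fit", "sum", "divide"},
    "language": {"translate", "rewrite", "summarize"},
}


def active_experts(request: str) -> list[str]:
    """Return only the experts whose cue words appear in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    chosen = [name for name, cues in EXPERTS.items() if words & cues]
    return chosen or ["facts"]  # fall back to a default expert


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for one request."""
    return active_params / total_params


if __name__ == "__main__":
    print(active_experts("How tall is the Burj Khalifa?"))
    print(active_experts("How many Burj Khalifas fit between the Earth and the moon?"))
    print(f"{active_fraction(1e9, 20e9):.0%} to {active_fraction(4e9, 20e9):.0%} active")
```

With Apple’s stated numbers, only about 5 to 20 percent of the model’s parameters do work on any single request, which is what lets a 20-billion-parameter model respond quickly on a phone or laptop.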

The on-device models are joined by three new cloud-based models:

AFM 3 Cloud: Apple’s server-side model, optimized for speed, efficiency, and performance.
ADM 3 Cloud (Image): Devoted to image generation and editing, which unlocks advanced photo-editing tools, the all-new Image Playground, and more.
AFM 3 Cloud Pro: Apple’s most capable server-based model, which powers our most demanding use cases, including agentic tool use and complex reasoning.

AFM 3 Cloud is the big server model that handles most things, but for the really complicated requests, there’s AFM 3 Cloud Pro. They are joined by a special image-centric model that is used for Image Playground (and all the apps that call on the Image Playground framework), Genmoji, and all the new AI image editing tools: Clean Up, Extend, and Reframe.

Apple is using its own servers (mostly)

The first important point is that four of the five models (the two on-device models and the first two cloud models) run on Apple Silicon. The cloud models use Apple’s Private Cloud Compute architecture, which opens its code to researchers so they can verify that only the data necessary to complete a request is sent to the cloud. After the request is fulfilled, the data is deleted and never retained.

The biggest cloud model, AFM 3 Cloud Pro, requires more muscle than the current Apple Silicon-based servers can provide. It is built to run on Google’s cloud infrastructure with Nvidia GPUs, but this is not off-the-shelf server leasing. Apple is running its Private Cloud Compute infrastructure here, too. All the core PCC requirements are met: stateless computation, no privileged runtime access, non-targetability, and verifiable transparency.

You can read more about how Apple is extending Private Cloud Compute to Google’s servers with Nvidia hardware on Apple’s Security Research site.

How does Siri AI even work?

When you make a request to Siri, it first has to be captured, either as typed text or through a voice recognition model. Then, a component called the System Orchestrator turns what you said into a sort of underlying invisible prompt and decides which model or models it should go to.

If you’re asking Siri to turn on a light at home, start a timer, or tell you the weather, the on-device model handles that. But if you want to generate a few paragraphs of text, the System Orchestrator will send the prompt to the Private Cloud Compute cluster for processing. It will also send the data necessary to fulfill that request.
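
That routing decision can be pictured as a small set of rules. The sketch below is a guess at the shape of the logic, not Apple’s actual System Orchestrator: the request labels, the `needs_tools` flag, and the rule order are all hypothetical, and only the model names come from Apple’s descriptions.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_DEVICE = "on-device model"
    CLOUD = "AFM 3 Cloud"
    CLOUD_IMAGE = "ADM 3 Cloud (Image)"
    CLOUD_PRO = "AFM 3 Cloud Pro"


@dataclass
class Request:
    kind: str                 # hypothetical label, e.g. "timer" or "text_generation"
    needs_tools: bool = False  # hypothetical flag for agentic tool use


# Hypothetical groupings of request kinds (assumptions for illustration).
SIMPLE_KINDS = {"device_control", "timer", "weather"}
IMAGE_KINDS = {"image_generation", "image_edit"}


def route(request: Request) -> Target:
    """Pick a model for a request using simple, made-up rules."""
    if request.kind in SIMPLE_KINDS:
        return Target.ON_DEVICE
    if request.kind in IMAGE_KINDS:
        return Target.CLOUD_IMAGE
    if request.needs_tools or request.kind == "complex_reasoning":
        return Target.CLOUD_PRO
    return Target.CLOUD


if __name__ == "__main__":
    for req in (Request("timer"), Request("text_generation"),
                Request("text_generation", needs_tools=True)):
        print(req.kind, "->", route(req).value)
```

In practice the orchestrator presumably weighs far more signals than a single label, but the principle is the same: keep simple requests on the device and escalate only when the task demands it.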

For example, if you’re writing an email with a menu of items guests are bringing to a potluck, the System Orchestrator might first pull relevant text messages from the search index. Perhaps it could include a screenshot of what’s on your iPhone’s screen if it includes relevant info. After the text is generated and sent back down to your device, the request and any associated data are deleted. All of this happens with as much encryption and pseudonymity as possible, so nobody at Apple or Google can access your requests, data, or results.
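
The “use it, then delete it” pattern can be illustrated in a few lines. This is only a conceptual sketch with hypothetical names: Apple describes these guarantees as being enforced by the design of the servers themselves, not by application code like this.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_request(prompt: str, context_items: list[str]):
    """Hold request data only for the duration of one request (toy example)."""
    payload = {"prompt": prompt, "context": list(context_items)}
    try:
        yield payload
    finally:
        # Wipe the request data whether processing succeeded or failed.
        payload.clear()


if __name__ == "__main__":
    messages = ["I'll bring salad", "I've got dessert"]
    with ephemeral_request("Write the potluck menu email", messages) as payload:
        reply = f"Draft based on {len(payload['context'])} message(s)"
    print(reply)    # the result goes back to the device
    print(payload)  # the request data is gone: {}
```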

This is one reason some of the new AI image processing tools seemed slow in the iOS 27 demos: images and data need to be uploaded and processed in the cloud. Turn on Airplane mode and disconnect from Wi-Fi, and you can’t use the new AI image tools at all.

Where does Gemini come in?

In the post-keynote discussion at WWDC, Federighi explained why Siri AI is not Gemini:

Of course, we don’t have the Gemini app as our app. In fact, none of that client code is part of how we run on iOS. For these models, we use none of the models that Google deploys to their customers, nor do we use the infrastructure and means by which they deploy models to their customers. And then, when it comes to the knowledge base, we of course don’t use Google Search or anything like that as the foundation of our system. So I hope that’s clear. The amount of the Google Assistant we use is none.

Read Craig’s words carefully, and you’ll notice he’s specifically saying that the client experience (the app and assistant) is not Gemini, nor are the specific servers the same ones Google uses to serve Gemini to its customers. Furthermore, Siri AI doesn’t pull info from Google’s web search or knowledge graph; it uses its own.

However, Federighi never claims that Apple’s models themselves are independent of Gemini. In fact, he explicitly says the four models made to run on Apple Silicon are “trained using proprietary data with reinforcement learning and refined using outputs from Gemini frontier models.” It’s likely that the biggest model is trained using both Google’s and Apple’s proprietary data, or has some distinguishing characteristic beyond its size that made him leave it out of that statement.

So what does that mean? It seems like Apple started with Gemini’s foundation models, optimized and rebuilt them for Apple Silicon and the model sizes it needs, and retrained them with its own data, weights, and guardrails. As a user, you shouldn’t expect the same performance, capabilities, and results from Siri AI on your iPhone as you would get from Google’s Gemini on a Pixel phone.

An analogy I like to use: Apple has used Unix (technically, the Unix derivative called Darwin) as the core of every operating system going back to Mac OS X. But that doesn’t mean Apple’s OSes share the same compatibility, features, or characteristics as Unix. Nor does it mean Apple lacks the world-class operating system engineers necessary to make a great OS. Unix is merely a foundation to start on, and a quicker way to get a leg up on development. In much the same way as it did back in 1999 and 2000 when building Mac OS X (and then later iPhone OS and so on), Apple used someone else’s work to get started and then built its own thing that’s barely recognizable compared with where it began.