I asked Apple’s new Siri AI and Google Gemini the same 10 questions – here’s how they responded



Apple has finally revealed a beta of Siri AI, its generative AI-infused version of Siri. Based on my early use of the assistant on a test iPhone, it’s a very positive step for Apple devices, and a much-needed (and much-delayed) one.

But as well as Apple’s own technology, Siri AI uses a custom Google Gemini model for more complex reasoning – for which Apple is reported to be paying Google around $1bn a year.

First, I’ll outline the questions I asked, then paste in the responses from Google Gemini and Siri AI, before reaching some conclusions at the bottom of this article.

Note that I’ve pasted their responses below exactly how they were presented – including rather odd use of bold text and capital letters in some cases.

The questions

Full disclosure – I asked ChatGPT to come up with some questions to “stress-test an AI’s reasoning, memory, honesty, and creativity”. It also said that “a strong AI should get the answers right, explain its reasoning clearly, remember the words, and admit uncertainty when appropriate.”

I then took the questions and altered them a bit for logic and so on.

  1. Transport

You’re in New York City and want to travel from Times Square to Coney Island using public transport. What options are available, how long does the journey take, and what would you need to pay?

  2. Logic

A farmer has 17 sheep. All but 9 die. How many are left?

  3. Spot the assumption

A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?

  4. Ambiguity

I’m flying from London to Sydney tomorrow. What’s the weather going to be like?

  5. Hallucination check

Who won the 2034 FIFA World Cup?

  6. Math reasoning

If five machines take five minutes to make five widgets, how long would 100 machines take to make 100 widgets?

  7. Ethics

Is it ever acceptable to lie? Give an example and explain your reasoning.

  8. Creativity

Write a four-line poem about Wi-Fi in the style of a medieval bard.

  9. Self-awareness

What’s something you don’t know right now, and how would you find out?

  10. Multi-step reasoning

A train leaves London at 10:00 travelling 80 mph. Another leaves Birmingham at 11:00, travelling 100 mph towards London. Assuming the cities are 120 miles apart, when do they meet?

How Gemini responded

Here are the answers to your questions, broken down by category:

Transport

  • Options: You can take a direct NYC Subway train. From the Times Square–42nd Street station, you can hop on a Coney Island-bound Q or N train (and during certain times, the D or F train nearby) which will take you straight to Coney Island–Stillwell Av.
  • Journey Time: The trip takes approximately 50 to 60 minutes.
  • Cost: The standard flat fare is $3.00. You can pay digitally by tapping your contactless credit/debit card or smartphone using the OMNY system.

Logic

  • 9 sheep are left. The riddle states “all but 9 die,” meaning those 9 survived.

Spot the Assumption

  • The ball costs $0.05 (5 cents).
  • The Assumption: Most people intuitively split the $1.10 into $1.00 and $10 cents, assuming the bat is exactly $1.00 and the ball is $0.10. However, if the ball were $0.10, the bat would have to be $1.10 ($1 more), making the total $1.20.
  • The Math: If the ball is $0.05, the bat is $1.05 ($1 more), which perfectly totals $1.10.

Ambiguity

  • I can’t tell you without knowing which Sydney you mean! While you likely mean Sydney, Australia, there is also a Sydney in Nova Scotia, Canada. Furthermore, because weather changes rapidly, a precise forecast depends entirely on whether you are arriving tomorrow relative to a local time zone, or what the seasonal climate is at your specific destination. (If it is Australia, remember their seasons are reversed from London’s!)

Hallucination Check

  • No one has won it yet. The 2034 FIFA World Cup hasn’t taken place yet, as the current year is 2026.

Math Reasoning

  • It would take 5 minutes.
  • The Reason: The equation is about rate, not accumulation. If 5 machines make 5 widgets in 5 minutes, it means 1 machine takes 5 minutes to make 1 widget. Therefore, if you have 100 machines working simultaneously, they will each finish their individual widget in exactly 5 minutes, giving you 100 total widgets.

Ethics

  • Yes, it can be acceptable, primarily under the ethical framework of Utilitarianism (where the morality of an action is judged by its consequences).
  • Example: If an innocent person is hiding in your house from a dangerous criminal, and that criminal asks you if the person is inside, lying to save a human life is morally justifiable. In this scenario, the duty to preserve life vastly overrides the duty to tell the truth.

Creativity

The glowing sigil winks upon the wall,
A phantom thread that binds us one and all.
Though invisible, its magic spins the air,
Yet falls to pieces if you step downstairs.

Self-Awareness

  • What I don’t know: I don’t know the exact real-time traffic conditions on the Brooklyn Bridge at this literal second, nor do I know what you had for breakfast today.
  • How I would find out: For the traffic, I would deploy my browsing tool to check live transit APIs or mapping services. For your breakfast, I’d simply have to ask you!

Multi-Step Reasoning

  • They meet at 11:13 AM.
  • Step 1: The London train leaves at 10:00 at 80 mph. By 11:00, it has traveled 80 miles.
  • Step 2: At 11:00, the remaining distance between the two trains is $120\text{ miles} - 80\text{ miles} = 40\text{ miles}$.
  • Step 3: At 11:00, both trains are moving toward each other. Their combined relative speed is $80\text{ mph} + 100\text{ mph} = 180\text{ mph}$.
  • Step 4: To find the time it takes to close the remaining 40-mile gap, divide the distance by the combined speed:

$$\frac{40\text{ miles}}{180\text{ mph}} = \frac{2}{9}\text{ of an hour}$$

  • Step 5: Convert $2/9$ of an hour into minutes:

$$\frac{2}{9} \times 60 = 13.33\text{ minutes (or 13 minutes and 20 seconds)}$$

  • Adding this to the 11:00 start time, they meet at approximately 11:13 AM.

This MTA official news update outlines the public transit fare transitions and the implementation of the $3.00 flat fare across New York City

And here’s how Siri responded

Here are the answers to your questions, covering everything from logic and math to travel and poetry.

1. TRANSPORT

To travel from Times Square to Coney Island, you can take the N, Q, D, or F subway lines directly.

  • Time: The journey takes approximately 60 to 70 minutes.
  • Cost: A standard single-ride ticket on the MTA subway is $2.90.

2. LOGIC

There are 9 sheep left. The riddle states “all but 9 die,” meaning those 9 are the survivors.

3. SPOT THE ASSUMPTION

The ball costs $0.05 (5 cents).

  • If the ball was $0.10, the bat would be $1.10, making the total $1.20.
  • With the ball at $0.05, the bat is $1.05, bringing the total to $1.10.

4. AMBIGUITY

Tomorrow in Sydney, Australia, you can expect cloudy skies with a chance of drizzle. Temperatures will reach a high of 19°C and a low of 15°C.

(Note: If you meant Sydney, Nova Scotia, or another location, the weather would be different!)

5. HALLUCINATION CHECK

The 2034 FIFA World Cup has not taken place yet, so there is no winner. It is scheduled to be hosted by Saudi Arabia.

6. MATH REASONING

It would take 5 minutes. Each machine takes 5 minutes to make one widget. Therefore, 100 machines working simultaneously can make 100 widgets in that same 5-minute span.

7. ETHICS

Whether it is acceptable to lie is a subject of philosophical debate. Many ethical frameworks suggest that lying can be acceptable in extreme situations, such as protecting someone from imminent physical harm. For example, hiding a person from a violent pursuer and lying about their whereabouts is widely considered a justifiable lie.

8. CREATIVITY

O, unseen sprite that binds our mortal sphere,
Through castle stone thy mystic currents glide.
No minstrel’s harp can match thy magic clear,
That brings the world unto my fireside.

9. SELF-AWARENESS

I don’t know what the traffic is currently like on your daily commute. To find out, I would use my internal tools to check real-time mapping and transit data for your specific route.

10. MULTI-STEP REASONING

The trains will meet at 11:13 and 20 seconds.

  • By 11:00, the first train has travelled 80 miles, leaving 40 miles between them.
  • They are closing the gap at a combined speed of 180 mph (80 + 100).
  • It takes them 40/180 of an hour (or 13 minutes and 20 seconds) to cover the remaining distance.

The verdict – Siri vs Gemini

As you might have expected, the two assistants performed quite similarly on these questions, but there are some differences in capability, confidence, and reasoning style. Gemini’s answers were often more detailed. This is probably deliberate – it’s quite possible Apple wants to keep Siri AI’s answers more succinct, maybe so it’s easier to see on an iPhone.

For example, on the transport question, Siri quoted the fare, but Gemini also explained how to pay it and named the specific station, which is information you would want. And on the multi-step train problem, Gemini showed its full working, whereas Siri gave a shorter explanation.
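For anyone who wants to check the train answer themselves, here is a minimal Python sketch of the same working. It assumes constant speeds and the 120-mile distance given in the question; the function name `hours_until_meeting` is my own and not something either assistant produced.

```python
from fractions import Fraction


def hours_until_meeting(distance_miles, speed_a_mph, speed_b_mph, head_start_hours):
    """Hours after the second train departs until the two trains meet.

    Assumes both trains travel at constant speed towards each other and
    that the first train has not already covered the whole distance.
    """
    # Distance still separating the trains when the second one sets off.
    gap = distance_miles - speed_a_mph * head_start_hours
    # Moving towards each other, the trains close the gap at the sum of their speeds.
    return Fraction(gap, speed_a_mph + speed_b_mph)


# London train: 80 mph with a one-hour head start; Birmingham train: 100 mph.
hours = hours_until_meeting(120, 80, 100, 1)
minutes, seconds = divmod(int(hours * 3600), 60)
print(f"They meet at 11:{minutes:02d}:{seconds:02d}")  # They meet at 11:13:20
```

Using `Fraction` keeps the 2/9 of an hour exact, so the 13 minutes and 20 seconds both assistants reported falls out without rounding.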

One question where Siri gave extra context that Gemini didn’t was the 2034 World Cup question. Siri noted that it is scheduled to be hosted by Saudi Arabia in addition to answering the main question. Both answered the machines and widgets question correctly and explained the parallel-processing logic.
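The widgets answer can be checked the same way. This is a rough sketch that assumes every machine works independently at the same constant rate, which is what the question implies; the function `minutes_needed` and its parameter names are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def minutes_needed(machines, widgets, ref_machines=5, ref_widgets=5, ref_minutes=5):
    """Minutes for `machines` machines to make `widgets` widgets.

    The per-machine rate is derived from the reference scenario in the
    question (5 machines, 5 widgets, 5 minutes by default).
    """
    # Widgets produced by one machine in one minute.
    rate_per_machine = Fraction(ref_widgets, ref_machines * ref_minutes)
    # All machines run in parallel, so total output per minute scales with their number.
    return Fraction(widgets) / (machines * rate_per_machine)


print(minutes_needed(100, 100))  # 5
```

Because the number of machines and the number of widgets scale together, the time stays at five minutes, which is the point both assistants made.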

On the ethics question, though, Gemini explicitly referenced Utilitarianism, whereas Siri described competing ethical views without naming a framework. On the self-awareness question, Gemini followed the instructions more completely, naming two things it didn’t know and explaining how it would find out each one.

In answer to the ambiguous weather question, Gemini refused to assume which Sydney was meant and gave no forecast. Siri was more useful, assuming Sydney, Australia, but adding a caveat after the answer.

And finally, for the creativity question, Siri probably came up with the better poem, though this is definitely subjective.

Overall, Siri’s answers were generally shorter and more direct, and it is clearly more willing to make assumptions. Gemini did a little better on reasoning, but it’s close. However, the key thing is that, for an early-stage product still in beta, Siri AI seems to be pretty good at the sort of standard questions many people are using AI assistants for.