
As AI tool usage has become more common, I’ve seen impressive examples of people building tools to automate complex processes that once required significant manual effort. I’ve also seen teams adopt AI simply because it’s available, often with little practical benefit.
My approach is to focus on AI applications that save time and solve real problems.
Recently, I needed to align the SEO architecture for more than a dozen websites across three separate businesses, eight regional domains, and multiple languages, including three English dialects, Italian, Japanese, Spanish, Thai, French, and Korean.
Historically, mapping thousands of URLs to create cohesive hreflang XML sitemaps would have required specialized software or days of spreadsheet work. Instead, I used Google Gemini to build a custom Python script that handled the heavy lifting.
Here’s how the project evolved from an initial prompt into a highly customized automation tool, and what it taught me about using AI for technical SEO.
Where AI delivers the most value
I use AI primarily for practical, time-saving tasks, including:
- Generating regex patterns when I need a quick solution without researching syntax from scratch.
- Creating complex spreadsheet formulas for reporting workflows that rely on manual data exports.
- Accelerating research and planning for projects that require competitive analysis across multiple business lines.
- Building custom automation tools for recurring SEO and data-processing tasks.
The hreflang project discussed here falls into that final category.
Mapping hreflang at scale
The challenge was clear: map thousands of URLs across more than a dozen multilingual websites into accurate hreflang XML sitemaps.
Rather than tackling the project manually, I used Google Gemini to help build a custom Python solution.
Here’s how the process unfolded.
Phase 1: Asking for an approach, not just a script
A common pitfall when using generative AI for coding is asking it to sprint before it knows the route. If you simply type, “Write a Python script to create an hreflang sitemap,” you’ll get a generic, fragile piece of code that breaks the moment it encounters real-world data.
Instead, I started by asking for an approach. I explained the scenario: multiple regional domains, organic growth over several years resulting in mismatched URL slugs, translated subfolders, and appended revision years.
Gemini suggested a multi-step, data-driven approach:
- Crawl the websites to collect live URLs and their metadata.
- Use Python in Google Colab to process the raw data.
- Run an exact match cluster first to group identical slugs.
- Use a semantic embedding model (for example, one loaded through the SentenceTransformers library) to fuzzy match translated pages based on their titles and normalized URLs.
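As a rough illustration of the last two steps, here is a minimal sketch of an exact-match pass followed by a fuzzy pass over the leftovers. It is not the script Gemini produced. The function names, the example domains, and the 0.6 threshold are assumptions, and the standard library's difflib string similarity stands in for a real semantic model, so it compares spelling rather than meaning.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from urllib.parse import urlparse


def slug_of(url):
    """Return the last non-empty path segment, lowercased."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[-1].lower() if path else ""


def exact_match_clusters(urls):
    """Group URLs that share an identical slug; return (clusters, leftovers)."""
    groups = defaultdict(list)
    for url in urls:
        groups[slug_of(url)].append(url)
    clusters = {slug: group for slug, group in groups.items() if len(group) > 1}
    leftovers = [group[0] for group in groups.values() if len(group) == 1]
    return clusters, leftovers


def best_fuzzy_match(url, candidates, threshold=0.6):
    """Stand-in for the semantic step: plain string similarity on slugs.

    The threshold is an arbitrary assumption; a real run would compare
    embeddings of titles and normalized URLs instead.
    """
    best, best_score = None, threshold
    for candidate in candidates:
        score = SequenceMatcher(None, slug_of(url), slug_of(candidate)).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best
```

Running the exact pass first keeps the slower, less predictable fuzzy step limited to the pages that actually need it.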
Phase 2: Crawling and data collection
Following the strategy, I used a crawler to spider all the regional websites. The goal was to generate a unified comma-separated values (CSV) file containing the live URLs, status codes, title tags, and H1s. Screaming Frog worked perfectly for this application.
A critical point: Your AI output is only as good as your crawl data (remember the old saying, “garbage in, garbage out”).
An AI script will fail to map an obvious “exact match” if the target URL is a 404 or a 301 redirect in your source data. You must filter your CSV to include only indexable content before feeding it to the script.
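A filter along these lines can run before anything else touches the data. This is a sketch under assumptions: the column names "Status Code" and "Indexability" reflect a typical crawler export and may differ in your file, and the function name is hypothetical.

```python
import csv
import io


def indexable_rows(csv_text, status_col="Status Code", index_col="Indexability"):
    """Keep only rows that returned 200 and are flagged as indexable.

    Column names are assumptions; adjust them to match your crawl export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if row.get(status_col, "").strip() != "200":
            continue  # drops 301s, 404s, and any other non-200 response
        if row.get(index_col, "").strip().lower() != "indexable":
            continue  # drops noindexed or canonicalized pages
        kept.append(row)
    return kept
```

In Colab, the text of the uploaded file can be passed straight in, for example `indexable_rows(open("crawl.csv", encoding="utf-8").read())`, where the filename is a placeholder.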
Phase 3: The Google Colab sandbox
Google Colab provides a free, cloud-based Jupyter notebook environment where you can write, paste, and execute Python code without worrying about local installations or environment variables. You can access it through Google Drive. I found the free version had enough capacity to handle this project.
I uploaded the CSV to Colab, and Gemini provided the initial Python script. The script used a domain-mapping routine to assign language codes, clean the URLs, and generate an XML tree. The initial output was far from perfect.
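To make the domain-mapping and XML steps concrete, here is a minimal sketch using only the standard library. The domains, language codes, and function names are hypothetical placeholders rather than the ones from the project. Each URL in a cluster lists every member of the cluster, itself included, as an alternate.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical domain-to-hreflang mapping; replace with your own domains.
DOMAIN_LANG = {
    "example.com": "en-us",
    "example.co.uk": "en-gb",
    "example.it": "it-it",
    "example.mx": "es-mx",
}


def lang_for(url):
    """Look up the hreflang code for a URL's domain, ignoring a leading www."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_LANG.get(host)


def build_hreflang_sitemap(clusters):
    """Build sitemap XML where every URL lists all alternates in its cluster."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        # Skip URLs whose domain has no known language code.
        tagged = [(url, lang_for(url)) for url in cluster if lang_for(url)]
        for url, _ in tagged:
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            for alt_url, alt_lang in tagged:
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": alt_lang, "href": alt_url},
                )
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_hreflang_sitemap([["https://www.example.com/pricing", "https://example.it/prezzi"]])` returns a string with two `url` entries, each carrying both alternates.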
Phase 4: The iteration (where the real work happens)
If you expect AI to deliver a flawless, edge-case-proof script on the first try, you’ll be disappointed. You’ve probably heard AI tools compared to interns, meaning you need to check their work. In my experience, the comparison holds.
The real value of AI lies in the iteration. As we ran the script, we encountered several unmatched URLs, leaving pages orphaned rather than grouping them with their international counterparts.
Here’s how I iteratively guided the AI to handle the nuances of human-managed websites.
The directory flattening problem
The U.S. site had recently reorganized its blog into topical folders, while the Mexican and Italian sites hadn’t yet been reorganized.
I prompted Gemini with these specific mismatched examples. It responded by adding a URL flattener function to the script, which stripped the topical folders behind the scenes so the translated slugs could align cleanly.
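A flattener of this kind can be quite small. The sketch below is an assumption about how such a function might look, not the one Gemini wrote, and the folder names are hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical topical folder names; replace with the folders on your site.
TOPIC_FOLDERS = {"marketing", "operations", "case-studies"}


def flatten_url(url, topic_folders=TOPIC_FOLDERS):
    """Return the URL path with any topical folder segments removed.

    Only the path is returned, because the result is used as a comparison
    key across domains rather than as a live URL.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    kept = [s for s in segments if s.lower() not in topic_folders]
    return "/" + "/".join(kept)
```

With this in place, a reorganized URL such as `/blog/marketing/some-post/` and an older `/blog/some-post` produce the same key and can be clustered together.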
The aggressive semantic trap
To prevent the AI from mixing up different topics, we implemented concept traps. Initially, they were too strict. A UK article about the manufacturing sector wouldn’t match its Italian counterpart because the English title was slightly more generic.
I instructed Gemini to loosen the traps for generic industries while keeping them strictly enforced for critical acronyms (such as “SEO” versus “SEM”). This gave the AI the breathing room it needed to match creative translations.
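One way to picture the loosened rule is a check that only blocks a match when critical terms disagree. This is a minimal sketch, assuming the trap works on title text. The term list contains only the two acronyms mentioned above, and the function names are hypothetical.

```python
import re

# Terms that must agree exactly between two pages; taken from the
# "SEO" versus "SEM" example. Extend the set for your own content.
CRITICAL_TERMS = {"seo", "sem"}


def critical_terms(title):
    """Return the critical terms that appear as whole words in a title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return words & CRITICAL_TERMS


def passes_concept_trap(title_a, title_b):
    """Block a match only when the two titles disagree on a critical term.

    Titles with no critical terms pass, which leaves generic industry
    wording to the semantic matching step.
    """
    terms_a, terms_b = critical_terms(title_a), critical_terms(title_b)
    if terms_a or terms_b:
        return terms_a == terms_b
    return True
```

The design choice is that the trap acts as a veto rather than a score: it never creates a match on its own, it only stops one the similarity step would otherwise accept.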
The translated slug epiphany
The biggest breakthrough came while auditing the Mexican blog orphans. For example, the Spanish URL /detras-de-escenas-historias... is a direct translation of the English /behind-the-scenes-stories... I pointed this out to Gemini.
Instead of forcing me to hard-code hundreds of manual matches, Gemini updated the script to create a “Combined Semantic Signature.” It dynamically translated core operational phrases in the slugs, effectively bridging the language gap for the semantic matching model and connecting dozens of orphaned pages almost instantly.
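The idea can be sketched as a small glossary applied to the slug before it is combined with the page title. Everything here is an assumption for illustration: the glossary covers only the visible part of the truncated example slug, the function names are hypothetical, and the real script's signature format is not known.

```python
# Hypothetical phrase glossary, built from the visible part of the
# example slug above. A real glossary would hold many more phrases.
SLUG_GLOSSARY = {
    "detras-de-escenas": "behind-the-scenes",
    "historias": "stories",
}


def normalize_slug(slug, glossary=SLUG_GLOSSARY):
    """Translate known phrases in a slug and return space-separated words."""
    slug = slug.strip("/").lower()
    # Replace longer phrases first so they are not broken up by shorter ones.
    for phrase in sorted(glossary, key=len, reverse=True):
        slug = slug.replace(phrase, glossary[phrase])
    return slug.replace("-", " ")


def combined_signature(title, slug):
    """Join the title and the normalized slug into one string for matching."""
    return f"{title.strip()} | {normalize_slug(slug)}"
```

The resulting string is what would be handed to the semantic model in place of the raw title. Note that plain substring replacement can also hit phrases inside longer words, so a production version would want to match on whole hyphen-delimited segments.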
Lessons from building an AI-assisted SEO tool
The project reinforced a simple lesson: AI works best when it’s treated as a collaborator rather than a shortcut.
- Be the strategist, let AI be the coder: Don’t just demand a final product. Discuss the architecture, edge cases, and logic first. Treat AI like a junior developer that needs clear architectural direction.
- Provide concrete examples: When a script fails, don’t just say, “It’s broken.” For this project, I provided either exact URLs that failed and the URLs they should have matched with, or groups of URLs with mismatches. AI needs concrete patterns to fix its logic.
- Embrace the iterative loop: Expect to run the code, identify anomalies, and feed them back into the prompt. Each iteration makes the tool significantly smarter.
- Leverage Google Colab: You don’t need to be a Python expert to use Python for SEO. Colab bridges the technical gap, allowing you to run complex data science libraries directly in your browser.
By the end of the project, we had a robust, highly customized Python script that could process a massive CSV and generate a cross-referenced hreflang XML sitemap in minutes.
AI isn’t going to replace technical SEOs anytime soon. However, SEOs who know how to collaborate with AI to build custom, scalable, and useful tools will have a significant advantage.
