I Built My Own Topical Authority Map Tool: Here’s What Happened

[Image: Comparison between traditional SEO crawl visualization and semantic topical authority clustering, showing chaotic link-based structure versus organized meaning-based clusters]

I do a lot of topical authority audits, and the topical authority map I needed at the end of every one of them never came out of a single tool. I assembled it by hand every time, and that part of the work always felt fragmented in a way nobody seems willing to admit out loud.

The standard workflow looked like this. Pull a crawl from Screaming Frog. Export keyword data from Ahrefs. Cross-reference against Semrush’s content gap report. Stitch all of it together in a spreadsheet, run patterns through Claude, then walk a strategist through what they’re looking at.

Every audit took most of a day, sometimes longer. The data was there. The meaning wasn’t, and I was the meaning. That doesn’t scale.

I kept thinking the same thing. Why am I doing this manually when the pieces all exist? The connections between them are the actual hard part, and that’s exactly what AI is supposed to handle. So I opened my terminal and started building the tool the way I would’ve wanted one to work.

Before any of that I asked a question I keep asking lately. Should this actually be a tool, or am I about to build something pointless? The framework I use to decide if AI is the right call said yes. So I started.

Why semantic clustering matters

The technical gap in most topical authority tools is that they cluster by keyword string match instead of by meaning.

Google doesn’t work that way. Semantic search has been the standard for years, and the systems on top of it have only gotten better. Two pages can target completely different keywords and still compete because the underlying meaning overlaps. A page targeting “content strategy for SaaS” and a page targeting “how to plan your SaaS blog” land on different SERPs, sometimes with different rankings, but they’re answering the same intent. Google sees that. Most SEO tools don’t.
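
To make the string-match problem concrete, here's a minimal sketch that scores those two example queries by word overlap (Jaccard similarity). This is my stand-in for "keyword string match", not how any particular SEO tool is implemented, and the function name is hypothetical.

```python
def token_jaccard(a: str, b: str) -> float:
    """Share of distinct lowercase words that two queries have in common."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


overlap = token_jaccard("content strategy for SaaS", "how to plan your SaaS blog")
print(round(overlap, 3))  # 0.111: only "saas" is shared out of 9 distinct words
```

A string-based audit sees roughly 11% overlap and moves on, even though both queries describe the same job to be done.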

[Image: Infographic explaining how a topical authority map works, from crawled pages and keyword data through semantic clustering to insights like topic clusters, cannibalization, and content gaps]

That’s where the hidden cannibalization lives, the kind a keyword-only audit doesn’t catch. Two pages quietly compete for the same intent, neither ranks as well as a consolidated version would, and neither gets flagged because the strings don’t overlap.

A real topical map SEO workflow has to start from meaning, not from keyword match. That’s the part I wanted my tool to fix.

What I built and how it works

It’s a topical authority mapper that runs on local embeddings. Each page becomes a numerical fingerprint of its meaning, and pages with similar meanings sit close to each other in vector space. That’s the entire premise. Page Optimizer Pro has one of the cleaner writeups I’ve seen on the difference between keyword clustering and semantic clustering if you want the longer version of why this matters.
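
Here's a minimal sketch of what "close in vector space" means, using cosine similarity. The three-dimensional vectors are made up for illustration. Real embedding models output hundreds of dimensions, and this isn't the tool's actual code.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up toy "embeddings" (assumption: a real model would produce these).
saas_content_strategy = [0.9, 0.1, 0.2]
saas_blog_planning = [0.8, 0.2, 0.3]
pricing_page = [0.1, 0.9, 0.1]

close = cosine_similarity(saas_content_strategy, saas_blog_planning)
far = cosine_similarity(saas_content_strategy, pricing_page)
print(close > far)  # True
```

The two SaaS content pages point in nearly the same direction, so they score far higher against each other than either does against the pricing page.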

[Image: Topical authority mapper summary section]

What it does: it groups pages by meaning rather than keyword match. Two pages that target different terms but answer the same intent end up in the same semantic cluster.
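
One simple way to turn pairwise similarity into groups is to link any two pages at or above a threshold and take the connected groups. The sketch below does that with a small union-find. It's an illustration under assumptions: the URLs, toy vectors, and the 0.9 threshold are made up, and the real tool may well cluster differently.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def cluster_pages(vectors: dict[str, list[float]], threshold: float) -> list[list[str]]:
    """Group pages into connected components of the 'similar enough' graph."""
    urls = list(vectors)
    parent = {url: url for url in urls}

    def find(url):
        # Walk up to the group's root, shortening the path as we go.
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for url in urls:
        groups.setdefault(find(url), []).append(url)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical URLs with made-up toy vectors.
pages = {
    "/saas-content-strategy": [0.9, 0.1, 0.2],
    "/plan-your-saas-blog": [0.8, 0.2, 0.3],
    "/pricing": [0.1, 0.9, 0.1],
    "/pricing-faq": [0.2, 0.8, 0.2],
}
clusters = cluster_pages(pages, threshold=0.9)
print(clusters)
```

The two content pages land in one cluster and the two pricing pages in another, even though their URLs and target terms share few words.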

Some of this isn’t entirely new ground. Screaming Frog itself published a tutorial on spotting semantically similar pages using embeddings, which works as a one-off check. The thing I wanted was an integrated workflow that goes further than spotting outliers: clusters, gaps, and cannibalization in a single pass instead of three separate exports.

[Image: Vector mapping in the topical authority mapper]

So my tool surfaces four things that I haven’t found any off-the-shelf tool flagging together:

  1. Semantic clusters: what topics the site actually owns, based on what the content means rather than what’s in the H1
  2. Cannibalization signals: pages semantically too close to each other, the kind that would lose to a consolidated version
  3. Competitor coverage gaps: topics competitors cover that the site doesn’t, scored by semantic relevance to the existing content rather than just keyword volume
  4. Brand voice consistency (experimental): flags pages that drift from the site’s dominant tone

Brand voice scoring is rough, and that’s why I’m marking it experimental. The first three are where I’d stake my work.
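
To show how the second and third signals could be computed, here's a hedged sketch. Everything in it is an assumption for illustration: the function names, the toy vectors, the thresholds, and especially the choice to measure "relevance to existing content" as similarity to the average of the site's page vectors. It isn't the tool's actual scoring.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def cannibalization_pairs(pages, threshold):
    """Return (url_a, url_b, similarity) for page pairs at or above the threshold."""
    urls = sorted(pages)
    pairs = []
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            score = cosine(pages[a], pages[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return sorted(pairs, key=lambda pair: -pair[2])


def coverage_gaps(pages, competitor_topics, covered_at, relevant_at):
    """Competitor topics no page covers closely but that sit near the site's theme."""
    dims = len(next(iter(pages.values())))
    # Assumption: the site's "theme" is the plain average of its page vectors.
    centroid = [sum(vec[d] for vec in pages.values()) / len(pages) for d in range(dims)]
    gaps = []
    for topic, vec in competitor_topics.items():
        best_match = max(cosine(vec, page_vec) for page_vec in pages.values())
        relevance = cosine(vec, centroid)
        if best_match < covered_at and relevance >= relevant_at:
            gaps.append((topic, round(relevance, 3)))
    return sorted(gaps, key=lambda gap: -gap[1])


# Hypothetical pages and competitor topics with made-up toy vectors.
pages = {
    "/saas-content-strategy": [0.9, 0.1, 0.2],
    "/plan-your-saas-blog": [0.8, 0.2, 0.3],
    "/pricing": [0.1, 0.9, 0.1],
}
competitor_topics = {
    "saas content calendar": [0.85, 0.15, 0.25],   # already covered closely
    "saas onboarding emails": [0.5, 0.5, 0.5],     # related, not covered
    "office furniture": [0.0, 0.1, 0.9],           # not covered, not relevant
}

pairs = cannibalization_pairs(pages, threshold=0.95)
gaps = coverage_gaps(pages, competitor_topics, covered_at=0.9, relevant_at=0.6)
print(pairs)
print(gaps)
```

With these toy numbers the two SaaS content pages come back as the only cannibalization candidate, and "saas onboarding emails" as the only gap. High similarity on its own only nominates a pair for review. It doesn't prove the pages are competing.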

Current state: functional, not pretty. Output is a CSV and a JSON dump. No dashboard yet. A strategist can use it but they need to know what they’re looking at.

[Image: Search intent distribution dashboard]

The honest build story

The first version barely worked. Clustering was too loose. It grouped almost every page on a 200-page site into the same cluster because the default similarity threshold didn’t fit SEO content.

Second version overcorrected. Threshold too tight, every page in its own cluster, no useful groupings.

Third version was closer but still flagged healthy clusters as cannibalization risks. It told me two pillar pages on related topics were competing when they were doing exactly what they should be doing.

I’m calibrating the fourth version now. Some clusters still look noisy. I’m running it against sites I know well to gut-check the output before trusting it on anything client-adjacent. That part of the process matters more than the model choice.
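
The too-loose and too-tight failures are easy to reproduce on toy data. This sketch counts clusters at three thresholds, linking pages at or above each threshold and counting the connected groups. The vectors and threshold values are made up, and the grouping rule is my assumption rather than the tool's real one.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def count_clusters(vectors, threshold):
    """Number of connected groups when pages at or above the threshold are linked."""
    unvisited = set(range(len(vectors)))
    clusters = 0
    while unvisited:
        stack = [unvisited.pop()]  # start a new group from any remaining page
        clusters += 1
        while stack:
            current = stack.pop()
            linked = {
                i for i in unvisited
                if cosine(vectors[current], vectors[i]) >= threshold
            }
            unvisited -= linked
            stack.extend(linked)
    return clusters


# Made-up toy vectors: two content pages and two pricing pages.
vectors = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.3],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.2],
]
sweep = {t: count_clusters(vectors, t) for t in (0.2, 0.9, 0.99)}
print(sweep)  # {0.2: 1, 0.9: 2, 0.99: 4}
```

At 0.2 everything collapses into one cluster, at 0.99 every page stands alone, and only the middle value recovers the two groups a human would draw. Running a sweep like this against a site you already know well is one way to gut-check a threshold before trusting it.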

I’m not the only practitioner doing this kind of thing. Metehan Yesilyurt has been documenting his own embedding-based SEO tool and his writeups helped me sanity-check some of my early calibration choices. There’s a small but growing group of SEOs treating this as a build problem rather than a procurement one.

The real insight, the one I keep coming back to, is that AI didn’t just help me build the tool faster. It changed what I thought was possible to build at all.

I’m not a developer by trade. I’m a growth practitioner who can read code. A few years ago, the gap between “I wish a tool did this” and “fine, I’ll build the tool” was wide enough that I’d never cross it. Now the gap is smaller than my willingness to spend a weekend on it. Same dynamic showed up when I built my website with Claude Code. When the input was right, the output was right. When the input was vague, I got vague output. The tool wasn’t the ceiling. My input was.

That shift in capability is the part of the AI conversation I haven’t seen enough of. Most posts are about productivity and doing the same things faster. The actual change is in what becomes buildable in the first place.

I’m proud of this tool even though it’s not finished. Maybe a little because of that.

What’s next

Three priorities, in order:

  1. Improve the vector database layer: persist embeddings between sessions so I stop re-processing the full site every run. Search Engine Journal has a good primer on how vector databases fit into SEO workflows if you’re new to the concept. This is the difference between “this works” and “this is a regular workflow I trust.”
  2. Cleaner exports: a strategist should open the output and act on it without having to interpret raw CSVs
  3. Tighter cannibalization scoring: reduce false positives before this goes anywhere near a client deliverable
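
For the first priority, the core idea is to re-embed a page only when its content has changed. Here's a minimal sketch that uses a JSON file and a content hash as a stand-in for a real vector database. The `embed` argument is a hypothetical placeholder for whatever model call produces the vectors, and the file layout is my assumption.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(pages, embed, cache_path):
    """Embed only pages whose text changed since the last run.

    pages: dict of url -> page text
    embed: callable taking text and returning a list of floats (hypothetical)
    cache_path: JSON file that persists hashes and vectors between sessions
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    vectors = {}
    for url, text in pages.items():
        digest = content_hash(text)
        entry = cache.get(url)
        if entry is None or entry["hash"] != digest:
            # New or edited page: pay for the embedding call and store it.
            entry = {"hash": digest, "vector": embed(text)}
            cache[url] = entry
        vectors[url] = entry["vector"]
    path.write_text(json.dumps(cache), encoding="utf-8")
    return vectors


calls = []


def fake_embed(text):
    """Stand-in for a real embedding model; records every call it receives."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "embeddings.json"
    site = {"/a": "first page", "/b": "second page"}
    embed_with_cache(site, fake_embed, cache_file)  # embeds both pages
    site["/b"] = "second page, edited"
    embed_with_cache(site, fake_embed, cache_file)  # re-embeds only /b

print(len(calls))  # 3
```

Two runs over a two-page site cost three embedding calls instead of four. A real vector database adds fast nearest-neighbor search on top, but the skip-if-unchanged logic is what removes the full re-processing on every run. This sketch never evicts pages that have been deleted from the site, which a production version would need to handle.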

I’ll share the working version publicly once those three are stable. Repo coming soon. If you run topical authority audits and want to test it on a site with messy structure, I’d actually like to hear from you. That’s where the real edge cases will surface.

AI is expanding what practitioners can build, not just what they can do. That’s the side of the conversation I haven’t heard enough of. Tool isn’t done. Building in public anyway.


FAQ

What is a topical authority map?
A topical authority map is a structured representation of every topic and subtopic a site covers, organized so search engines and AI systems can recognize the site as an expert in a specific area. Most maps are built keyword-first. The version I built starts from the actual meaning of each page using embeddings, which catches connections and overlaps that keyword tools miss.

How is semantic clustering different from keyword clustering?
Keyword clustering groups pages by string similarity in their target queries. Two pages with overlapping words get clustered together. Semantic clustering groups pages by meaning. Two pages can target completely different keywords and still cluster together if they’re answering the same intent. Google’s ranking leans on meaning far more than on exact strings, so semantic clustering is closer to how the SERPs actually work.

Why do most SEO tools miss hidden cannibalization?
Because they look for keyword overlap, not intent overlap. Two pages targeting different keywords can compete for the same SERP if they’re answering the same underlying question. Standard cannibalization audits flag matching strings. They miss the case where the strings are different but the meaning is identical, which is the cannibalization that actually costs rankings.

What are embeddings, and why do they matter for SEO audits?
Embeddings turn each page into a numerical vector that represents its meaning. Pages with similar meaning sit close together in vector space. For SEO, that means you can compare any two pages on your site by meaning instead of keywords, find genuine duplicates and near-duplicates, and spot topical gaps that wouldn’t show up in a keyword report.

Can you build your own SEO tool with AI?
Yes, if you can read code and treat the AI as a collaborator instead of a magic box. Claude Code wrote most of the implementation for me. The parts that needed real judgment (calibration thresholds, what to flag, what to ignore) were mine. The faster the feedback loop between you and the model, the better the output.

When should you build a custom SEO tool versus using an existing one?
Build when the integration between existing tools is the bottleneck, not the tools themselves. If three existing tools each do part of what you need but require manual stitching every time, that’s a workflow worth automating. If a single existing tool does the job well enough, buy it. Custom is for the connective tissue nobody else has built yet.

Find me on LinkedIn if you want to talk about it.


🤖 Transparency Note: This article was drafted with AI assistance and reviewed, edited, and fact-checked by a human.
Written by

Gera Mejia

Growth Marketer & AI Search Practitioner. I test tools, build agents, break workflows, and share what I learn.