AI Keyword Classifier

The Problem

Every SEO strategy requires keyword classification: categorizing thousands of keywords by intent, topic, funnel stage, and other dimensions. At this 60-client agency, manual classification was consuming a large share of analyst time:

Each new client required hours of keyword tagging
60 clients meant endless categorization work
Inconsistency across team members created reporting problems
High-value analysts spent time on low-value data entry

The real cost: Approximately 50 hours per month per SEO analyst spent on manual keyword classification instead of strategy work.

The Solution

Architecture: Rules First, AI Second

┌─────────────────────────────────────────────────────────────┐
│                    KEYWORD INPUT                            │
│              (e.g., "buy nike shoes cheap")                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 DEDUPLICATION LAYER                         │
│  - Normalize whitespace, case                               │
│  - Check canonical cache (already classified?)              │
│  - Skip duplicates within batch                             │
└─────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │    Cache Hit?     │
                    └─────────┬─────────┘
                        Yes / │ \ No
                            ▼   ▼
            ┌───────────────┐   ┌───────────────────────────────┐
            │ Return cached │   │      RULES ENGINE             │
            │   category    │   │  - Pattern matching           │
            └───────────────┘   │  - Keyword lists              │
                                │  - Regex rules                │
                                └───────────────────────────────┘
                                              │
                                    ┌─────────┴─────────┐
                                    │   Rule Match?     │
                                    └─────────┬─────────┘
                                        Yes / │ \ No
                                            ▼   ▼
                            ┌───────────────┐   ┌───────────────────────────────┐
                            │ Apply rule    │   │      GPT CLASSIFIER           │
                            │  category     │   │  - Few-shot prompting         │
                            └───────────────┘   │  - Taxonomy from Sheets       │
                                                │  - Batch processing           │
                                                └───────────────────────────────┘
                                                              │
                                                              ▼
                                    ┌─────────────────────────────────────────┐
                                    │          UPDATE CANONICAL CACHE         │
                                    │         (for future batch runs)         │
                                    └─────────────────────────────────────────┘
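The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function names, the in-memory dict standing in for the canonical cache, and the choice to cache rule results as well as GPT results are all assumptions. The `match_rule` and `llm_classify` callables are hypothetical hooks for the rules engine and the GPT classifier.

```python
from typing import Callable, Optional


def normalize(keyword: str) -> str:
    """Collapse whitespace and lowercase so near-duplicates share one key."""
    return " ".join(keyword.split()).lower()


def classify_batch(
    keywords: list[str],
    cache: dict[str, str],
    match_rule: Callable[[str], Optional[str]],
    llm_classify: Callable[[list[str]], dict[str, str]],
) -> dict[str, str]:
    """Deduplicate, then try cache, then rules, then the LLM fallback."""
    results: dict[str, str] = {}
    needs_llm: list[str] = []
    seen: set[str] = set()

    for raw in keywords:
        key = normalize(raw)
        if key in seen:
            continue  # duplicate within this batch
        seen.add(key)

        if key in cache:
            results[key] = cache[key]  # cache hit: no rule or API work
            continue

        category = match_rule(key)
        if category is not None:
            results[key] = category  # a rule matched
        else:
            needs_llm.append(key)  # ambiguous: defer to the LLM

    if needs_llm:
        # One call for all leftovers instead of one call per keyword.
        results.update(llm_classify(needs_llm))

    cache.update(results)  # assumption: every result is cached for later runs
    return results
```

Passing the classifier in as a callable keeps the sketch testable without network access; a stub that returns a fixed category is enough to exercise the control flow.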

Why Rules First?

Approach	Cost per 10K keywords	Consistency	Speed
100% GPT	~$5-10	Variable	Slow
Rules only	$0	High	Fast
Rules + GPT fallback	<$1	High	Fast

Most keywords fall into predictable patterns. Rules handle 70-80% of classifications; GPT handles the edge cases.
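A back-of-the-envelope sketch of why the fallback is cheap: only keywords that miss both the cache and the rules reach GPT, so the bill scales with that remainder. The function name and the `cache_hit_rate` parameter are illustrative assumptions; the $5-10 and 70-80% figures come from the table and text above.

```python
def gpt_fallback_cost(
    full_gpt_cost: float,
    rule_coverage: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Estimate the GPT spend when only unmatched, uncached keywords are sent.

    full_gpt_cost:  cost of sending every keyword to GPT (e.g. $5-10 per 10K)
    rule_coverage:  share of keywords a rule resolves (e.g. 0.7-0.8)
    cache_hit_rate: share already classified in earlier runs (hypothetical)
    """
    for rate in (rule_coverage, cache_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    remaining_share = (1.0 - cache_hit_rate) * (1.0 - rule_coverage)
    return full_gpt_cost * remaining_share
```

Under these assumptions, rules alone at 80% coverage bring a $5 full-GPT run down to about $1, and any cache hits from earlier runs shrink it further.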

The Hybrid Approach

Rules handle the obvious (70-80% of keywords)
- “buy nike shoes” → Transactional (rule: “buy *”)
- “how to do seo” → Informational (rule: “how to *”)
AI handles the ambiguous (20-30% of keywords)
- Only keywords that don’t match rules go to GPT-4
- Dramatically reduces AI costs
Learning compounds
- Once a keyword is classified, it’s cached
- New clients benefit from prior classifications

Google Sheets Integration

Why Sheets?

SEO team can edit rules without code deployments
Taxonomy visible and auditable
Version history built-in
Collaborative editing

Sheet 1: Taxonomy

Category	Description	Parent
Transactional	Purchase intent keywords	Intent
Informational	Research/learning keywords	Intent
Navigational	Brand/site search keywords	Intent
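As a rough sketch of how taxonomy rows like these could feed a few-shot prompt, the helper below turns them into plain prompt text. The function name, the prompt wording, and the `keyword -> category` answer format are assumptions for illustration; the rows are passed in as plain dicts so the sketch does not depend on any particular Sheets client.

```python
def build_prompt(
    taxonomy: list[dict[str, str]],
    examples: list[tuple[str, str]],
    keywords: list[str],
) -> str:
    """Assemble a few-shot classification prompt from taxonomy rows."""
    lines = ["Classify each keyword into exactly one of these categories:"]
    for row in taxonomy:
        lines.append(f"- {row['Category']}: {row['Description']}")

    lines.append("")
    lines.append("Examples:")
    for keyword, category in examples:
        lines.append(f"{keyword} -> {category}")

    lines.append("")
    lines.append("Keywords to classify (one 'keyword -> category' per line):")
    lines.extend(keywords)
    return "\n".join(lines)


taxonomy = [
    {"Category": "Transactional", "Description": "Purchase intent keywords"},
    {"Category": "Informational", "Description": "Research/learning keywords"},
    {"Category": "Navigational", "Description": "Brand/site search keywords"},
]
examples = [
    ("buy nike shoes", "Transactional"),
    ("how to do seo", "Informational"),
]
prompt = build_prompt(taxonomy, examples, ["nike store near me"])
```

Because the category list is read from the sheet at run time, editing a description in the taxonomy changes the prompt on the next run without a code change.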

Sheet 2: Rules

Pattern	Category	Priority
buy *	Transactional	1
how to *	Informational	1
* price	Transactional	2
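One way rows like these could be evaluated is with shell-style wildcard matching from Python's standard `fnmatch` module. This is a sketch under assumptions: that `*` behaves as a glob wildcard, that a lower Priority number wins, and that ties fall back to sheet order. The `match_rule` name and the hard-coded `RULES` list stand in for whatever is actually loaded from the sheet.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Rows mirroring the Rules sheet above.
RULES = [
    {"Pattern": "buy *", "Category": "Transactional", "Priority": 1},
    {"Pattern": "how to *", "Category": "Informational", "Priority": 1},
    {"Pattern": "* price", "Category": "Transactional", "Priority": 2},
]


def match_rule(keyword: str, rules: list[dict] = RULES) -> Optional[str]:
    """Return the category of the highest-priority matching rule, or None."""
    normalized = " ".join(keyword.split()).lower()
    # sorted() is stable, so rules with equal priority keep their sheet order.
    for rule in sorted(rules, key=lambda r: r["Priority"]):
        if fnmatchcase(normalized, rule["Pattern"].lower()):
            return rule["Category"]
    return None  # no rule matched: candidate for the GPT fallback
```

A keyword such as "how to negotiate price" matches both "how to *" and "* price"; under the assumed ordering the priority 1 rule decides, so it comes back as Informational. Note that `fnmatch` also treats `?` and `[...]` as wildcards, which matters if patterns in the sheet contain those characters.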

Results

Cost Economics

Before: Manual Classification

Metric	Value
Analyst hourly cost	~$50/hour
Hours per 10,000 keywords	~5 hours
Cost per classification run	~$250

After: Automated Classification

Metric	Value
OpenAI API cost	<$1 per run
Analyst review time	~15 minutes
Total cost per run	<$15

Cost reduction: about 94% per classification run (from ~$250 to under $15)
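The 94% figure follows from the two tables above. The short calculation below reproduces it; the variable names are illustrative, and the API cost is taken at its stated $1 ceiling.

```python
HOURLY_RATE = 50.0        # analyst cost, $/hour
MANUAL_HOURS = 5.0        # hours per 10,000 keywords, manual
REVIEW_HOURS = 0.25       # 15 minutes of analyst review, automated
API_COST_CEILING = 1.0    # OpenAI API cost per run is stated as under $1


def reduction(before: float, after: float) -> float:
    """Fractional cost reduction going from `before` to `after`."""
    return 1.0 - after / before


manual_cost = HOURLY_RATE * MANUAL_HOURS                        # 250.0
automated_cost = HOURLY_RATE * REVIEW_HOURS + API_COST_CEILING  # 13.5

# Using the rounded "<$15" total from the table gives the quoted 94%;
# the unrounded $13.50 gives slightly more.
quoted = reduction(manual_cost, 15.0)
unrounded = reduction(manual_cost, automated_cost)
```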

Production Metrics

Metric	Value
Time savings	~50 hours/month per employee
Cost per run	<$1 OpenAI API
Test scale	10,000 rows × 3 columns
Users	Entire SEO team

Operational Benefits

Before	After
4-6 hours to classify new client keywords	15 minutes to run and review
Different analysts categorized differently	Single source of truth
More clients = more analyst hours	More clients = same infrastructure
Classification rules locked in code	SEO team updates rules via Sheets

Technology Stack

Component	Technology
Compute	Google Cloud Run
LLM	OpenAI GPT-4
Configuration	Google Sheets API
Language	Python

Why Not Just Use ChatGPT?

Common question: “Can’t we just paste keywords into ChatGPT?”

Manual ChatGPT	This System
Copy-paste required	Fully automated
Inconsistent formatting	Standardized output
No memory across runs	Canonical cache
$5-10 per large batch	<$1 per batch
No audit trail	Full logging
One person at a time	Team-wide access

Lessons Learned

Rules beat AI for predictable patterns. 70-80% of keywords follow patterns that simple rules handle faster and cheaper than LLMs. This exemplifies a core principle: workflows with model-powered steps outperform pure agent approaches when the path is predictable and cost-sensitive.
Deduplication is a multiplier. Aggressive dedup before API calls dramatically reduces costs and improves consistency.
Google Sheets as config layer works. Non-technical team members can update rules without developer involvement.
Batch processing is essential. Per-keyword API calls are economically unviable at scale.
Start with cost constraints. Designing for <$1/run forced good architecture decisions (dedup, caching, rules-first).
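The batching lesson comes down to sending many keywords per request. A minimal chunking helper might look like the following; the name `chunked` and the batch size of 100 are assumptions for illustration, not the system's actual settings.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


keywords = [f"keyword {i}" for i in range(10_000)]
batches = list(chunked(keywords, 100))
# 10,000 keywords become 100 requests instead of 10,000 single-keyword calls.
```

Each batch shares one copy of the taxonomy and few-shot examples in its prompt, so that overhead is paid once per batch rather than once per keyword.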

Impact

This system changed how the SEO team operates:

Analysts focus on strategy, not data entry. 50 hours per month per employee shifted from classification to client work.
New client keyword classification dropped from hours to minutes. What took 4-6 hours of manual tagging now runs while you grab coffee.
The canonical cache compounds. Every new client benefits from prior classifications—a growing advantage over competitors starting fresh each time.

The system paid for its own development within the first month.

Want to discuss AI classification?

Building a similar system for keyword tagging, content categorization, or document classification? I design cost-controlled AI systems that handle the predictable with rules and the ambiguous with LLMs—all without breaking the budget. Get in touch.
