AI Keyword Classifier
Rules-first AI classification—50 hours/month saved, <$1 per run
The Problem
Every SEO strategy requires keyword classification: categorizing thousands of keywords by intent, topic, funnel stage, and other dimensions. At this 60-client agency, manual classification was consuming a large share of analyst time:
- Each new client required hours of keyword tagging
- 60 clients meant endless categorization work
- Inconsistency across team members created reporting problems
- High-value analysts spent time on low-value data entry
The real cost: Approximately 50 hours per month per SEO analyst spent on manual keyword classification instead of strategy work.
The Solution
Architecture: Rules First, AI Second
┌─────────────────────────────────────────────────────────────┐
│                        KEYWORD INPUT                        │
│               (e.g., "buy nike shoes cheap")                │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     DEDUPLICATION LAYER                     │
│  - Normalize whitespace, case                               │
│  - Check canonical cache (already classified?)              │
│  - Skip duplicates within batch                             │
└─────────────────────────────────────────────────────────────┘
                               │
                     ┌─────────┴─────────┐
                     │    Cache Hit?     │
                     └─────────┬─────────┘
                       Yes /   │   \ No
                      ▼                 ▼
        ┌───────────────┐   ┌───────────────────────────────┐
        │ Return cached │   │         RULES ENGINE          │
        │ category      │   │  - Pattern matching           │
        └───────────────┘   │  - Keyword lists              │
                            │  - Regex rules                │
                            └───────────────────────────────┘
                                            │
                                  ┌─────────┴─────────┐
                                  │    Rule Match?    │
                                  └─────────┬─────────┘
                                    Yes /   │   \ No
                                   ▼                 ▼
                     ┌───────────────┐   ┌───────────────────────────────┐
                     │ Apply rule    │   │        GPT CLASSIFIER         │
                     │ category      │   │  - Few-shot prompting         │
                     └───────────────┘   │  - Taxonomy from Sheets       │
                                         │  - Batch processing           │
                                         └───────────────────────────────┘
                                                         │
                                                         ▼
                                    ┌─────────────────────────────────────────┐
                                    │         UPDATE CANONICAL CACHE          │
                                    │         (for future batch runs)         │
                                    └─────────────────────────────────────────┘
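The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the production code: `classify_batch`, `demo_rules`, and `fake_llm` are hypothetical names, the rules and the model call are passed in as plain callables, and caching rule results alongside model results is an assumption (the diagram only shows the cache update after the GPT step).

```python
from typing import Callable, Dict, Iterable, List, Optional


def normalize(keyword: str) -> str:
    """Lowercase and collapse whitespace so near-identical spellings share a key."""
    return " ".join(keyword.split()).lower()


def classify_batch(
    keywords: Iterable[str],
    cache: Dict[str, str],
    match_rule: Callable[[str], Optional[str]],
    llm_classify: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Route each distinct keyword through cache, then rules, then the model."""
    results: Dict[str, str] = {}
    pending: List[str] = []
    seen = set()

    for raw in keywords:
        key = normalize(raw)
        if not key or key in seen:
            continue  # skip blanks and duplicates within the batch
        seen.add(key)

        if key in cache:  # cache hit: reuse the earlier classification
            results[key] = cache[key]
            continue

        category = match_rule(key)
        if category is not None:  # rule match: no model call needed
            results[key] = category
        else:
            pending.append(key)  # ambiguous: defer to the model

    if pending:
        # One call for all leftovers rather than one call per keyword.
        results.update(llm_classify(pending))

    cache.update(results)  # assumption: rule results are cached as well
    return results


# Stand-ins for the real rules engine and GPT call (both hypothetical).
def demo_rules(key: str) -> Optional[str]:
    if key.startswith("buy "):
        return "Transactional"
    if key.startswith("how to "):
        return "Informational"
    return None


llm_calls: List[List[str]] = []


def fake_llm(keys: List[str]) -> Dict[str, str]:
    llm_calls.append(keys)
    return {key: "Navigational" for key in keys}


cache = {"nike store": "Navigational"}
batch = ["Buy  Nike Shoes", "buy nike shoes", "how to do seo", "nike store", "nike air max"]
results = classify_batch(batch, cache, demo_rules, fake_llm)
print(results)
print(llm_calls)  # shows which keywords reached the stand-in model
```

Of the five inputs, two collapse into one key, one is a cache hit, two match rules, and only the remaining keyword is passed to the stand-in model.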
Why Rules First?
| Approach | Cost per 10K keywords | Consistency | Speed |
|---|---|---|---|
| 100% GPT | ~$5-10 | Variable | Slow |
| Rules only | $0 | High | Fast |
| Rules + GPT fallback | <$1 | High | Fast |
Most keywords fall into predictable patterns. Rules handle 70-80% of classifications; GPT handles the edge cases.
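A back-of-the-envelope estimate shows how these figures relate. The sketch assumes model cost scales linearly with the number of keywords sent to the model; `llm_spend` is a hypothetical helper, and the 70% cache hit share in the last call is purely illustrative, not a measured value.

```python
def llm_spend(all_llm_cost: float, rule_share: float, cache_hit_share: float = 0.0) -> float:
    """Estimated model cost when only cache misses without a rule match are sent.

    Assumes cost is proportional to the number of keywords sent to the model.
    """
    if not (0.0 <= rule_share <= 1.0 and 0.0 <= cache_hit_share <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    return all_llm_cost * (1.0 - cache_hit_share) * (1.0 - rule_share)


# Figures from the table: a 100% GPT run costs ~$5-10, rules cover 70-80%.
best = llm_spend(5.0, rule_share=0.80)
worst = llm_spend(10.0, rule_share=0.70)

# Illustrative only: a 70% cache hit share is an assumption, not a reported number.
with_cache = llm_spend(10.0, rule_share=0.70, cache_hit_share=0.70)

print(round(best, 2), round(worst, 2), round(with_cache, 2))
```

Under that linear assumption, rules alone bring a $5-10 run down to roughly $1-3, so the sub-$1 figure plausibly also depends on deduplication and cache hits shrinking the batch before the rules run.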
The Hybrid Approach
- Rules handle the obvious (70-80% of keywords)
  - “buy nike shoes” → Transactional (rule: “buy *”)
  - “how to do seo” → Informational (rule: “how to *”)
- AI handles the ambiguous (20-30% of keywords)
  - Only keywords that don’t match rules go to GPT-4
  - Dramatically reduces AI costs
- Learning compounds
  - Once a keyword is classified, it’s cached
  - New clients benefit from prior classifications
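The compounding effect depends on the cache surviving between runs. The sketch below assumes the canonical cache is a flat keyword-to-category map stored as JSON; the actual storage backend is not described here, and `load_cache`, `save_cache`, and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_cache(path: Path) -> Dict[str, str]:
    """Return the stored keyword -> category map, or an empty one on first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_cache(path: Path, cache: Dict[str, str]) -> None:
    """Write the map back so the next run starts from it."""
    path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "canonical_cache.json"  # hypothetical location

    # Run 1 (client A): nothing cached yet, so both keywords get classified.
    cache = load_cache(cache_path)
    cache.update({"buy nike shoes": "Transactional", "how to do seo": "Informational"})
    save_cache(cache_path, cache)

    # Run 2 (client B): one keyword overlaps with client A's list.
    cache = load_cache(cache_path)
    client_b = ["buy nike shoes", "nike outlet near me"]
    hits = [kw for kw in client_b if kw in cache]
    misses = [kw for kw in client_b if kw not in cache]

print(hits)    # keywords answered from the cache
print(misses)  # keywords that still need rules or the model
```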
Google Sheets Integration
Why Sheets?
- SEO team can edit rules without code deployments
- Taxonomy visible and auditable
- Version history built-in
- Collaborative editing
Sheet 1: Taxonomy
| Category | Description | Parent |
|---|---|---|
| Transactional | Purchase intent keywords | Intent |
| Informational | Research/learning keywords | Intent |
| Navigational | Brand/site search keywords | Intent |
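One way the taxonomy sheet could feed the few-shot prompt is sketched below. It assumes the rows have already been read into dictionaries keyed by the column headers; the prompt wording, the reply format, and the helper names `build_prompt` and `parse_reply` are illustrative choices, not the prompts used in production. No API call is made here.

```python
from typing import Dict, Sequence, Tuple


def build_prompt(
    taxonomy: Sequence[Dict[str, str]],
    examples: Sequence[Tuple[str, str]],
    keywords: Sequence[str],
) -> str:
    """Assemble one prompt for a whole batch from taxonomy rows and labeled examples."""
    lines = ["Classify each keyword into exactly one category.", "", "Categories:"]
    lines += [f"- {row['Category']}: {row['Description']}" for row in taxonomy]
    lines += ["", "Examples:"]
    lines += [f"{keyword} -> {category}" for keyword, category in examples]
    lines += ["", "Keywords:"]
    lines += [f"{i}. {keyword}" for i, keyword in enumerate(keywords, start=1)]
    lines += ["", "Reply with one line per keyword in the form '<number>. <category>'."]
    return "\n".join(lines)


def parse_reply(
    reply: str,
    taxonomy: Sequence[Dict[str, str]],
    keywords: Sequence[str],
) -> Dict[str, str]:
    """Keep only answers that name a category present in the taxonomy."""
    allowed = {row["Category"] for row in taxonomy}
    parsed: Dict[str, str] = {}
    for line in reply.splitlines():
        number, _, category = line.partition(".")
        number, category = number.strip(), category.strip()
        if number.isdigit() and category in allowed:
            index = int(number) - 1
            if 0 <= index < len(keywords):
                parsed[keywords[index]] = category
    return parsed


taxonomy = [
    {"Category": "Transactional", "Description": "Purchase intent keywords", "Parent": "Intent"},
    {"Category": "Informational", "Description": "Research/learning keywords", "Parent": "Intent"},
    {"Category": "Navigational", "Description": "Brand/site search keywords", "Parent": "Intent"},
]
examples = [("buy nike shoes", "Transactional"), ("how to do seo", "Informational")]
keywords = ["nike store login", "best running shoes"]

prompt = build_prompt(taxonomy, examples, keywords)
# A made-up model reply; "Commercial" is not in the taxonomy, so it is dropped.
parsed = parse_reply("1. Navigational\n2. Commercial", taxonomy, keywords)
print(prompt)
print(parsed)
```

Rejecting labels that are not in the sheet keeps the taxonomy the single source of truth: an unexpected answer is left unclassified for review instead of silently adding a new category.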
Sheet 2: Rules
| Pattern | Category | Priority |
|---|---|---|
| buy * | Transactional | 1 |
| how to * | Informational | 1 |
| * price | Transactional | 2 |
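A rules sheet like this could be evaluated as follows. The sketch assumes that `*` matches any run of characters, that a lower Priority number wins, that ties fall back to sheet order, and that keywords arrive already lowercased and whitespace-normalized; `compile_rules` and `match_rule` are hypothetical names.

```python
import re
from typing import Any, Dict, List, Optional, Sequence, Tuple


def compile_rules(rows: Sequence[Dict[str, Any]]) -> List[Tuple[re.Pattern, str]]:
    """Turn sheet rows into (regex, category) pairs, highest priority first."""
    # sorted() is stable, so rows with equal priority keep their sheet order.
    ordered = sorted(rows, key=lambda row: int(row["Priority"]))
    compiled = []
    for row in ordered:
        # Escape everything except "*", which becomes "match anything".
        parts = str(row["Pattern"]).lower().split("*")
        regex = ".*".join(re.escape(part) for part in parts)
        compiled.append((re.compile(regex), str(row["Category"])))
    return compiled


def match_rule(keyword: str, rules: Sequence[Tuple[re.Pattern, str]]) -> Optional[str]:
    """Return the category of the first rule matching the whole keyword, else None."""
    for pattern, category in rules:
        if pattern.fullmatch(keyword):
            return category
    return None


rows = [
    {"Pattern": "buy *", "Category": "Transactional", "Priority": 1},
    {"Pattern": "how to *", "Category": "Informational", "Priority": 1},
    {"Pattern": "* price", "Category": "Transactional", "Priority": 2},
]
rules = compile_rules(rows)

for keyword in ["buy nike shoes", "nike shoes price", "how to check price", "nike air max"]:
    print(keyword, "->", match_rule(keyword, rules))
```

"how to check price" matches both "how to *" and "* price"; the priority 1 rule wins, so it is labeled Informational. "nike air max" matches nothing and would fall through to the GPT classifier.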
Results
Cost Economics
Before: Manual Classification
| Metric | Value |
|---|---|
| Analyst hourly cost | ~$50/hour |
| Hours per 10,000 keywords | ~5 hours |
| Cost per classification run | ~$250 |
After: Automated Classification
| Metric | Value |
|---|---|
| OpenAI API cost | <$1 per run |
| Analyst review time | ~15 minutes |
| Total cost per run | <$15 |
ROI: 94% cost reduction per classification task
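The 94% figure follows from the two tables above. A short check, using only the numbers quoted there (the constant names are illustrative):

```python
HOURLY_RATE = 50.0       # analyst cost, dollars per hour
MANUAL_HOURS = 5.0       # hours per 10,000 keywords, manual
API_COST_CEILING = 1.0   # OpenAI cost per run, upper bound
REVIEW_MINUTES = 15.0    # analyst review time per automated run
QUOTED_TOTAL = 15.0      # quoted upper bound on total cost per automated run

manual_cost = HOURLY_RATE * MANUAL_HOURS            # 250.0
review_cost = HOURLY_RATE * REVIEW_MINUTES / 60.0   # 12.5
automated_cost = API_COST_CEILING + review_cost     # 13.5, under the quoted $15

reduction = 1.0 - QUOTED_TOTAL / manual_cost
print(f"{reduction:.0%}")  # cost reduction at the quoted $15 ceiling
```

Using the $13.50 estimate instead of the $15 ceiling gives a slightly higher reduction, so 94% is the conservative figure.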
Production Metrics
| Metric | Value |
|---|---|
| Time savings | ~50 hours/month per employee |
| Cost per run | <$1 OpenAI API |
| Test scale | 10,000 rows × 3 columns |
| Users | Entire SEO team |
Operational Benefits
| Before | After |
|---|---|
| 4-6 hours to classify new client keywords | 15 minutes to run and review |
| Different analysts categorized differently | Single source of truth |
| More clients = more analyst hours | More clients = same infrastructure |
| Classification rules locked in code | SEO team updates rules via Sheets |
Technology Stack
| Component | Technology |
|---|---|
| Compute | Google Cloud Run |
| LLM | OpenAI GPT-4 |
| Configuration | Google Sheets API |
| Language | Python |
Why Not Just Use ChatGPT?
Common question: “Can’t we just paste keywords into ChatGPT?”
| Manual ChatGPT | This System |
|---|---|
| Copy-paste required | Fully automated |
| Inconsistent formatting | Standardized output |
| No memory across runs | Canonical cache |
| $5-10 per large batch | <$1 per batch |
| No audit trail | Full logging |
| One person at a time | Team-wide access |
Lessons Learned
- Rules beat AI for predictable patterns. 70-80% of keywords follow patterns that simple rules handle faster and cheaper than LLMs. This exemplifies a core principle: workflows with model-powered steps outperform pure agent approaches when the path is predictable and cost-sensitive.
- Deduplication is a multiplier. Aggressive dedup before API calls dramatically reduces costs and improves consistency.
- Google Sheets as config layer works. Non-technical team members can update rules without developer involvement.
- Batch processing is essential. Per-keyword API calls are economically unviable at scale.
- Start with cost constraints. Designing for <$1/run forced good architecture decisions (dedup, caching, rules-first).
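The batching lesson comes down to how many requests a run makes. A minimal sketch, assuming the leftover keywords are sent in fixed-size groups; the batch size of 100 and the 2,500 leftover keywords are illustrative values, and `chunked` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Illustrative numbers: 2,500 keywords left after cache and rules, 100 per request.
pending = [f"keyword {i}" for i in range(2500)]
batches = list(chunked(pending, 100))

print(len(pending), "per-keyword calls vs", len(batches), "batched calls")
```

Each batched request repeats the taxonomy and few-shot examples once instead of once per keyword, which is where most of the saving would come from.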
Impact
This system changed how the SEO team operates:
- Analysts focus on strategy, not data entry. 50 hours per month per employee shifted from classification to client work.
- New client keyword classification dropped from hours to minutes. What took 4-6 hours of manual tagging now takes about 15 minutes to run and review.
- The canonical cache compounds. Every new client benefits from prior classifications—a growing advantage over competitors starting fresh each time.
The savings paid back the development cost within the first month.