AI Keyword Classifier

Rules-first AI classification—50 hours/month saved, <$1 per run

AISEOAutomationPython
Client Copenhagen-based SEO agency
Period September 2025 — Present
Role Independent Consultant
Key Impact:
50 hours/month saved per analyst | 94% cost reduction vs manual classification | <$1 OpenAI cost per 10,000 keywords
Google Cloud RunOpenAI GPT-4Google Sheets APIPython

The Problem

Every SEO strategy requires keyword classification—categorizing thousands of keywords by intent, topic, funnel stage, and other dimensions. At this 60-client agency:

Manual classification was consuming massive analyst time:

  • Each new client required hours of keyword tagging
  • 60 clients meant endless categorization work
  • Inconsistency across team members created reporting problems
  • High-value analysts spent time on low-value data entry

The real cost: Approximately 50 hours per month per SEO analyst spent on manual keyword classification instead of strategy work.


The Solution

Architecture: Rules First, AI Second

┌─────────────────────────────────────────────────────────────┐
│                    KEYWORD INPUT                            │
│              (e.g., "buy nike shoes cheap")                 │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│                 DEDUPLICATION LAYER                         │
│  - Normalize whitespace, case                               │
│  - Check canonical cache (already classified?)              │
│  - Skip duplicates within batch                             │
└─────────────────────────────────────────────────────────────┘

                    ┌─────────┴─────────┐
                    │    Cache Hit?     │
                    └─────────┬─────────┘
                        Yes / │ \ No
                            ▼   ▼
            ┌───────────────┐   ┌───────────────────────────────┐
            │ Return cached │   │      RULES ENGINE             │
            │   category    │   │  - Pattern matching           │
            └───────────────┘   │  - Keyword lists              │
                                │  - Regex rules                │
                                └───────────────────────────────┘

                                    ┌─────────┴─────────┐
                                    │   Rule Match?     │
                                    └─────────┬─────────┘
                                        Yes / │ \ No
                                            ▼   ▼
                            ┌───────────────┐   ┌───────────────────────────────┐
                            │ Apply rule    │   │      GPT CLASSIFIER           │
                            │  category     │   │  - Few-shot prompting         │
                            └───────────────┘   │  - Taxonomy from Sheets       │
                                                │  - Batch processing           │
                                                └───────────────────────────────┘


                                    ┌─────────────────────────────────────────┐
                                    │          UPDATE CANONICAL CACHE         │
                                    │         (for future batch runs)         │
                                    └─────────────────────────────────────────┘

Why Rules First?

ApproachCost per 10K keywordsConsistencySpeed
100% GPT~$5-10VariableSlow
Rules only$0HighFast
Rules + GPT fallback<$1HighFast

Most keywords fall into predictable patterns. Rules handle 70-80% of classifications; GPT handles the edge cases.

The Hybrid Approach

  1. Rules handle the obvious (70-80% of keywords)

    • “buy nike shoes” → Transactional (rule: “buy *”)
    • “how to do seo” → Informational (rule: “how to *”)
  2. AI handles the ambiguous (20-30% of keywords)

    • Only keywords that don’t match rules go to GPT-4
    • Dramatically reduces AI costs
  3. Learning compounds

    • Once a keyword is classified, it’s cached
    • New clients benefit from prior classifications

Google Sheets Integration

Why Sheets?

  1. SEO team can edit rules without code deployments
  2. Taxonomy visible and auditable
  3. Version history built-in
  4. Collaborative editing

Sheet 1: Taxonomy

CategoryDescriptionParent
TransactionalPurchase intent keywordsIntent
InformationalResearch/learning keywordsIntent
NavigationalBrand/site search keywordsIntent

Sheet 2: Rules

PatternCategoryPriority
buy *Transactional1
how to *Informational1
* priceTransactional2

Results

Cost Economics

Before: Manual Classification

MetricValue
Analyst hourly cost~$50/hour
Hours per 10,000 keywords~5 hours
Cost per classification run~$250

After: Automated Classification

MetricValue
OpenAI API cost<$1 per run
Analyst review time~15 minutes
Total cost per run<$15

ROI: 94% cost reduction per classification task

Production Metrics

MetricValue
Time savings~50 hours/month per employee
Cost per run<$1 OpenAI API
Test scale10,000 rows × 3 columns
UsersEntire SEO team

Operational Benefits

BeforeAfter
4-6 hours to classify new client keywords15 minutes to run and review
Different analysts categorized differentlySingle source of truth
More clients = more analyst hoursMore clients = same infrastructure
Classification rules locked in codeSEO team updates rules via Sheets

Technology Stack

ComponentTechnology
ComputeGoogle Cloud Run
LLMOpenAI GPT-4
ConfigurationGoogle Sheets API
LanguagePython

Why Not Just Use ChatGPT?

Common question: “Can’t we just paste keywords into ChatGPT?”

Manual ChatGPTThis System
Copy-paste requiredFully automated
Inconsistent formattingStandardized output
No memory across runsCanonical cache
$5-10 per large batch<$1 per batch
No audit trailFull logging
One person at a timeTeam-wide access

Lessons Learned

  1. Rules beat AI for predictable patterns. 70-80% of keywords follow patterns that simple rules handle faster and cheaper than LLMs. This exemplifies a core principle: workflows with model-powered steps outperform pure agent approaches when the path is predictable and cost-sensitive.

  2. Deduplication is a multiplier. Aggressive dedup before API calls dramatically reduces costs and improves consistency.

  3. Google Sheets as config layer works. Non-technical team members can update rules without developer involvement.

  4. Batch processing is essential. Per-keyword API calls are economically unviable at scale.

  5. Start with cost constraints. Designing for <$1/run forced good architecture decisions (dedup, caching, rules-first).


Impact

This system changed how the SEO team operates:

  • Analysts focus on strategy, not data entry. 50 hours per month per employee shifted from classification to client work.
  • New client onboarding dropped from days to minutes. What took a week-long classification sprint now runs while you grab coffee.
  • The canonical cache compounds. Every new client benefits from prior classifications—a growing advantage over competitors starting fresh each time.

The ROI paid for the development in the first month.


Want to discuss AI classification?

Building a similar system for keyword tagging, content categorization, or document classification? I design cost-controlled AI systems that handle the predictable with rules and the ambiguous with LLMs—all without breaking the budget. Get in touch.

Let's Build Something

Taking on new work.

I build AI workflows and agents that actually run in production—and stick around to maintain them.

Best fit: growing companies where ops can't keep up with volume, teams who tried AI and got burned, or regulated industries where you can't afford to get it wrong.

Based in Copenhagen. Available for remote or on-site (SF, NY, London).

What to expect: I respond within a few days. If there's a fit, we'll find 30 minutes for coffee or a call.

Have a quick question? — an AI that knows my work.

Book a Call

Skip the back-and-forth. Pick a time that works for you and let's talk about your project.

Book a 30-minute call →

Send a Message

Prefer email? Drop me a note and I'll get back within a few days.