Open Source SEO Audit Tool: Script-LLM Hybrid Architecture for Professional Site Analysis
Introduction: Bridging the Gap Between Automation and Intelligence
For SEO professionals, site auditing has traditionally been a labor-intensive process requiring meticulous manual verification of dozens of technical factors. The challenge lies in distinguishing between tasks that can be automated with deterministic scripts and those requiring semantic understanding—a distinction that most existing tools fail to address adequately.
This open-source solution introduces a two-layer architecture that combines the reliability of Python scripts with the contextual intelligence of Large Language Models (LLMs). The result is a professional-grade SEO audit tool that eliminates the tedium of manual checks while avoiding the hallucination risks of purely AI-driven approaches.
The Problem: Why Traditional SEO Auditing Falls Short
SEO auditing encompasses a wide range of verification tasks, each with different requirements for accuracy and interpretation. Consider the following scenarios:
Deterministic Checks (80% of audit tasks):
- Verifying robots.txt existence and syntax compliance
- Validating sitemap.xml structure and URL coverage
- Checking canonical tag implementation
- Measuring page load performance scores
- Counting heading hierarchy (H1, H2, H3)
These tasks have clear, binary answers. A file either exists or it doesn't. A tag is either present or absent. Using an LLM for these checks introduces unnecessary risk—the model might confidently assert that a robots.txt file exists when it actually doesn't.
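A deterministic check of this kind reduces to plain parsing with a binary outcome. The sketch below is illustrative only (the function name and directive set are not taken from check-site.py, and real RFC 9309 parsing covers more cases): it flags any robots.txt line that matches no known directive.

```python
def parse_robots_directives(body: str) -> dict:
    """Deterministically scan a robots.txt body (a sketch, not full RFC 9309)."""
    known = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
    counts = {name: 0 for name in known}
    unknown_lines = []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        field, _, _value = line.partition(":")
        field = field.strip().lower()
        if field in known:
            counts[field] += 1
        else:
            unknown_lines.append(raw)
    return {
        "directives": counts,
        "unknown_lines": unknown_lines,
        "syntactically_clean": not unknown_lines,
    }
```

Either the body parses cleanly or it does not; there is nothing for a language model to judge here.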
Semantic Judgment Tasks (20% of audit tasks):
- Evaluating whether an H1 tag semantically matches the page's keyword intent
- Assessing whether a meta description provides compelling value propositions
- Determining if internal link distribution supports user navigation goals
- Judging whether content depth adequately covers the topic
These tasks require contextual understanding that scripts alone cannot provide. An LLM excels here, interpreting nuance and intent in ways that rigid rule-based systems cannot.
The Solution: A Two-Layer Architecture
The seo-audit-skill tool implements a two-layer approach:
Layer 1: Python Scripts for Deterministic Verification
The foundation consists of specialized Python scripts, each responsible for a specific category of SEO checks:
- check-site.py: Handles site-level verification: robots.txt parsing (RFC 9309 compliant), sitemap.xml validation, 404 response classification (distinguishing true 404s from soft 404s and homepage redirects), URL normalization (HTTPS migration, www consistency, trailing slashes), hreflang tag verification, JSON-LD Schema markup validation, E-E-A-T trust page detection, and PageSpeed Insights scores for mobile and desktop.
- check-page.py: Performs page-level analysis: URL slug optimization, Title tag length and keyword positioning, Meta Description quality, H1 tag verification, Canonical tag validation, image alt text completeness, word count, keyword placement, heading structure, and internal link distribution. (The full criteria for each check are listed in the Features section below.)
- check-schema.py: Extracts and validates JSON-LD structured data markup.
- check-pagespeed.py: Interfaces with Google PageSpeed Insights API to retrieve performance metrics.
- fetch-page.py: Retrieves raw HTML with built-in SSRF (Server-Side Request Forgery) protection.
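The SSRF protection mentioned for fetch-page.py typically means refusing to fetch any URL that resolves to an internal address. A minimal sketch of such a guard, assuming nothing about the script's actual implementation:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_fetch_target(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, link-local, or reserved
    addresses (an illustrative SSRF guard, not fetch-page.py's actual code)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True
```

Note that a production guard also has to pin the resolved IP for the actual request, since a hostname can re-resolve between the check and the fetch.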
Each script outputs structured JSON to stdout, using exit codes to indicate status: 0 for pass/warning conditions, 1 for failures. This design enables seamless integration into automated pipelines and CI/CD workflows.
Layer 2: LLM Agent for Semantic Analysis
The LLM layer activates only when semantic judgment is required. By feeding the structured JSON output from Layer 1 into the LLM, the system provides context-aware recommendations without risking factual accuracy. The LLM interprets the data, identifies patterns, and generates human-readable insights—tasks where probabilistic reasoning adds genuine value.
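In practice this hand-off is just prompt construction over Layer 1's JSON. A hypothetical sketch (the tool's actual prompt lives in its SKILL.md and will differ):

```python
import json

def build_semantic_prompt(page_report: dict) -> str:
    """Compose a Layer-2 prompt from Layer-1 JSON output (illustrative only).

    The key idea: the LLM is told the facts are already verified, so it only
    interprets them rather than re-asserting anything factual.
    """
    return (
        "You are an SEO analyst. The facts below were verified by scripts; "
        "do not re-verify them, only interpret.\n\n"
        f"Verified data:\n{json.dumps(page_report, indent=2)}\n\n"
        "Judge whether the H1 matches the page's keyword intent and whether "
        "the meta description states a specific value proposition."
    )
```

Because every factual claim in the prompt originates from script output, the model's answers stay grounded even when its reasoning is probabilistic.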
Comprehensive Audit Coverage: Version 1.0 Features
The current release supports over 20 distinct SEO verification categories, available in two configurations:
Basic Version (seo-audit)
Designed for rapid daily audits, this version provides essential checks:
Site-Level Verification:
- robots.txt parsing with RFC 9309 standard compliance
- sitemap.xml structure and coverage validation
- 404 handling differentiation (true 404 vs. soft 404 vs. homepage redirect)
- URL normalization (HTTP→HTTPS migration, www consistency, trailing slash standardization)
- Internationalization support via hreflang tag verification
- Schema.org JSON-LD markup validation
- E-E-A-T trust signal detection (About, Contact, Privacy, Terms pages)
- PageSpeed Insights performance scoring (mobile and desktop variants)
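The 404-handling differentiation above can be expressed as a small pure classifier. A sketch, assuming the caller has already fetched a deliberately nonexistent URL (following redirects) and recorded the final status code and final URL; function and label names are illustrative:

```python
from urllib.parse import urlparse

def classify_404(status_code: int, requested_url: str, final_url: str) -> str:
    """Classify how a site answers a request for a nonexistent page (sketch)."""
    if status_code == 404:
        return "true_404"           # correct behavior: a real 404
    if final_url != requested_url and urlparse(final_url).path in ("", "/"):
        return "homepage_redirect"  # nonexistent page bounced to the site root
    if status_code == 200:
        return "soft_404"           # 200 OK served on a URL that should 404
    return "other"
```

Soft 404s matter because search engines may keep crawling and indexing pages that the site considers missing.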
Page-Level Analysis:
- URL slug optimization (lowercase formatting, hyphen usage, keyword presence, stop word detection)
- Title tag optimization (50-60 character length, keyword positioning strategy)
- Meta Description quality (120-160 characters, keyword alignment, specific value proposition clarity)
- H1 tag verification (single H1 requirement, keyword matching, semantic intent alignment)
- Canonical tag validation (self-referencing correctness, redirect chain matching)
- Image alt text completeness and descriptiveness
- Word count verification (minimum 500 words for substantive content)
- Keyword placement analysis (presence within first 100 words for SEO impact)
- Heading structure evaluation (H2 quantity optimization, H3/H2 hierarchical ratio, keyword distribution)
- Internal link distribution and anchor text diversity
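Several of these page-level rules are simple threshold checks. A sketch of the Title rule, using the 50-60 character window stated above; the helper name and the 10-character "leading keyword" cutoff are illustrative assumptions, not the tool's actual values:

```python
def check_title(title: str, keyword: str) -> dict:
    """Apply length and keyword-position rules to a Title tag (a sketch)."""
    length_ok = 50 <= len(title) <= 60          # window from the list above
    kw_pos = title.lower().find(keyword.lower())
    return {
        "length": len(title),
        "length_ok": length_ok,
        "keyword_present": kw_pos >= 0,
        # Earlier keyword placement is generally considered stronger.
        "keyword_leading": 0 <= kw_pos <= 10,
        "status": "pass" if length_ok and kw_pos >= 0 else "warn",
    }
```

Because the rule is pure arithmetic on strings, it belongs firmly in Layer 1; only the question of whether the title actually matches search intent goes to Layer 2.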
Full Version (seo-audit-full)
The comprehensive version includes additional advanced checks for enterprise-grade auditing requirements.
Installation and Usage: Two Flexible Approaches
The tool offers multiple installation methods to accommodate different workflows:
Method 1: Command-Line Interface (Recommended)
# Install the skill package
npx skills add JeffLi1993/seo-audit-skill
# Or install specific versions
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit-full
Method 2: Claude Code Plugin Integration
# Add from marketplace
/plugin marketplace add JeffLi1993/seo-audit-skill
# Install the plugin
/plugin install seo-audit-skill
Once installed, auditing becomes a conversational experience:
audit this page: https://example.com
The system generates a comprehensive, structured report identifying issues, explaining why they matter, and providing actionable remediation steps.
Project Structure and Technical Implementation
The repository follows a clean, modular architecture:
seo-audit-skill/
├── seo-audit/
│ ├── SKILL.md # Skill definition and agent workflow
│ ├── references/REFERENCE.md # Field definitions and edge cases
│ ├── assets/report-template.html # HTML report output template
│ └── scripts/
│ ├── check-site.py # robots.txt + sitemap → JSON
│ ├── check-page.py # TDK + H1 + canonical + slug → JSON
│ ├── check-schema.py # JSON-LD extraction + validation → JSON
│ ├── check-pagespeed.py # PageSpeed Insights API → JSON
│ └── fetch-page.py # Raw HTML fetching with SSRF protection
└── seo-audit-full/
├── SKILL.md
├── references/REFERENCE.md
    └── assets/report-template.html
All dependencies are minimal: pip install requests is the only requirement beyond the Python standard library.
The Philosophy: Why This Approach Matters
In an era where AI tools proliferate rapidly, genuine expertise lies not in knowing how to use AI, but in understanding problems deeply enough to know when AI should—and should not—be applied.
The development process for this tool exemplifies this principle. Before writing a single line of code, the creator manually audited dozens of websites, identifying which checks were deterministic (suitable for scripts) and which required semantic interpretation (suitable for LLMs). This ground-up understanding enabled the design of an architecture that leverages each technology's strengths while mitigating their weaknesses.
The lesson extends beyond SEO: meaningful AI augmentation requires domain expertise first, tool selection second. Without the foundational work of understanding the problem space, AI-powered solutions risk producing mediocre results—automating the wrong things or introducing errors that wouldn't exist with traditional approaches.
Community and Contribution
This open-source project invites collaboration from the SEO and developer communities:
- Star the repository if you find it useful
- Report issues when you encounter bugs or edge cases
- Submit pull requests to contribute improvements or new features
- Share experiences and discuss SEO best practices
The tool is freely available under an open-source license, reflecting a commitment to advancing the state of SEO auditing for practitioners at all levels.
Conclusion: Automation with Intelligence
The seo-audit-skill project represents a thoughtful synthesis of deterministic automation and contextual intelligence. By respecting the boundaries between what scripts do best (factual verification) and what LLMs do best (semantic interpretation), it delivers professional-grade audits without the reliability concerns of pure AI approaches.
For SEO professionals drowning in manual audit tasks, this tool offers liberation from repetitive work while preserving the nuanced judgment that separates good SEO from great SEO. The result is more time for strategic thinking—the very work that no tool, however sophisticated, can ever automate.