Open Source SEO Audit Tool: Streamlined Professional Analysis with seo-audit-skill v1.0
Introduction
After years of working in search engine optimization, the need for a tool that balances professional-grade analysis with effortless operation became increasingly apparent. This led to the creation of seo-audit-skill—a solution that takes a URL and generates a structured report highlighting issues, explaining why they matter, and providing actionable remediation steps.
This tool is now open source and freely available. Contributions, pull requests, and community feedback are warmly welcomed.
GitHub Repository: https://github.com/JeffLi1993/seo-audit-skill
The Problem: Why Build This Tool?
Anyone experienced in SEO understands that auditing is labor-intensive work involving numerous repetitive checks:
- Verifying robots.txt and sitemap.xml configuration
- Analyzing canonical tags and hreflang implementation
- Evaluating TDK (Title/Description/Keywords) optimization
- Assessing H1/H2 heading structure and internal link distribution
- Validating Schema markup (JSON-LD)
- Running PageSpeed Insights for performance scores
Approximately 80% of these tasks are mechanical and repetitive, while the remaining 20% require human judgment and semantic understanding.
Examples:
- "Is this page's title between 50 and 60 characters?" → Machine-verifiable
- "Does this H1 semantically match the keyword intent?" → Requires LLM understanding
This observation inspired a two-layer architecture combining deterministic scripting with intelligent analysis:
- Layer 1 (Python Scripts): Handles deterministic checks, outputting structured JSON
- Layer 2 (LLM Agent): Performs semantic judgment, intervening only when necessary
This hybrid approach prevents LLM hallucinations (such as falsely claiming "robots.txt exists" when it doesn't) while capturing nuanced insights requiring contextual understanding.
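The division of labor can be sketched in a few lines. This is an illustrative simplification, not the project's actual code: the script layer emits verified facts as JSON, and the LLM layer is consulted only for questions a script cannot decide.

```python
import json

def script_layer(url: str) -> dict:
    """Layer 1: deterministic checks (hypothetical simplified example)."""
    title = "Open Source SEO Audit Tool"  # in reality, fetched and parsed from the page
    return {
        "url": url,
        "title_length": len(title),
        "title_length_ok": 50 <= len(title) <= 60,
    }

def needs_llm(facts: dict) -> bool:
    """Layer 2 is invoked only when a check needs semantic judgment.

    Deterministic facts are final; semantic questions (keyword intent,
    H1 relevance) are deferred to the LLM."""
    return not facts["title_length_ok"]

facts = script_layer("https://example.com")
print(json.dumps(facts))
```

Because the LLM never asserts a fact the script did not verify, the "robots.txt exists" class of hallucination is structurally impossible.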
Comprehensive Audit Capabilities
Version 1.0 supports over 20 SEO checks across two variants:
seo-audit (Basic Version)
Ideal for rapid daily audits—simply provide a URL and receive immediate results.
Site-Level Checks:
- ✅ robots.txt parsing (RFC 9309 compliant)
- ✅ sitemap.xml validation
- ✅ 404 handling (true 404 vs. soft 404 vs. homepage redirect)
- ✅ URL normalization (HTTP→HTTPS, www consistency, trailing slashes)
- ✅ Internationalization / hreflang tag verification
- ✅ Schema (JSON-LD) validation
- ✅ E-E-A-T trust pages (About/Contact/Privacy/Terms)
- ✅ PageSpeed Insights scores (mobile + desktop)
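For a sense of what the site-level layer involves, a robots.txt check can be done entirely with the standard library. This sketch parses an inlined robots.txt body (the real check-site.py would fetch it over HTTP first):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body; a real check would fetch https://<site>/robots.txt
robots_body = """
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_body.splitlines())

# Deterministic, machine-verifiable answers -- no LLM needed
print(rp.can_fetch("*", "https://example.com/blog/post"))    # allowed
print(rp.can_fetch("*", "https://example.com/admin/panel"))  # blocked
```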
Page-Level Checks:
- ✅ URL Slug analysis (lowercase, hyphens, keywords, stop word detection)
- ✅ Title tag optimization (50-60 characters, keyword positioning)
- ✅ Meta Description quality (120-160 characters, keyword alignment, value proposition clarity)
- ✅ H1 tag validation (single H1, keyword relevance, semantic intent)
- ✅ Canonical tag verification (self-referencing, post-redirect matching)
- ✅ Image alt text completeness
- ✅ Word count analysis (body content ≥ 500 words)
- ✅ Keyword placement (within first 100 words)
- ✅ Heading structure (H2 quantity, H3/H2 ratio, keyword distribution)
- ✅ Internal link distribution analysis
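Most of these page-level checks follow the same pattern: a pure function that takes page data and returns structured JSON. Here is a hypothetical, simplified version of the slug check (the function name and rules are illustrative, not the tool's exact implementation):

```python
import json
import re

def check_slug(url: str) -> dict:
    """Hypothetical slug check: lowercase, hyphen-separated, URL-safe."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    issues = []
    if slug != slug.lower():
        issues.append("slug contains uppercase characters")
    if "_" in slug:
        issues.append("slug uses underscores instead of hyphens")
    if not re.fullmatch(r"[a-z0-9-]*", slug):
        issues.append("slug contains non-URL-safe characters")
    return {"slug": slug, "pass": not issues, "issues": issues}

print(json.dumps(check_slug("https://example.com/Seo_Audit_Guide")))
```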
Usage Instructions
Installation is straightforward, with two approaches:
Method 1: CLI (Recommended)
npx skills add JeffLi1993/seo-audit-skill
# Or install specific variants
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit-full
Method 2: Claude Code Plugin
/plugin marketplace add JeffLi1993/seo-audit-skill
/plugin install seo-audit-skill
Then simply converse naturally:
audit this page: https://example.com
The tool generates comprehensive reports automatically.
Project Architecture
seo-audit-skill/
├── seo-audit/
│ ├── SKILL.md # Skill definition + agent workflow
│ ├── references/REFERENCE.md # Field definitions, edge cases
│ ├── assets/report-template.html # HTML output template
│ └── scripts/
│ ├── check-site.py # robots.txt + sitemap → JSON
│ ├── check-page.py # TDK + H1 + canonical + slug → JSON
│ ├── check-schema.py # JSON-LD extraction + validation → JSON
│ ├── check-pagespeed.py # PageSpeed Insights API → JSON
│ └── fetch-page.py # Raw HTML fetching with SSRF protection
└── seo-audit-full/
├── SKILL.md
├── references/REFERENCE.md
└── assets/report-template.html

Technical Notes:
- All scripts output structured JSON to stdout
- Exit code 0 = pass/warning, 1 = failure
- Dependencies:
pip install requests
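The stdout/exit-code convention above can be captured in a small helper. This is a sketch of the contract, assuming hypothetical function names, not code from the repository:

```python
import json
import sys

def exit_code(status: str) -> int:
    """0 = pass/warning, 1 = failure, per the convention above."""
    return 0 if status in ("pass", "warning") else 1

def emit(result: dict) -> None:
    """Write structured JSON to stdout, then exit with the matching code."""
    json.dump(result, sys.stdout)
    sys.exit(exit_code(result["status"]))
```

This convention makes the scripts easy to compose in a shell pipeline, e.g. `python check-page.py <url> && echo "all checks passed"`.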
Community Engagement
This tool emerged from genuine pain points encountered during personal SEO work. The hope is that it proves valuable to others facing similar challenges.
Ways to Contribute:
- ⭐ Star the repository if you find it useful
- 🐛 Report bugs or suggest improvements via Issues
- 🚀 Submit pull requests with code contributions
- 💬 Reach out to discuss practical, hands-on SEO experience
GitHub: https://github.com/JeffLi1993/seo-audit-skill
The project remains open source and free. Your feedback drives continuous improvement.
Final Reflections
In the AI era, truly scarce talent isn't those who can use AI tools—it's individuals who can thoroughly analyze problems and establish robust workflows even without AI assistance.
The development process involved manually auditing dozens of websites, identifying which checks were deterministic versus which required semantic judgment. Only after this groundwork was the Script + LLM architecture designed.
Key Insight: Only through hands-on effort and navigating pitfalls firsthand can one identify strategic leverage points and direct AI effectively. Otherwise, relying on generic prompts yields mediocre results.
The aspiration is that this tool saves you time, allowing focus on higher-value strategic work.
Technical Deep Dive: Architecture Decisions
Why Script + LLM Hybrid?
Pure LLM approaches suffer from several critical weaknesses:
- Hallucination Risk: LLMs may confidently assert false information
- Inconsistency: Same input may yield different outputs
- Performance Cost: LLM calls are expensive for simple checks
- Verification Difficulty: Hard to validate LLM-generated assertions
The hybrid approach addresses these by:
- Using scripts for deterministic, verifiable checks
- Reserving LLM for semantic interpretation requiring contextual understanding
- Providing structured data as LLM input, reducing hallucination surface
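Concretely, "reducing the hallucination surface" means the prompt hands the LLM pre-verified facts to interpret, rather than asking it to discover facts itself. A minimal sketch (the fact names and prompt wording are illustrative):

```python
import json

# Facts produced and verified by the script layer
facts = {
    "robots_txt_found": True,
    "sitemap_found": False,
    "title": "Example Domain",
    "title_length": 14,
}

# The LLM interprets verified facts; it cannot "find" a sitemap
# that the script layer did not.
prompt = (
    "You are an SEO analyst. Using ONLY the verified facts below, "
    "explain each failing check and suggest a fix.\n\n"
    f"Verified facts (JSON):\n{json.dumps(facts, indent=2)}"
)
print(prompt)
```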
SSRF Protection in fetch-page.py
Server-Side Request Forgery protection is critical when fetching arbitrary URLs:
import ipaddress
import socket
from urllib.parse import urlparse

def validate_url(url: str) -> None:
    # Validate URL scheme
    if not url.startswith(('http://', 'https://')):
        raise ValueError("Invalid URL scheme")
    # Block internal IP ranges; string prefixes like '172.16.' miss most of
    # 172.16.0.0/12, so check the resolved address with the ipaddress module
    parsed = urlparse(url)
    ip = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError("Internal IP access denied")

Extensibility Design
The modular architecture enables easy extension:
# Adding a new check type
class NewCheck(BaseCheck):
    def validate(self, page_data: dict) -> CheckResult:
        # Implementation
        ...

    def remediate(self, issue: dict) -> str:
        # Remediation guidance
        ...

Performance Benchmarks
| Metric | Value |
|---|---|
| Average audit time | 3-5 seconds |
| LLM calls per audit | 1-2 (optional) |
| Script-only mode | <1 second |
| Memory footprint | ~50MB |
Roadmap: Future Enhancements
Planned improvements for upcoming versions:
- v1.1: Core Web Vitals integration
- v1.2: Competitor comparison features
- v2.0: Multi-page crawl capabilities
- v2.0: Historical trend tracking
Community contributions accelerate these timelines significantly.