Open Source SEO Audit Tool: Streamlined Professional Analysis with seo-audit-skill v1.0
Introduction
After years of working in search engine optimization, the need for a tool that balances professional-grade analysis with effortless operation became increasingly apparent. This led to the creation of seo-audit-skill—a solution that takes a URL and generates a structured report highlighting issues, explaining why they matter, and providing actionable remediation steps.
This tool is now open source and freely available. Contributions, pull requests, and community feedback are warmly welcomed.
GitHub Repository: https://github.com/JeffLi1993/seo-audit-skill
The Problem: Why Build This Tool?
Anyone experienced in SEO understands that auditing is labor-intensive work involving numerous repetitive checks:
- Verifying robots.txt and sitemap.xml configuration
- Analyzing canonical tags and hreflang implementation
- Evaluating TDK (Title/Description/Keywords) optimization
- Assessing H1/H2 heading structure and internal link distribution
- Validating Schema markup (JSON-LD)
- Running PageSpeed Insights for performance scores
Approximately 80% of these tasks are mechanical and repetitive, while the remaining 20% require human judgment and semantic understanding.
Examples:
- "Is this page's title between 50 and 60 characters?" → Machine-verifiable
- "Does this H1 semantically match the keyword intent?" → Requires LLM understanding
This observation inspired a two-layer architecture combining deterministic scripting with intelligent analysis:
- Layer 1 (Python Scripts): Handles deterministic checks, outputting structured JSON
- Layer 2 (LLM Agent): Performs semantic judgment, intervening only when necessary
This hybrid approach prevents LLM hallucinations (such as falsely claiming "robots.txt exists" when it doesn't) while capturing nuanced insights requiring contextual understanding.
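The division of labor can be sketched in a few lines. This is an illustrative simplification, not the project's actual code: the script layer emits verified facts as JSON, and the LLM layer is consulted only for questions a script cannot decide.

```python
import json

def script_layer(url: str) -> dict:
    """Layer 1: deterministic checks (hypothetical simplified example)."""
    title = "Open Source SEO Audit Tool"  # in reality, fetched and parsed from the page
    return {
        "url": url,
        "title_length": len(title),
        "title_length_ok": 50 <= len(title) <= 60,
    }

def needs_llm(facts: dict) -> bool:
    """Layer 2 is invoked only when a check needs semantic judgment.

    Deterministic facts are final; semantic questions (keyword intent,
    H1 relevance) are deferred to the LLM."""
    return not facts["title_length_ok"]

facts = script_layer("https://example.com")
print(json.dumps(facts))
```

Because the LLM never asserts a fact the script did not verify, the "robots.txt exists" class of hallucination is structurally impossible.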
Comprehensive Audit Capabilities
Version 1.0 supports over 20 SEO checks across two variants:
seo-audit (Basic Version)
Ideal for rapid daily audits—simply provide a URL and receive immediate results.
Site-Level Checks:
- ✅ robots.txt parsing (RFC 9309 compliant)
- ✅ sitemap.xml validation
- ✅ 404 handling (true 404 vs. soft 404 vs. homepage redirect)
- ✅ URL normalization (HTTP→HTTPS, www consistency, trailing slashes)
- ✅ Internationalization / hreflang tag verification
- ✅ Schema (JSON-LD) validation
- ✅ E-E-A-T trust pages (About/Contact/Privacy/Terms)
- ✅ PageSpeed Insights scores (mobile + desktop)
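For a sense of what the site-level layer involves, a robots.txt check can be done entirely with the standard library. This sketch parses an inlined robots.txt body (the real check-site.py would fetch it over HTTP first):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body; a real check would fetch https://<site>/robots.txt
robots_body = """
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_body.splitlines())

# Deterministic, machine-verifiable answers -- no LLM needed
print(rp.can_fetch("*", "https://example.com/blog/post"))    # allowed
print(rp.can_fetch("*", "https://example.com/admin/panel"))  # blocked
```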
Page-Level Checks:
- ✅ URL Slug analysis (lowercase, hyphens, keywords, stop word detection)
- ✅ Title tag optimization (50-60 characters, keyword positioning)
- ✅ Meta Description quality (120-160 characters, keyword alignment, value proposition clarity)
- ✅ H1 tag validation (single H1, keyword relevance, semantic intent)
- ✅ Canonical tag verification (self-referencing, post-redirect matching)
- ✅ Image alt text completeness
- ✅ Word count analysis (body content ≥ 500 words)
- ✅ Keyword placement (within first 100 words)
- ✅ Heading structure (H2 quantity, H3/H2 ratio, keyword distribution)
- ✅ Internal link distribution analysis
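Most of these page-level checks follow the same pattern: a pure function that takes page data and returns structured JSON. Here is a hypothetical, simplified version of the slug check (the function name and rules are illustrative, not the tool's exact implementation):

```python
import json
import re

def check_slug(url: str) -> dict:
    """Hypothetical slug check: lowercase, hyphen-separated, URL-safe."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    issues = []
    if slug != slug.lower():
        issues.append("slug contains uppercase characters")
    if "_" in slug:
        issues.append("slug uses underscores instead of hyphens")
    if not re.fullmatch(r"[a-z0-9-]*", slug):
        issues.append("slug contains non-URL-safe characters")
    return {"slug": slug, "pass": not issues, "issues": issues}

print(json.dumps(check_slug("https://example.com/Seo_Audit_Guide")))
```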
Usage Instructions
Installation is straightforward, with two approaches:
Method 1: CLI (Recommended)
npx skills add JeffLi1993/seo-audit-skill
# Or install specific variants
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit
npx skills add JeffLi1993/seo-audit-skill --skill seo-audit-full
Method 2: Claude Code Plugin
/plugin marketplace add JeffLi1993/seo-audit-skill
/plugin install seo-audit-skill
Then simply converse naturally:
audit this page: https://example.com
The tool generates comprehensive reports automatically.
Project Architecture
seo-audit-skill/
├── seo-audit/
│ ├── SKILL.md # Skill definition + agent workflow
│ ├── references/REFERENCE.md # Field definitions, edge cases
│ ├── assets/report-template.html # HTML output template
│ └── scripts/
│ ├── check-site.py # robots.txt + sitemap → JSON
│ ├── check-page.py # TDK + H1 + canonical + slug → JSON
│ ├── check-schema.py # JSON-LD extraction + validation → JSON
│ ├── check-pagespeed.py # PageSpeed Insights API → JSON
│ └── fetch-page.py # Raw HTML fetching with SSRF protection
└── seo-audit-full/
├── SKILL.md
├── references/REFERENCE.md
└── assets/report-template.html

Technical Notes:
- All scripts output structured JSON to stdout
- Exit code 0 = pass/warning, 1 = failure
- Dependencies:
pip install requests
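The stdout/exit-code convention above can be captured in a small helper. This is a sketch of the contract, assuming hypothetical function names, not code from the repository:

```python
import json
import sys

def exit_code(status: str) -> int:
    """0 = pass/warning, 1 = failure, per the convention above."""
    return 0 if status in ("pass", "warning") else 1

def emit(result: dict) -> None:
    """Write structured JSON to stdout, then exit with the matching code."""
    json.dump(result, sys.stdout)
    sys.exit(exit_code(result["status"]))
```

This convention makes the scripts easy to compose in a shell pipeline, e.g. `python check-page.py <url> && echo "all checks passed"`.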
Community Engagement
This tool emerged from genuine pain points encountered during personal SEO work. The hope is that it proves valuable to others facing similar challenges.
Ways to Contribute:
- ⭐ Star the repository if you find it useful
- 🐛 Report bugs or suggest improvements via Issues
- 🚀 Submit pull requests with code contributions
- 💬 Reach out to discuss practical, hands-on SEO experience
GitHub: https://github.com/JeffLi1993/seo-audit-skill
The project remains open source and free. Your feedback drives continuous improvement.
Final Reflections
In the AI era, truly scarce talent isn't those who can use AI tools—it's individuals who can thoroughly analyze problems and establish robust workflows even without AI assistance.
The development process involved manually auditing dozens of websites, identifying which checks were deterministic versus which required semantic judgment. Only after this groundwork was the Script + LLM architecture designed.
Key Insight: Only through hands-on effort and navigating pitfalls firsthand can one identify strategic leverage points and direct AI effectively. Otherwise, relying on generic prompts yields mediocre results.
The aspiration is that this tool saves you time, allowing focus on higher-value strategic work.
Technical Deep Dive: Architecture Decisions
Why Script + LLM Hybrid?
Pure LLM approaches suffer from several critical weaknesses:
- Hallucination Risk: LLMs may confidently assert false information
- Inconsistency: Same input may yield different outputs
- Performance Cost: LLM calls are expensive for simple checks
- Verification Difficulty: Hard to validate LLM-generated assertions
The hybrid approach addresses these by:
- Using scripts for deterministic, verifiable checks
- Reserving LLM for semantic interpretation requiring contextual understanding
- Providing structured data as LLM input, reducing hallucination surface
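Concretely, "reducing the hallucination surface" means the prompt hands the LLM pre-verified facts to interpret, rather than asking it to discover facts itself. A minimal sketch (the fact names and prompt wording are illustrative):

```python
import json

# Facts produced and verified by the script layer
facts = {
    "robots_txt_found": True,
    "sitemap_found": False,
    "title": "Example Domain",
    "title_length": 14,
}

# The LLM interprets verified facts; it cannot "find" a sitemap
# that the script layer did not.
prompt = (
    "You are an SEO analyst. Using ONLY the verified facts below, "
    "explain each failing check and suggest a fix.\n\n"
    f"Verified facts (JSON):\n{json.dumps(facts, indent=2)}"
)
print(prompt)
```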
SSRF Protection in fetch-page.py
Server-Side Request Forgery protection is critical when fetching arbitrary URLs:
import ipaddress
import socket
from urllib.parse import urlparse

def validate_url(url: str) -> None:
    # Validate URL scheme
    if not url.startswith(('http://', 'https://')):
        raise ValueError("Invalid URL scheme")
    # Block internal IP ranges; string prefixes like '172.16.' miss most of
    # 172.16.0.0/12, so check the resolved address with the ipaddress module
    parsed = urlparse(url)
    ip = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError("Internal IP access denied")

Extensibility Design
The modular architecture enables easy extension:
# Adding a new check type
class NewCheck(BaseCheck):
    def validate(self, page_data: dict) -> CheckResult:
        # Implementation
        ...

    def remediate(self, issue: dict) -> str:
        # Remediation guidance
        ...

Performance Benchmarks
| Metric | Value |
|---|---|
| Average audit time | 3-5 seconds |
| LLM calls per audit | 1-2 (optional) |
| Script-only mode | <1 second |
| Memory footprint | ~50MB |
Roadmap: Future Enhancements
Planned improvements for upcoming versions:
- v1.1: Core Web Vitals integration
- v1.2: Competitor comparison features
- v2.0: Multi-page crawl capabilities
- v2.0: Historical trend tracking
Community contributions accelerate these timelines significantly.