Medusa provides a skill ranking system for AI agents. Developers write skills as Markdown files named SKILL.md, and Medusa scans them to assign one of nine tiers based on measurable complexity. It acts as an objective auditor, evaluating factors such as content length, code blocks, step-by-step instructions, and technical terms. The overall score weights complexity at 60%, value at 30%, and keywords at 10%. Tiers range from Godlike (95+ score) to Poor (under 25), with tier colors running from a red-orange gradient down to dark gray.
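The 60/30/10 weighting can be sketched as a simple weighted sum. This is an illustrative reconstruction, assuming each sub-score is on a 0-100 scale; Medusa's actual internals are not documented here.

```python
# Illustrative sketch of Medusa's documented 60/30/10 weighting.
# Assumes the complexity, value, and keyword sub-scores are each 0-100;
# the real implementation's details are not public in this writeup.
def overall_score(complexity: float, value: float, keywords: float) -> float:
    """Combine sub-scores: 60% complexity, 30% value, 10% keywords."""
    return 0.6 * complexity + 0.3 * value + 0.1 * keywords
```

With the sample audit's sub-scores (Complexity 80, Value 90) and a hypothetical keyword score of 100, this would yield a score in the mid-80s.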

This addresses a gap in AI agent frameworks, where skill quality lacks standardization. Instead of subjective ratings, Medusa promotes skills automatically as they improve: edit a SKILL.md file with more detail or code, rescan, and watch the tier rise. A pizza-cooking example illustrates the progression: a basic "I cook pizza" description (200 characters, no code or steps) scores 15/100 on complexity and lands in Poor. Adding five varieties, steps, and code blocks pushes it to Common at 35/100. Advanced versions with 20+ varieties, 10+ code blocks, and 25+ technical terms reach Godlike at 85+/100.

Core features

Medusa includes these capabilities, drawn from its feature set:

  • 9-tier leveling: Automatic assignment from Godlike (95+, red-orange gradient) down to Poor (dark gray), with defined score thresholds for each tier, such as Mythic (80+, purple) and Epic (75+, yellow).
  • Audit-based ranking: Parses SKILL.md for content length (e.g., 5966 characters), code blocks (up to 15 counted), step instructions (0-25+), and technical terms (26 detected), yielding breakdowns like "Complexity: 80/100, Value: 90/100".
  • Automatic promotion: Rescans update ranks without manual intervention.
  • Parallel scanning: Uses Rust's Rayon library for parallel file processing, which the project's A/B tests report as 46% faster.
  • Fusion detection: Identifies similar skills by name and content overlap.
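The tier thresholds named above can be sketched as a simple lookup. Only the cutoffs stated in this article (Godlike 95+, Mythic 80+, Epic 75+, Poor under 25) appear below; the remaining intermediate tiers have unlisted cutoffs, so this mapping is illustrative rather than Medusa's actual table.

```python
# Illustrative tier lookup using only the thresholds stated in the article.
# Intermediate tiers (e.g. Ultra Rare, Common) have cutoffs not listed here,
# so they are collapsed into a placeholder.
def tier(score: float) -> str:
    if score >= 95:
        return "Godlike"
    if score >= 80:
        return "Mythic"
    if score >= 75:
        return "Epic"
    if score < 25:
        return "Poor"
    return "intermediate tier (cutoff not documented here)"
```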

Additional outputs include JSON for agent integration and HTML reports with dark-themed visualizations, progress bars, and color-coded tiers.

Getting it running

Medusa (version 0.11, written in Rust) installs with a single command on supported platforms. The project has 15 GitHub stars at https://github.com/jtshow/Medusa.

For Windows (native):

irm https://raw.githubusercontent.com/jtshow/medusa/main/install.ps1 | iex

For macOS or Linux:

curl -sSL https://raw.githubusercontent.com/jtshow/medusa/main/install.sh | bash

To build from source on any platform, first install Rust from https://rustup.rs, then:

git clone https://github.com/jtshow/medusa.git
cd medusa
cargo build --release

Place skills in a directory with SKILL.md files, one per skill (e.g., /path/to/skills/ai-ml/SKILL.md). Basic commands follow.

Scan a directory for JSON output:

# macOS/Linux
medusa scan /path/to/skills

# Windows (adjust path)
& "C:\path\to\medusa.exe" scan C:\path\to\skills

Audit a single skill:

medusa audit /path/to/skills/ai-ml

Sample output:

=== Medusa Skill Audit Report ===

Skill: ai-ml (ai-ml), level: Godlike
  Experience: 100.0/100
  Confidence: 75%
  Metrics:
    - Content Length: 5966 chars
    - Code Blocks: 15
    - Step Instructions: 0
    - Technical Terms: 26
    - Complexity Score: 80.0/100
    - Value Score: 90.0/100

Generate an HTML report:

medusa html /path/to/skills report.html

This creates a dark-themed page with tier colors, progress bars, and skill details, and opens it automatically in the browser.

Who this is for

Medusa targets developers building AI agents that rely on SKILL.md files for tool definitions. Frameworks like Hermes (add it as a tool in .hermes/config.yaml), OpenClaw (via a Python subprocess), or ClaudeCode/Codex (shell out to medusa scan <path>) can consume its JSON output directly. Use cases include ranking agent capabilities during development: scan a folder of 10-50 skills to spot weak ones (Common or below) that need more code blocks or terms.
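The subprocess integration above can be sketched as follows. This assumes medusa scan prints a JSON array of skill records to stdout and that each record carries "name" and "level" fields; both field names are guesses, since the exact output schema is not documented in this article.

```python
import json
import subprocess

def weak_skills(json_text: str) -> list:
    """Return names of skills ranked Common or below.

    Assumes a JSON array of objects with "name" and "level" keys,
    which is a guess at Medusa's output schema.
    """
    return [s["name"] for s in json.loads(json_text)
            if s.get("level") in ("Common", "Poor")]

def scan(skills_dir: str) -> str:
    """Shell out to the medusa binary and capture its JSON output."""
    return subprocess.run(
        ["medusa", "scan", skills_dir],
        capture_output=True, text=True, check=True,
    ).stdout

# Usage (requires medusa on PATH):
# print(weak_skills(scan("/path/to/skills")))
```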

Agent teams can track improvements over time: start with basic prompts, then iterate to Ultra Rare or higher by adding techniques and examples, as in the pizza skill's progression from 200 to 6,000+ characters. Fusion detection helps clean up duplicates across repositories. Medusa suits self-hosted setups where agents need quantified skill quality, not just the presence of a skill file.

Small projects fit best; with Rayon parallelism, Medusa handles dozens of files quickly, but large-scale collections (1,000+ files) might require batching. If your workflow involves Markdown-based agent skills, run audits before deployment to ensure no Poor-tier skills slip through.
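A minimal batching sketch for such large collections, assuming the one-directory-per-skill layout described earlier; each chunk of directories can then be passed to its own medusa scan invocation:

```python
from pathlib import Path

def batches(skills_root: str, size: int = 100):
    """Yield lists of skill directories, `size` at a time.

    Assumes one subdirectory per skill under skills_root,
    matching the layout described in this article.
    """
    dirs = sorted(p for p in Path(skills_root).iterdir() if p.is_dir())
    for i in range(0, len(dirs), size):
        yield dirs[i:i + size]

# Each batch becomes a separate scan, e.g. one `medusa scan` per chunk.
```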

Comparisons

Medusa stands out as the first audit-based system for SKILL.md ranking; no direct competitors parse complexity this way. General Markdown analyzers like remark-lint check syntax but ignore agent-specific metrics like steps or value scores. AI evaluation tools (e.g., LangChain's evaluators) focus on runtime performance, not static file audits.

It's lighter than full agent frameworks: a single Rust binary (built with cargo build --release) that outputs pure JSON for easy piping. Drawbacks include the Rust dependency for source builds and a focus solely on SKILL.md; it won't scan other formats. For broader documentation, tools like Vale or write-good offer style checks but lack tiered promotion or visualizations.

Rust's performance edge shows in claims like 46% faster scanning versus single-threaded alternatives. At 15 stars, it's early-stage; production users may want to wait for v1.0 stability.

Medusa fills a niche for SKILL.md-driven agents, but it won't suit users without that format or those needing runtime benchmarks. Source at https://github.com/jtshow/Medusa.