Picture a developer preparing an AI skill for Cursor and wondering whether its documentation, target audience and technical constraints are solid enough for public release.

The approach

SkillLens treats a skill as a potentially publishable AI product rather than a static markdown file. It evaluates the artifact along three orthogonal dimensions: the perspective from which the skill is judged, the depth of the analysis, and the tool that invokes the evaluation. The core of the system is a 100‑point rubric with 34 sub‑dimensions that are automatically enabled or disabled depending on whether the input is an atomic, pipeline or composite skill, so that each structural form is judged against criteria that apply to it. A finance expert mode can be layered on top of the generic score, adding domain‑specific metrics for risk, maturity and commercial readiness. The resulting report combines a total score, a letter grade, evidence cards that show the status of each check, and a ranked list of actionable improvements. The report can be exported in multiple formats and toggled between Chinese and English.
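To make the enable/disable idea concrete, here is a minimal Python sketch of a rubric whose sub‑dimensions switch on or off by structural form. It is not SkillLens's actual code: the sub‑dimension names, the weights and the rescaling of enabled weights back to 100 points are all assumptions made for illustration, and the toy rubric has three entries instead of 34.

```python
from dataclasses import dataclass

# The three structural forms named in the text.
FORMS = ("atomic", "pipeline", "composite")


@dataclass(frozen=True)
class SubDimension:
    name: str              # hypothetical name, not taken from SkillLens
    weight: float          # illustrative weight, not a real rubric value
    applies_to: frozenset  # forms for which this check is enabled


# A made-up three-entry rubric; the real one has 34 sub-dimensions.
RUBRIC = [
    SubDimension("clear_trigger_description", 4.0, frozenset(FORMS)),
    SubDimension("step_handoff_contract", 3.0, frozenset({"pipeline", "composite"})),
    SubDimension("sub_skill_inventory", 3.0, frozenset({"composite"})),
]


def enabled_subdimensions(form):
    """Return the sub-dimensions that apply to the given structural form."""
    if form not in FORMS:
        raise ValueError(f"unknown structural form: {form!r}")
    return [d for d in RUBRIC if form in d.applies_to]


def rescaled_weights(form, total=100.0):
    """Rescale the enabled weights so they sum to `total`.

    Assumption: disabled checks should not cost points, so the remaining
    weights are stretched to cover the full scale.
    """
    enabled = enabled_subdimensions(form)
    weight_sum = sum(d.weight for d in enabled)
    return {d.name: d.weight * total / weight_sum for d in enabled}
```

Under this sketch an atomic skill is judged on a single check, while a composite skill is judged on all three; in each case the enabled weights still add up to 100.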

What you actually get

  • Overall score and grade – a 100‑point total with an S/A/B/C/D rating derived from the rubric.
  • Evidence cards – each check is marked as pass, partial, fail or not applicable, with citations that justify the decision.
  • Top improvement suggestions – a prioritized list that reflects the weight of each rubric pillar, ready for immediate implementation.
  • Finance expert tab – when enabled, the report includes a risk level and commercial maturity assessment specific to finance scenarios.
  • Market signal hints – brief pointers based on GitHub search results that surface comparable skills and potential alternatives.
  • Export options – JSON for programmatic use, a single‑file HTML that can be printed or shared, and a GitHub‑flavored Markdown version.
  • Language toggle – the UI and the report content switch between Chinese and English without extra steps.
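
The report pieces listed above can be pictured with a small Python sketch. Everything specific in it is assumed for illustration: the credit given to a partial pass, the S/A/B/C/D cut‑offs, the ranking by points lost, and the JSON field names are hypothetical and are not taken from SkillLens.

```python
import json
from dataclasses import dataclass, asdict

# Assumed credit per status; "na" cards are left out of scoring entirely.
STATUS_CREDIT = {"pass": 1.0, "partial": 0.5, "fail": 0.0}

# Hypothetical cut-offs; the text only names the S/A/B/C/D scale.
GRADE_CUTOFFS = [(90, "S"), (80, "A"), (70, "B"), (60, "C")]


@dataclass
class EvidenceCard:
    check: str
    status: str    # "pass", "partial", "fail" or "na"
    weight: float  # illustrative rubric weight
    citation: str  # where in the skill the evidence was found


def total_score(cards):
    """Score out of 100 over the applicable (non-"na") cards."""
    scored = [c for c in cards if c.status != "na"]
    max_points = sum(c.weight for c in scored)
    if max_points == 0:
        return 0.0
    earned = sum(c.weight * STATUS_CREDIT[c.status] for c in scored)
    return round(100.0 * earned / max_points, 1)


def grade(score):
    """Map a score to a letter using the assumed cut-offs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "D"


def top_improvements(cards, limit=3):
    """Rank partial and failed checks by the points they lose."""
    lost = [(c.weight * (1.0 - STATUS_CREDIT[c.status]), c.check)
            for c in cards if c.status in ("partial", "fail")]
    ranked = sorted(lost, key=lambda item: (-item[0], item[1]))
    return [check for _, check in ranked][:limit]


def to_json(cards):
    """Serialize a report to JSON; the field names are made up."""
    score = total_score(cards)
    report = {
        "score": score,
        "grade": grade(score),
        "improvements": top_improvements(cards),
        "cards": [asdict(c) for c in cards],
    }
    return json.dumps(report, indent=2)


# Sample cards with invented checks and citations.
sample = [
    EvidenceCard("has_usage_examples", "pass", 6.0, "README.md, Usage section"),
    EvidenceCard("declares_constraints", "partial", 4.0, "SKILL.md, Limits"),
    EvidenceCard("names_target_audience", "fail", 5.0, "no matching section"),
    EvidenceCard("pipeline_handoff", "na", 3.0, "not a pipeline skill"),
]

if __name__ == "__main__":
    print(to_json(sample))
```

With the sample cards, 8 of the 15 applicable points are earned, which gives a score of 53.3 and, under the assumed cut‑offs, a D. The failed check loses more points than the partial one, so it is listed first among the improvements.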

What it doesn’t do

SkillLens does not execute the skill with a real language model unless the Deep Review step is launched, so runtime performance, cost and scalability otherwise remain untested. It does not pull repositories from GitHub automatically: the web uploader accepts only local files or zip archives, so remote sources must be cloned manually first. Full Deep Review requires an LLM provider key; without one, mock mode can only preview the UI elements. Finally, it focuses on structural and documentation quality and does not measure the functional correctness of the skill in production.

Trying it out

The web interface needs a .env file containing model API keys and is started with npm run dev, while the CLI runs directly from the repository without any key configuration. See the README for the exact commands.

The project is most useful for teams that need a transparent, quantitative audit of an AI skill before publishing it, sharing it internally, or submitting it to a marketplace. It is less suited to developers who only want runtime benchmarking or full CI integration. Alternatives such as manual review or generic LLM evaluation tools may cover some aspects, but none combine rubric‑driven scoring, multi‑dimensional lens selection and exportable, actionable reports in a single self‑hosted package. The source is on GitHub.