The verification layer for the agentic coding era.
AI ships code in minutes; verifying that code hasn't gotten any faster. testsprite opens your live app, uses it like a real user, and shows your coding agent exactly what broke, so the agent fixes its own work before a bug ever reaches you.
Proof, in public: verification beats model size. With this CLI in the loop, the cheapest model in the field shipped the most correct app on an open leaderboard — 89%, at half the cost of the priciest one. See the leaderboard →
⭐ Help us reach more developers and grow the TestSprite community. Star this repo!
https://github.com/user-attachments/assets/eca90a91-93ef-49f6-8d13-86b4eb25f4cf
▶ Watch the launch video — the three hard limits every coding agent hits, and the loop that breaks them (4 min).
What is it?
TestSprite is the AI testing platform 100,000+ teams use to test their software, frontend and backend — in the cloud, against the live product, not mocks. This repo is its official CLI.
It puts that platform in your coding agent's hands: the agent verifies every behavior it ships, and what broke comes back as one self-consistent bundle it can act on — no dashboard scraping. Humans drive the same surface from a terminal or CI.
⭐️ Star the Repository
If you find testsprite useful, a GitHub Star ⭐️ would be greatly appreciated — it helps other builders (and their agents) find the project, and stars notify you about new releases.
Quickstart
Requires Node.js ≥ 20. (No global install? npx @testsprite/testsprite-cli works too.)
```bash
npm install -g @testsprite/testsprite-cli
testsprite init
```
testsprite init prompts for your API key, verifies it, and installs the verification-loop skill for your coding agent (claude, cursor, cline, antigravity, codex, etc.). Non-interactive (CI / onboarding scripts):
```bash
TESTSPRITE_API_KEY=sk-... testsprite init --from-env --yes --agent claude
```
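In CI, the non-interactive form above can be wrapped so that a missing secret fails fast with a readable message. This is a minimal sketch, not something the CLI ships: the function name `ts_onboard_ci` and the exit code 2 for a missing key are hypothetical conventions of this example, and the only CLI flags used are the ones shown above. Source it from your CI script.

```bash
# Hypothetical CI helper (not part of the CLI): onboard non-interactively.
# Uses only the flags from the README: --from-env, --yes, --agent.
ts_onboard_ci() {
  local agent="${1:-claude}"   # coding-agent target; claude if omitted

  # Fail fast if the secret was never injected into the job environment.
  # Exit code 2 is this sketch's own convention, not a testsprite code.
  if [ -z "${TESTSPRITE_API_KEY:-}" ]; then
    echo "TESTSPRITE_API_KEY is not set; add it as a CI secret" >&2
    return 2
  fi

  # Read the key from the environment and accept every prompt.
  testsprite init --from-env --yes --agent "$agent"
}
```

Call it as `ts_onboard_ci cursor` in a CI step. Whatever status `testsprite init` returns is passed through unchanged.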
From there, the loop runs on its own — an example session, typed by the coding agent:
```bash
# 1: create the test from a plan, run it, and wait for the verdict
testsprite test create --project proj_8f0f6 --type frontend \
  --plan-from ./checkout-flow.plan.json --run --wait --output json
# → exits 1: the run failed

# 2: pull ONE self-consistent failure bundle
testsprite test failure get test_3a9f21c7 --out ./.testsprite/failure

# 3: the agent reads the bundle, fixes the code, then replays
testsprite test rerun test_3a9f21c7 --wait --output json
# → exits 0: passed. The test now lives in your durable suite.
```
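If you drive the same pass from a script instead of an agent, it reduces to branching on the exit code. A minimal sketch, assuming only what the session above shows: exit 0 means the run passed and exit 1 means it failed. `ts_verify_once` is a hypothetical helper name, and treating every other status as a CLI or infrastructure error is an assumption of this sketch; the real exit-code table is in DOCUMENTATION.md.

```bash
# Hypothetical helper: replay one test; on failure, fetch its bundle.
# Assumes exit 0 = passed and 1 = run failed (as in the session above);
# any other status is treated as an error rather than a test verdict.
ts_verify_once() {
  local test_id="$1"
  local out_dir="${2:-./.testsprite/failure}"
  local rc=0

  # Replay the existing test and block until the run is terminal.
  testsprite test rerun "$test_id" --wait --output json || rc=$?

  case "$rc" in
    0)
      echo "passed: $test_id"
      ;;
    1)
      # Pull ONE self-consistent bundle for the agent (or you) to read.
      testsprite test failure get "$test_id" --out "$out_dir"
      echo "failed: $test_id (bundle in $out_dir)"
      ;;
    *)
      echo "error: testsprite exited $rc" >&2
      ;;
  esac
  return "$rc"
}
```

For example, `ts_verify_once test_3a9f21c7` either reports a pass or leaves the bundle in `./.testsprite/failure` and returns 1, so the fix-and-replay step can be repeated until it returns 0.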
Prefer to configure each step by hand (or learn the surface offline with --dry-run first)? See Manual setup and Install & verify.
Commands
| Group | Command | What it does |
|---|---|---|
| Init | `init` | One-shot onboarding: `auth configure` + `whoami` + `agent install` |
| Auth | `auth configure` | Store an API key at `~/.testsprite/credentials` |
| | `auth whoami` (alias `status`) | Resolve the active profile to its user, key, env, and scopes |
| | `auth logout` | Remove the active profile from the credentials file |
| Read | `project list` / `project get` | List projects / fetch one by id |
| | `test list` / `test get` | List tests under a project / fetch one by id |
| | `test code get` | Print (or write) the generated test source |
| | `test steps` | List the latest run's steps with screenshot / DOM pointers |
| | `test result` | Latest result; `--history` lists a test's prior runs |
| | `test failure get` | The agent entry point: one self-contained latest-failure bundle |
| | `test failure summary` | One-screen triage card (no media download) |
| Write | `test create` / `test create-batch` | Create a test (or bulk-create from a plan file); `--produces` / `--needs` / `--category` wire backend dependency metadata |
| | `test update` / `test delete` / `test delete-batch` | Edit metadata / soft-delete |
| | `test code put` | Replace generated code (etag-guarded) |
| | `test plan put` | Replace a frontend test's plan-steps |
| | `project create` / `project update` | Manage projects |
| Run | `test run` | Trigger a fresh run; `--wait` blocks until terminal; `--all --project <id>` runs all tests in a project in wave order |
| | `test rerun` | Cheap replay of one or many tests (frontend verbatim; backend with dependencies); `--all --project <id>` reruns all tests |
| | `test wait` | Block on a `runId` until terminal |
| | `test artifact get` | Download the failure bundle for a specific `runId` |
| Agent | `agent install` / `agent list` | Onboard a coding agent (pure-local); targets: `claude`, `codex`, `cursor`, `cline`, `antigravity` |
📚 Full reference — every command, flag, and example: DOCUMENTATION.md, including configuration & profiles, scripting, and exit codes.
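The Run commands compose into a simple CI gate. This sketch assumes that `--wait` and `--output json` can be combined with `--all --project <id>` on `test rerun` (the table lists each flag, not that exact combination) and that the CLI exits nonzero when any test fails. `ts_ci_gate`, the default report filename, and exit code 2 for a missing argument are hypothetical.

```bash
# Hypothetical CI gate: replay the whole suite and fail the job on regressions.
# Assumes `test rerun --all --project` accepts --wait and --output json.
ts_ci_gate() {
  local project="$1"
  local report="${2:-testsprite-report.json}"   # hypothetical artifact name
  local rc=0

  if [ -z "$project" ]; then
    echo "usage: ts_ci_gate <project-id> [report.json]" >&2
    return 2
  fi

  # rerun is the cheap replay; keep the JSON so CI can archive it.
  testsprite test rerun --all --project "$project" --wait --output json \
    > "$report" || rc=$?

  echo "testsprite exited $rc; JSON written to $report"
  return "$rc"
}
```

Using `rerun` instead of `run` keeps the gate cheap; the function passes the CLI's exit status straight through, so the CI step fails exactly when the CLI reports a failure.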
Why a CLI for coding agents?
- 🧪 Tests like a real user. Runs against a live browser or API in the cloud — real clicks, real navigation, real assertions. Not a mock.
- 🤖 Agent-shaped output. `test failure get` returns one bundle (the failing step, its neighbors, screenshots, DOM snapshots, the test source, a root-cause hypothesis, and a recommended fix target), all sharing a single `snapshotId`. The CLI refuses to stitch data from two different runs, so an agent never reasons over a Frankenstein context.
- ♻️ A loop, not a one-shot. `create → run → failure get → fix → rerun`: every pass is banked, not thrown away.
- 📐 Scriptable & deterministic. A stable `--output json` contract, predictable exit codes, and a `--dry-run` that exercises the full code path offline with canned data.
- 🔌 One command to onboard your agent. `testsprite agent install claude` drops a ready-made skill file into your repo so your coding agent knows how to drive the loop on its own.
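One way to use `--dry-run` is a thin wrapper that rehearses any command offline. This sketch rests on an unconfirmed assumption: that `--dry-run` is accepted as a trailing flag after any subcommand's other arguments (DOCUMENTATION.md says where it is actually allowed). The `ts` function and the `TS_DRY_RUN` variable are hypothetical names.

```bash
# Hypothetical wrapper: TS_DRY_RUN=1 routes every call through --dry-run,
# which (per the README) runs the full code path offline on canned data.
# Assumes --dry-run may be appended after any subcommand's other arguments.
ts() {
  if [ "${TS_DRY_RUN:-0}" = "1" ]; then
    testsprite "$@" --dry-run
  else
    testsprite "$@"
  fi
}
```

For example, `TS_DRY_RUN=1 ts test failure get test_3a9f21c7 --out ./.testsprite/failure` would rehearse the failure-bundle step offline, and dropping the variable runs the same command for real.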
How it works
Every time your agent changes code, it asks one question: is this behavior already covered by the suite?
- Not yet covered → `testsprite test create`: describe the new behavior and run it.
- Already covered → `testsprite test rerun`: replay the existing tests, so nothing that used to work breaks silently.
- Something fails → `testsprite test failure get`: one self-consistent bundle; the agent fixes the code and reruns.
Every pass is banked into a durable suite, so coverage compounds as the project grows — a lasting record of every requirement it has ever gotten right, far bigger than any context window.
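The coverage decision above can be sketched as one shell function. It is illustrative only: in practice the agent, not a script, decides whether a behavior is covered, so here it signals that by passing an existing test id (or an empty string for new behavior). `ts_cover` is a hypothetical name, and `--type frontend` is hard-coded to match the Quickstart example.

```bash
# Hypothetical helper mirroring "is this behavior already covered?".
#   $1 project id, $2 existing test id ("" if new behavior), $3 plan file
ts_cover() {
  local project="$1" covered_test_id="$2" plan_file="$3"

  if [ -n "$covered_test_id" ]; then
    # Already covered: replay it so nothing that used to work breaks silently.
    testsprite test rerun "$covered_test_id" --wait --output json
  else
    # Not yet covered: create a test from the plan and run it right away.
    testsprite test create --project "$project" --type frontend \
      --plan-from "$plan_file" --run --wait --output json
  fi
  # The function returns the CLI's exit status unchanged.
}
```

Usage: `ts_cover proj_8f0f6 "" ./checkout-flow.plan.json` for a new behavior, or `ts_cover proj_8f0f6 test_3a9f21c7` for one the suite already covers. On exit 1, the next step is `testsprite test failure get` for the failing test.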
```mermaid
flowchart TD
A["🤖 Your coding agent<br/>Claude Code · Codex · Antigravity · Kimi · Cursor · Trae …"]
D{"behavior already<br/>covered by the suite?"}
B["<b>testsprite test create</b><br/>new behavior → new test"]
R["<b>testsprite test rerun</b><br/>replay the existing tests"]
C{{"☁️ TestSprite testing agent<br/>runs the test like a real user against<br/>real browsers & real APIs on Cloud"}}
F["<b>testsprite test failure get</b><br/>ONE self-consistent bundle:<br/>failing step · screenshots · DOM ·<br/>root-cause · recommended fix"]
S[("📚 Durable integration suite<br/>grows with every pass")]
A -->|"writes / changes code"| D
D -->|"no — new behavior"| B
D -->|"yes"| R
B --> C
R --> C
C -->|"pass ✅"| S
C -->|"fail ❌"| F
F -->|"agent reads the bundle<br/>& fixes the code"| A
S -.->|"defines what's covered"| D
```
The cloud is a black box on purpose: your agent describes intent and reads results. It never has to know how the test was driven — only what a real user experienced.
Proved in public
On CoderCup — an open leaderboard where frontier coding agents build the same app under the same rules, with TestSprite as the referee — the cheapest model in the field shipped the most correct app on the board: 89%, at half the cost of the priciest one.
That's the point of all of this: you no longer need the biggest, most expensive model to ship software you can trust — top-tier quality, without paying top-tier prices, within reach of every team.
Getting help
- 📚 CLI reference: DOCUMENTATION.md
- 🌐 Platform docs: testsprite.com/docs
- 🐛 Issues & feature requests: GitHub issues
- 💬 Quick questions: Discord, or `testsprite --help` / `testsprite test run --help` right in your terminal
- 📝 Changelog: CHANGELOG.md