Train. Break. Defend. AI Systems.
An open-source platform for AI security training, red/blue teaming, CTF, benchmarking, and research. Runs 100% locally. No cloud, no paid APIs, no data leaves your machine.
Prerequisites
- Docker 24+ and Docker Compose v2
- 16 GB RAM minimum (Ollama loads models into memory; 8 GB works for small models only)
- 20 GB free disk space for base images and at least one model
- x86-64 CPU; Apple Silicon and ARM64 are supported but untested
GPU is optional. Ollama runs on CPU but inference will be slow without one.
Quick Start
git clone https://github.com/sonuoffsec/DVAP
cd DVAP
cp .env.example .env
docker compose up -d
Open http://localhost:8080 once all containers are healthy. First run takes 30-60 seconds.
What is DVAP?
DVAP is an open-source AI security research, training, benchmarking, and red teaming platform designed to help security professionals, AI engineers, researchers, students, and organizations understand how modern AI systems fail and how to defend them.
Built for the AI era, DVAP provides intentionally vulnerable AI applications, agents, RAG systems, MCP integrations, and domain-specific environments that can be attacked, analyzed, benchmarked, and secured.
Unlike cloud-based AI playgrounds, DVAP runs entirely on your machine.
No cloud. No subscriptions. No API costs. No vendor lock-in.
Why DVAP?
Modern AI applications introduce entirely new attack surfaces:
- Prompt Injection
- Memory Poisoning
- RAG Poisoning
- Tool Abuse
- MCP Exploitation
- Multi-Agent Attacks
- Autonomous Agent Manipulation
- Data Exfiltration
- Identity and Trust Failures
- AI Supply Chain Risks
Yet there is no single platform that allows researchers to safely learn, practice, benchmark, and validate these attacks in one place.
DVAP aims to become the definitive open-source platform for AI security education, research, and experimentation.
How DVAP Differs
| DVAP | DVWA | HackTheBox | Gandalf (Lakera) | Blog Posts / Papers | |
|---|---|---|---|---|---|
| AI-specific vulnerabilities | Yes | No | Partial | Partial | Yes (theory only) |
| Local, no cloud | Yes | Yes | No | No | N/A |
| 15 dedicated AI labs | Yes | No | No | No | No |
| LLM benchmark engine | Yes | No | No | No | No |
| CTF with flags | Yes | Yes | Yes | No | No |
| OWASP LLM Top 10 coverage | Full | No | Partial | Partial | Varies |
| MITRE ATLAS mapping | Yes | No | No | No | Varies |
| Report generation | Yes | No | No | No | No |
| Research workspace | Yes | No | No | No | No |
| Agent and MCP security | Yes | No | No | No | No |
| Free and open source | Yes | Yes | Partial | No | Yes |
DVAP is one of the first platforms to combine hands-on AI attack labs, local LLM benchmarking, CTF challenges, and professional reporting in a single self-hosted environment.
Key Features
AI Security Labs 15 intentionally vulnerable labs covering real-world AI attack techniques.
Research Workspace Inspect prompts, memory, tool calls, retrieved documents, agent actions, and attack chains.
Security Benchmarking Evaluate local and external models against AI security attack suites.
Capture The Flag (CTF) Learn AI security through guided challenges, flags, hints, and walkthroughs.
Reporting Engine Generate professional findings and benchmark reports mapped to OWASP LLM Top 10, MITRE ATLAS, CWE, and CVSS.
100% Local Run everything on your own machine. Your prompts, data, findings, and experiments never leave your environment.
Platform Overview

๐ฌ Demo
๐ธ Screenshots
๐ฅ๏ธ Research Command Center
Research Command Center ยท Live platform activity, model benchmark leaderboard, flag capture progress, and service health in a single operational view.
๐งฉ Platform Modules
โ๏ธ AI Security Labs 15 containerized vulnerable AI environments across every major attack class |
๐ฉ CTF Challenges Flags, hints, and walkthroughs mapped to OWASP LLM Top 10 and MITRE ATLAS |
๐ Benchmark Center Evaluate local LLMs against prompt injection, jailbreak, and data exfiltration suites |
๐ฌ Research Workspace Full trace recording of prompts, memory, tool calls, and agent behavior |
๐ก๏ธ Findings Dashboard
Findings Dashboard ยท Track, triage, and document security findings with severity ratings, OWASP LLM Top 10 mapping, MITRE ATLAS techniques, and export-ready reports.
Labs
15 containerized labs, each with flags, hints, walkthrough, and OWASP LLM Top 10 + MITRE ATLAS mapping.
| Lab | Difficulty | OWASP LLM | MITRE ATLAS |
|---|---|---|---|
| Prompt Injection | Beginner | LLM01 | AML.T0051, AML.T0054 |
| Memory Poisoning | Intermediate | LLM02 | AML.T0054 |
| RAG Poisoning | Intermediate | LLM02, LLM03 | AML.T0020, AML.T0043 |
| Tool Output Injection | Intermediate | LLM07 | AML.T0054, AML.T0068 |
| MCP Security | Advanced | LLM07 | AML.T0068 |
| Browser Agent Security | Advanced | LLM07, LLM09 | AML.T0054 |
| Multi-Agent Security | Advanced | LLM08 | AML.T0054 |
| Autonomous Agent Security | Advanced | LLM08, LLM09 | AML.T0054 |
| Data Exfiltration | Advanced | LLM06 | AML.T0057, AML.T0058 |
| Agent Identity and Trust Abuse | Advanced | LLM08 | AML.T0058 |
| AI Banking Platform | Intermediate | LLM01, LLM06 | AML.T0043 |
| AI Healthcare Environment | Advanced | LLM01, LLM06 | AML.T0043 |
| Multi-Tenant AI SaaS | Advanced | LLM06 | AML.T0043 |
| AI Supply Chain Security | Expert | LLM03, LLM05 | AML.T0010, AML.T0048 |
| AI Developer Platform | Expert | LLM03, LLM07 | AML.T0010, AML.T0068 |
Each lab runs in an isolated Docker container with its own Ollama-backed LLM endpoint.
OWASP LLM Top 10 Coverage
DVAP is aligned with the OWASP Top 10 for Large Language Model Applications. Each category maps to one or more dedicated labs.
| Category | Name | Labs | Status |
|---|---|---|---|
| LLM01 | Prompt Injection | Prompt Injection, AI Banking Platform, AI Healthcare Environment | Covered |
| LLM02 | Data and Memory Poisoning | Memory Poisoning, RAG Poisoning | Covered |
| LLM03 | Supply Chain and Training Data Risks | RAG Poisoning, AI Supply Chain Security, AI Developer Platform | Covered |
| LLM04 | Model Denial of Service | Planned (v1.2) | |
| LLM05 | Insecure Supply Chain | AI Supply Chain Security | Covered |
| LLM06 | Sensitive Information Disclosure | Data Exfiltration, AI Banking Platform, AI Healthcare Environment, Multi-Tenant AI SaaS | Covered |
| LLM07 | Insecure Plugin Design | Tool Output Injection, MCP Security, Browser Agent Security, AI Developer Platform | Covered |
| LLM08 | Excessive Agency | Multi-Agent Security, Autonomous Agent Security, Agent Identity and Trust Abuse | Covered |
| LLM09 | Overreliance | Browser Agent Security, Autonomous Agent Security | Covered |
| LLM10 | Model Theft | Planned (v1.2) |
8 of 10 categories covered across 15 labs. LLM04 and LLM10 are on the v1.2 roadmap.
Architecture
graph TB
User([User Browser]) --> Nginx[Nginx :8080]
Nginx --> Web[Next.js Frontend :3000]
Nginx --> API[FastAPI Backend :8000]
API --> PG[(PostgreSQL)]
API --> Redis[(Redis)]
API --> Qdrant[(Qdrant)]
API --> Sock[Docker Socket]
Sock --> L1[Lab Container]
Sock --> L2[Lab Container]
Sock --> LN[Lab Container ...]
L1 --> Ollama[Ollama :11434]
L2 --> Ollama
LN --> Ollama
subgraph dvap-internal network
Web
API
PG
Redis
Qdrant
end
subgraph dvap-labs network
L1
L2
LN
Ollama
end
Lab containers are isolated on a separate Docker network. They can reach Ollama for LLM inference but cannot reach the database, Redis, or Qdrant.
Security Architecture
Network Isolation
Two Docker networks keep lab traffic separate from platform infrastructure:
dvap-internal(172.20.0.0/24) - PostgreSQL, Redis, Qdrant, API, frontend, Nginxdvap-labs(172.21.0.0/24) - lab containers and Ollama
Lab containers can reach Ollama and nothing else on the internal network. They cannot reach PostgreSQL, Redis, or Qdrant.
Docker Socket Access
Known tradeoff: The API container mounts /var/run/docker.sock to spawn and stop lab containers on demand (Docker-out-of-Docker). This grants the API process root-equivalent access to the host Docker daemon.
This is an intentional design decision. DVAP is a local single-user install for security research and training, not a multi-tenant service. The tradeoff is accepted because:
- There is no network-accessible admin interface that could trigger arbitrary container operations
- Lab container resource limits (512 MB RAM, 0.5 CPU) prevent resource exhaustion
- Lab images are built from controlled Dockerfiles in this repository
If you are deploying DVAP in a shared or networked environment, replace the socket mount with a rootless Docker socket or Podman socket (/run/user/1000/podman/podman.sock) and restrict API network access accordingly.
Instance TTL
Lab instances stop automatically after 1 hour via Redis TTL keys. Call POST /api/v1/instances/cleanup to trigger early cleanup.
Rate Limiting
Flag submissions are rate-limited to 15 attempts per 60-second window per session token.
Services
| Service | Port (internal) | Purpose |
|---|---|---|
| PostgreSQL | 5432 | Primary datastore |
| Redis | 6379 | Rate limiting, instance TTL |
| Qdrant | 6333 | Semantic search over findings |
| Ollama | 11434 | Local LLM inference |
| API | 8000 | FastAPI backend |
| Web | 3000 | Next.js frontend |
| Nginx | 8080 (host) | Reverse proxy |
Environment Variables
See .env.example for all variables. Key ones to change before any networked deployment:
SECRET_KEY= # strong random value for HMAC signing
POSTGRES_PASSWORD= # change from the default
REDIS_PASSWORD= # change from the default
Upgrading
One command brings your install up to date:
make upgrade
This runs git pull then rebuilds and restarts all containers. Database migrations run automatically on every API container start.
Without make:
git pull
docker compose up -d --build
What upgrades automatically
- Backend code and API endpoints
- Frontend dashboard
- Database schema (Alembic migrations)
- Lab definitions
What is never touched
- Your data (findings, research sessions, benchmark results, campaigns)
- Your
.envfile
Check for new environment variables
diff .env .env.example
Development vs Production
# Development (default) - auto-loads docker-compose.override.yml
# Hot reload for API and frontend, source mounted as volumes
docker compose up -d
# Production - baked images, no volume mounts, 4 uvicorn workers
docker compose -f docker-compose.yml up -d
Build images before the production run:
docker build -t dvap-api:latest --target production ./backend
docker build -t dvap-web:latest --target production ./frontend
Running Tests
Tests require a PostgreSQL instance. Start the stack first:
docker compose up -d postgres
export TEST_DATABASE_URL=postgresql+asyncpg://dvap:<your-postgres-password>@localhost:5432/dvap_test
cd backend
pip install -e ".[dev]"
pytest
Roadmap
v1.1 - Expanded Lab Coverage
- LangChain agent security lab
- CrewAI multi-agent security lab
- LlamaIndex RAG security lab
- AutoGPT-style autonomous agent lab
v1.2 - Enhanced Benchmarking
- Support for OpenAI and Anthropic API models
- Custom benchmark suite builder
- Benchmark comparison reports across model versions
- Automated regression testing for model security
v1.3 - Platform Improvements
- Multi-user support with session isolation
- VS Code extension for research workspace
- GitHub Actions integration for CI/CD security testing
- Lab difficulty progression system
v2.0 - AI Red Team Automation
- Automated attack chain generation
- AI-assisted vulnerability discovery
- Red team campaign templates
- Integration with popular security tools (Burp Suite, Metasploit)
Want to contribute to the roadmap? Open an issue or start a discussion.
Governance
DVAP is developed and maintained by Sonu Chaudhary.
Community contributions are welcome and governed by the Contributing guidelines. Long-term direction is driven through GitHub Issues and Discussions. Lab additions, feature proposals, and roadmap input are reviewed publicly.
There is no single-point-of-failure risk: the repository is open source under Apache 2.0 and forkable by the community at any time.

Comments