OpenSSF Best Practices

Local forensic scanner that extracts and verifies credentials from AI tool conversation history. Detection + verification powered by TruffleHog.

Read the original blog post: ghosttype — finding secrets in AI conversation history

Authorized use only. For licensed penetration testers, red teams, and DLP/blue teams operating under explicit written authorization. See THREAT-MODEL.md.


What it does

ghosttype scans AI tool conversation files for exposed credentials, then asks TruffleHog whether each one is actually live by hitting the issuing provider's verification endpoint. Findings are emitted as JSON + CSV, each linked back to the source conversation.

Two complementary detection engines (since v0.4.0):

  • TruffleHog — 800+ structural detectors with live API verification, entropy filtering, known-example exclusion. The only engine that can prove a credential is live.
  • In-tree pattern engine — 30 regex + 10 heuristic patterns. Offline, never verified, but catches loose variable-name context signals (api_key=, password=, JWT_SECRET=) that TruffleHog's structural detectors don't match.

By default both run and results are merged (--engine both); on a (secret_value, file) overlap the TruffleHog finding wins because it carries verification. Choose one with --engine {both,trufflehog,patterns}. ghosttype always owns the discovery layer — where each AI tool stores conversations and how to decode them. Every finding carries a source field so you know which engine produced it.

Supported AI tools:

Tool Data source
Claude Code CLI ~/.claude/projects/**/*.jsonl + history
Cursor IDE state.vscdb (SQLite, global + workspace)
Codex CLI ~/.codex/state_5.sqlite + logs
ChatGPT Desktop Keychain-backed .data files (AES-128-CBC)
Claude Desktop Stub (path detected; extraction in progress)

Detected credential types: the full TruffleHog detector catalog (800+), including AWS, GitHub PATs, OpenAI / Anthropic, Stripe, Slack, HashiCorp Vault, Snowflake, Databricks, Linear, GCP service accounts, Azure, Twilio, Cloudflare, npm, Telegram, Hugging Face, DigitalOcean, Docker Hub, Pulumi, Doppler, PyPI, SendGrid, JWT, PEM private keys, and database connection strings. Each finding is marked verified: true if TruffleHog confirmed it live against the provider's API, or verified: false if the structure matched but verification was skipped, declined, or failed.


Requirements

  • Python 3.11+
  • TruffleHog 3.x installed and on PATH (or set GHOSTTYPE_TRUFFLEHOG_BIN)
  • macOS for full tool coverage (Linux/Windows paths: roadmap)

Check your install:

ghosttype doctor

Quick start

git clone https://github.com/p4gs/ghosttype
cd ghosttype
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Scan all detected AI tools (verifies every credential against its provider)
ghosttype scan

# Triage rotation work: only TruffleHog-verified live credentials
ghosttype scan --only-verified

# Fast offline pass: detect without hitting any provider APIs
ghosttype scan --no-verification

# Pipe to jq, filter by detector
ghosttype scan --no-verification --format json --output - --quiet \
  | jq '.[] | select(.detector_name == "Github")'

# Show which AI tools and TruffleHog are present
ghosttype doctor

Output

Default: ./ghosttype_report/findings.json + findings.csv

Each finding includes:

Field Description
tool Source AI tool (e.g. claude_code)
detector_name TruffleHog detector name (e.g. Github, AWS)
secret_type Detector name, lowercased
severity critical / high / medium — derived from detector + verification state
verified true if TruffleHog confirmed live against the provider, else false
verification_error Verifier error message if verification was attempted and errored
secret_value Plaintext value (use --redact to mask)
file_path Source conversation file
position Chunk position + line within the chunk
confidence verified or unverified
context Window of surrounding text
extra_data TruffleHog detector extras (e.g. rotation_guide URLs)

All scan options

ghosttype scan [OPTIONS]

  --tool TEXT                  Scan one tool: cursor, chatgpt, codex, claude, claude_code
  --format [json|csv|both]     Output format (default: both)
  --output TEXT                Output dir, or - for stdout JSON (default: ./ghosttype_report)
  --engine [both|trufflehog|patterns]
                               Detection engine (default: both). 'patterns'
                               needs no TruffleHog binary.
  --redact                     Mask secret values in output
  --min-confidence             verified | unverified (default: unverified)
                               'verified' = TruffleHog-verified only;
                               'high' (legacy) also keeps regex pattern hits
  --only-verified              Pass --results=verified to TruffleHog
  --no-verification            Skip live verifier calls (fast, offline)
  --trufflehog-binary PATH     Override the TruffleHog binary
  --trufflehog-timeout SECONDS Outer timeout for the TruffleHog subprocess (default: 300)
  --max-age-days N             Only scan files modified within last N days
  --copy-sources               Copy source conversation files to output/sources/
  --allow-list PATH            Suppress known-safe values (one value per line)
  --stats-only                 Print summary statistics only
  --quiet / -q                 Suppress banner for scripting
  --context-window N           Context chars around match (default: 200)

ghosttype list-tools           Show detected AI tools on this machine
ghosttype doctor               Show TruffleHog binary, version, and detected tools
ghosttype version              Print version

Environment variables

  • GHOSTTYPE_TRUFFLEHOG_BIN — explicit path to TruffleHog binary (overridden by --trufflehog-binary)

Exit codes

  • 0 — no findings
  • 1 — at least one finding (enables CI/CD gating)
  • 2 — environment problem (TruffleHog missing, subprocess failed, etc.)

Detection design

ghosttype is two layers stitched together:

[AI tool storage] --(scanner module)--> TextChunks --(trufflehog filesystem)--> Findings
   .jsonl/SQLite/encrypted         (extracted text)        (verified or unverified)

The discovery layer is the per-tool code under ghosttype/scanners/ — one module each for Claude Code, Cursor, Codex, ChatGPT, Claude Desktop. They know SQLite schemas, Electron safeStorage decryption, JSONL message shapes.

The detection + verification layer is a TruffleHog subprocess. ghosttype writes each extracted text chunk to a temp file with a deterministic name, runs trufflehog filesystem --json --no-update [...] <tmpdir>, parses NDJSON results, and maps each one back to the originating conversation record via the temp filename. The temp dir is deleted in a finally block; nothing persists.

See ARCHITECTURE.md for the full pipeline diagram.


Security & threat model

ghosttype is forensic — it reads files you already have access to and runs detectors locally. The only network traffic is TruffleHog's own verification calls to credential issuers (AWS, GitHub, Stripe, etc.) and only when verification is enabled.

Use --no-verification if any of the following apply:

  • You're operating in an air-gapped environment
  • You don't want to risk lighting up provider audit logs on red-team engagements
  • You just want fast triage

See THREAT-MODEL.md for intended-use and abuse considerations.