Compliance-checker-algo provides a standard-agnostic engine for checking whether work products comply with safety standards. Hosted on GitHub at jherrodthomas/compliance-checker-algo, the Python project has 203 stars. It accepts any safety standard in JSON or Markdown format, together with work products in DOCX, PDF, TXT, JSON, or MD format. An eight-layer NLP pipeline maps requirements to artifacts without hardcoded rules or domain-specific tweaks. It combines TF-IDF for textual similarity, graph analysis for cross-references, fuzzy string matching for differently worded passages, and ensemble classification for gaps.
The tool generates detailed reports highlighting compliant sections, gaps, and risks. Users supply a standard such as ISO 26262, IEC 61508, DO-178C, or ASPICE, and pair it with project deliverables. The engine auto-discovers schemas, performs node coverage checks, and scores traceability. Real standards require purchase due to copyright; the repository includes only synthetic examples. An Electron desktop app offers a UI frontend via main.js and renderer files.
What it does
The engine normalizes inputs through artifact_ingester.py, parses Markdown standards via markdown_parser.py, and runs the core logic in agnostic_engine.py. It outputs compliance status across eight analysis layers, producing reports in JSON, DOCX, PDF, HTML, or all formats simultaneously.
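The repository's ingester is not reproduced here. The sketch below only shows that DOCX input can be handled with the standard library alone, because a .docx file is a ZIP archive whose body lives in word/document.xml. It assumes that approach. The names ingest, docx_to_text, and _strings are hypothetical. PDF handling is left out because extracting PDF text with only the standard library is far more involved.

```python
import json
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path: Path) -> str:
    """Extract paragraph text from a .docx using only the standard library."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it
        text = "".join(t.text or "" for t in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)


def _strings(node):
    """Yield every string value in a parsed JSON structure, depth-first."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)


def ingest(path) -> str:
    """Hypothetical normalizer: turn a work product into plain text."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".docx":
        return docx_to_text(p)
    if suffix == ".json":
        data = json.loads(p.read_text(encoding="utf-8"))
        return "\n".join(_strings(data))
    if suffix in {".md", ".txt"}:
        return p.read_text(encoding="utf-8")
    # .pdf and anything else is out of scope for this sketch
    raise ValueError(f"unsupported artifact type: {suffix}")
```

Reducing every format to plain text early lets the later similarity layers treat all artifacts the same way. A real ingester would probably also keep section structure.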
Reports detail matches, misses, and risk-weighted gaps. For instance, it flags missing hazard analysis nodes or broken cross-references. An optional compliance_meta.json file provides schema hints like clause tags or ASIL levels to speed up processing, though the engine functions without it.
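Risk-weighted gap scoring can be illustrated as the weighted share of uncovered clauses, where higher-risk clauses count more. This is a minimal sketch and not the engine's actual scoring. The ClauseResult type, the risk_weighted_gap_score function, and the ASIL weight values are all hypothetical. A meta file such as compliance_meta.json could, in principle, supply its own weights through the weights argument.

```python
from dataclasses import dataclass

# Hypothetical weights: higher ASIL levels make a gap more severe.
# Illustrative values only, not taken from the repository.
DEFAULT_RISK_WEIGHTS = {"QM": 0.5, "A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}


@dataclass
class ClauseResult:
    section: str
    asil_level: str  # e.g. "QM" or "A" through "D"
    covered: bool


def risk_weighted_gap_score(results, weights=None):
    """Return (score, gaps). score is the weighted share of uncovered
    clauses in [0, 1]; gaps lists (section, weight), most severe first."""
    weights = weights or DEFAULT_RISK_WEIGHTS
    total = 0.0
    missing = 0.0
    gaps = []
    for r in results:
        w = weights.get(r.asil_level, 1.0)  # unknown levels get a neutral weight
        total += w
        if not r.covered:
            missing += w
            gaps.append((r.section, w))
    gaps.sort(key=lambda g: g[1], reverse=True)
    return (missing / total if total else 0.0), gaps
```

For example, an uncovered ASIL D clause (weight 4), a covered ASIL A clause (weight 1), and an uncovered ASIL B clause (weight 2) give a score of 6/7. The ASIL D gap is listed first.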
The algorithm pipeline
The pipeline stacks these layers for thorough checking:
| Layer | What it does |
|---|---|
| 1. Node Coverage | Decision tree on requirement node existence |
| 2. Content Alignment | TF-IDF + cosine similarity |
| 3. Semantic Depth | Ratcliff/Obershelp fuzzy matching |
| 4. Concept Coverage | Set intersection on discovered concept tags |
| 5. Reference Integrity | Graph BFS on cross-reference links |
| 6. Method/Practice Audit | Risk-level-aware method matching |
| 7. Traceability Chain | Directed graph walk on dependency paths |
| 8. Gap Analysis + Risk | Ensemble classifier with risk-weighted scoring |
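Several of these layers reduce to plain set and graph operations. The sketch below shows simplified analogues of layers 1, 4, 5, and 7 using only the standard library. The function names and data shapes are assumptions, not the repository's code. Layer 1 is reduced here to a set-membership test instead of the decision tree the table describes.

```python
from collections import deque


def node_coverage(required_sections, artifact_sections):
    """Layer 1 analogue (simplified): does each required clause ID appear?"""
    present = set(artifact_sections)
    return {s: s in present for s in required_sections}


def concept_coverage(required_tags, artifact_tags):
    """Layer 4 analogue: set intersection of required and discovered tags.
    Returns (fraction covered, set of missing tags)."""
    required = set(required_tags)
    if not required:
        return 1.0, set()
    found = required & set(artifact_tags)
    return len(found) / len(required), required - found


def broken_references(clauses, start):
    """Layer 5 analogue: breadth-first search over cross_references from
    `start`, collecting referenced sections that do not exist in `clauses`."""
    seen = {start}
    queue = deque([start])
    broken = set()
    while queue:
        section = queue.popleft()
        for ref in clauses.get(section, {}).get("cross_references", []):
            if ref not in clauses:
                broken.add(ref)  # dangling link, e.g. "see 9.9" with no clause 9.9
            elif ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return broken


def trace_chain(links, source, target):
    """Layer 7 analogue: depth-first walk over directed trace links.
    Returns one path from source to target, or None if the chain is broken."""
    stack = [(source, [source])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in links.get(node, []):
            if nxt not in visited:
                stack.append((nxt, path + [nxt]))
    return None
```

With fictional trace links such as {"SG-1": ["FSR-1"], "FSR-1": ["TSR-1"], "TSR-1": ["TEST-7"]}, trace_chain finds the full path from SG-1 to TEST-7. It returns None for a target that no link reaches.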
Layer 2 compares requirement and artifact text as weighted term vectors. Layer 3 catches partial matches where the wording differs. Layers 5 and 7 build graphs of cross-references and dependencies, which matter in standards full of pointers such as "see 5.1."
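Layers 2 and 3 can be approximated with a hand-rolled TF-IDF vectorizer and Python's difflib. The SequenceMatcher class in difflib is a refinement of Ratcliff/Obershelp gestalt pattern matching. This is an illustrative sketch under those assumptions, not the engine's own code. The names tokenize, tfidf_vectors, cosine, and fuzzy_ratio are hypothetical, and the smoothed IDF formula is one common choice among several.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF keeps terms found in every document at a small positive weight
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_ratio(a, b):
    """Layer 3 analogue: case-insensitive SequenceMatcher similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

Take a requirement such as "The item shall undergo hazard analysis and risk assessment." and an artifact sentence "Hazard analysis and risk assessment were performed for the item." The two get a positive cosine score, while an unrelated sentence scores 0.0. fuzzy_ratio tolerates typos like "assesment" that exact token matching would miss.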
Getting it running
The engine needs nothing beyond Python's standard library, so setup is quick. Clone the repo:
```shell
git clone https://github.com/jherrodthomas/compliance-checker-algo.git
cd compliance-checker-algo
```
Test with examples:
```shell
python agnostic_engine.py examples/synthetic_standard examples/synthetic_artifact.json \
  --meta examples/synthetic_meta.json
```
For custom files:
```shell
python agnostic_engine.py /path/to/your/standard /path/to/your/artifact.docx --format all
```
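Add --convert when the standard is a PDF that must first be converted to Markdown. Depending on --format, the engine writes a styled DOCX with sections, an interactive HTML report, JSON for further parsing, or a PDF with visual summaries. To launch the Electron app instead, run npm install and then npm start in the repository root. The CLI remains the main way to drive the engine.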
Input and output formats
Standards follow a JSON schema with top-level "part" and "title" fields and a "clauses" array. Each clause carries "section", "title", "type", "text", "tags", "asil_level", "notes", and "cross_references". Markdown standards are converted to this JSON automatically. Work products are ingested directly, and JSON artifacts should follow the structure of examples/synthetic_artifact.json.
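Loading a standard in this layout might look like the following. The clause values are fictional. Which fields are mandatory is an assumption: the sketch requires only "section" and fills defaults for the rest. Clause and load_standard are hypothetical names, not the repository's API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# A fictional standard using the field names described above; values are made up.
EXAMPLE_STANDARD = """
{
  "part": "3",
  "title": "Concept phase",
  "clauses": [
    {"section": "3.1", "title": "Item definition", "type": "requirement",
     "text": "The item shall be defined.", "tags": ["item"],
     "asil_level": "B", "notes": "", "cross_references": ["3.2"]},
    {"section": "3.2", "title": "Hazard analysis", "type": "requirement",
     "text": "Hazards shall be identified.", "tags": ["hazard"],
     "asil_level": "D", "notes": "", "cross_references": []}
  ]
}
"""


@dataclass
class Clause:
    section: str
    title: str
    type: str
    text: str
    tags: List[str] = field(default_factory=list)
    asil_level: Optional[str] = None
    notes: str = ""
    cross_references: List[str] = field(default_factory=list)


def load_standard(raw: str):
    """Parse one standard part; returns (part, title, list of Clause)."""
    data = json.loads(raw)
    clauses = [
        Clause(
            section=c["section"],  # assumed mandatory
            title=c.get("title", ""),
            type=c.get("type", ""),
            text=c.get("text", ""),
            tags=c.get("tags", []),
            asil_level=c.get("asil_level"),
            notes=c.get("notes", ""),
            cross_references=c.get("cross_references", []),
        )
        for c in data.get("clauses", [])
    ]
    return data.get("part"), data.get("title"), clauses
```

Defaulting optional fields lets sparse standards, such as ones without ASIL levels, load without errors. The later layers then decide how to treat missing values.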
Choose outputs with flags:
--format json → Structured JSON report
--format docx → Word document with styled sections
--format pdf → PDF with visual compliance dashboard
--format html → Interactive HTML report
--format all → All of the above
Project structure
Key files include:
```text
├── agnostic_engine.py          # Core 8-layer compliance engine
├── artifact_ingester.py        # Normalizes DOCX/PDF/MD/TXT/JSON artifacts
├── markdown_parser.py          # Converts Markdown standards to engine JSON
├── report_generator.py         # JSON report output
├── report_generator_docx.py    # Word document report output
├── report_generator_html.py    # Interactive HTML report output
├── ISO26262_Checker.py         # Legacy ISO 26262-specific checker
├── compliance_meta.json        # Example meta-config (schema hints)
├── examples/                   # Synthetic test data (no proprietary content)
│   ├── synthetic_standard/     # Fictional 4-part safety standard
│   ├── synthetic_artifact.json # Fictional work product
│   └── synthetic_meta.json     # Meta-config for the synthetic standard
├── main.js                     # Electron desktop app entry
├── renderer/                   # Desktop app UI
└── STANDARDS_NOTICE.md         # Licensing info for real standards
```
A legacy ISO26262_Checker.py exists for that specific standard.
Who this is for
Compliance teams in safety-critical fields benefit most. Typical users are automotive engineers auditing ISO 26262 hazard analyses, aerospace developers tracing DO-178C requirements, and industrial control specialists checking IEC 61508 work products. It suits teams that currently cross-reference large documents by hand, since the pipeline automates schema discovery and gap reporting. Smaller projects with simple checklists may not need it. Anyone working with copyrighted standards must obtain official copies first; see STANDARDS_NOTICE.md.
How it compares
Unlike domain-locked tools, this engine adapts to any standard through auto-discovery, so new standards need no rule rewrites. The legacy ISO26262_Checker.py in the repo shows the narrower predecessor. Commercial alternatives are often tied to a single standard such as ASPICE and require licenses or custom scripts. Open-source tools for general document similarity exist (for example, TF-IDF libraries), but few chain NLP with graph-based traceability for compliance. The full pipeline costs more compute than a basic checker. Because it relies only on the standard library, installation stays lightweight, with no pip installs.
Users obtain real standards separately, pair them with artifacts, and generate reports through the CLI or the Electron app. Source is at https://github.com/jherrodthomas/compliance-checker-algo. The tool is overkill for casual document matching; it is built for regulated compliance workflows.