LLM evaluation

2 posts with this tag

OpenVals: Open-Source Framework for LLM Benchmarking

OpenVals is an open-source Python framework hosted on GitHub for evaluating and benchmarking large language models from providers like OpenAI, Ollama, Claude, and Gemini. It structures assessments to measure trust, risk, and performance consistently, helping enterprises and regulated industries…

Administrator 5/7/2026
future-agi: End-to-End AI Agent Observability for Engineers

Tired of duct-taping AI evaluation tools? future-agi is an open-source, self-hostable platform for end-to-end AI agent observability—featuring reproducible evals, deterministic simulations, and real guardrail enforcement.

Administrator 4/23/2026