Anthropic Evals
ai_benchmark
Overview
Developed by: Anthropic
Open source: Yes
Use case: evaluating model capabilities and safety properties including persuasion, deception, and autonomy
Knowledge graph stats
Claims: 7
Avg confidence: 97%
Avg freshness: 99%
Last updated: yesterday
Trust distribution
100% unverified
Governance
Not assessed
Anthropic Evals
product
Anthropic's open-source evaluation suite for measuring model capabilities and safety properties
alternative to
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Inspect AI | ○Unverified | High | Fresh | 1 |
used by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Anthropic | ○Unverified | High | Fresh | 1 |
evaluates
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| safety-relevant capabilities and alignment properties of frontier models | ○Unverified | High | Fresh | 1 |
open source
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| true | ○Unverified | High | Fresh | 1 |
primary use case
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| evaluating model capabilities and safety properties including persuasion, deception, and autonomy | ○Unverified | High | Fresh | 1 |
first released
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| 2024 | ○Unverified | High | Fresh | 1 |
developed by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Anthropic | ○Unverified | High | Fresh | 1 |