MMLU
ai_benchmark
Overview
Open source: ✓
Use case: measuring language model knowledge across 57 academic subjects from STEM to humanities
Knowledge graph stats
Claims: 8
Avg confidence: 97%
Avg freshness: 99%
Last updated: 13h ago
Trust distribution: 100% unverified
Governance: Not assessed
MMLU
concept — also known as: Massive Multitask Language Understanding
Massive Multitask Language Understanding benchmark covering 57 academic subjects
used by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| | ○Unverified | High | Fresh | 1 |
| OpenAI | ○Unverified | High | Fresh | 1 |
alternative to
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| GPQA | ○Unverified | High | Fresh | 1 |
evaluates
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| world knowledge and problem-solving ability | ○Unverified | High | Fresh | 1 |
open source
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| true | ○Unverified | High | Fresh | 1 |
primary use case
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| measuring language model knowledge across 57 academic subjects from STEM to humanities | ○Unverified | High | Fresh | 1 |
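Since MMLU's headline number is an accuracy averaged over its 57 subjects, a minimal sketch of that scoring step may help. This is an illustrative implementation, not the benchmark's official harness: the record format, subject names, and answer letters below are all assumptions for the example.

```python
# Hypothetical sketch of MMLU-style scoring: per-subject accuracy on
# 4-way multiple-choice questions, then a macro-average across subjects.
# Subject names and records here are illustrative, not real MMLU data.
from collections import defaultdict

def mmlu_macro_accuracy(records):
    """records: iterable of (subject, predicted_choice, correct_choice)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, pred, gold in records:
        total[subject] += 1
        if pred == gold:
            correct[subject] += 1
    per_subject = {s: correct[s] / total[s] for s in total}
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, macro

records = [
    ("abstract_algebra", "A", "A"),
    ("abstract_algebra", "B", "C"),
    ("philosophy", "D", "D"),
    ("philosophy", "D", "D"),
]
per_subject, macro = mmlu_macro_accuracy(records)
# macro is the mean of per-subject accuracies: (0.5 + 1.0) / 2 = 0.75
```

A macro-average weights every subject equally regardless of how many questions it contains, which is why small subjects can move the headline score as much as large ones.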
created by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Dan Hendrycks et al. | ○Unverified | High | Fresh | 1 |
first released
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| 2020 | ○Unverified | High | Fresh | 1 |