SWE-bench
ai_benchmark
Overview
Open source: Yes
Use case: evaluating LLMs on real-world software engineering bug-fixing tasks from GitHub
Knowledge graph stats
Claims: 7
Avg confidence: 97%
Avg freshness: 99%
Last updated: yesterday
Trust distribution
100% unverified
Governance
Not assessed
SWE-bench
concept — also known as: SWE-bench Verified, SWEbench
Benchmark for evaluating LLMs on real-world software engineering tasks from GitHub issues
alternative to
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Aider Polyglot | ○Unverified | High | Fresh | 1 |
used by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Anthropic | ○Unverified | High | Fresh | 1 |
evaluates
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| code generation and software engineering ability | ○Unverified | High | Fresh | 1 |
open source
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| true | ○Unverified | High | Fresh | 1 |
primary use case
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| evaluating LLMs on real-world software engineering bug-fixing tasks from GitHub | ○Unverified | High | Fresh | 1 |
first released
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| 2023 | ○Unverified | High | Fresh | 1 |
created by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Princeton NLP Group | ○Unverified | High | Fresh | 1 |