Researched by ollama/glm-5.1:cloud · April 14, 2026
Models compared, with API price (input/output per 1M tokens), context window, and type. Model names for the card list were recovered by cross-referencing the pricing table at the end of this report and the benchmark rosters below:

| Model | Price (in/out per 1M) | Context | Type |
|---|---|---|---|
| Opus 4.6 | $5.00 / $25.00 | 1M | Proprietary |
| Sonnet 4.6 | $3.00 / $15.00 | 1M | Proprietary |
| GPT-5.4 | $2.50 / $10.00 | 1M | Proprietary |
| GPT-5.3 Codex | $1.00 / $3.00 | 1M | Proprietary |
| GLM-5.1 | $0.95 / $3.15 | 200K | Open-source, 744B params |
| Kimi K2.5 | $0.38 / $1.72 | 262K | Open-source, 1T params |
| MiniMax M2.5 | $0.30 / $1.20 | 196K | Open-source, 230B+ params |
| MiniMax M2.7 | $0.30 / $1.20 | 196K | Open-source, 230B+ params |
| Qwen 3.5 397B | $0.39 / $2.34 | 262K | Open-source, MoE 397B total / 17B active |
| Gemma 4 31B | $0.13 / $0.38 | 262K | Open-source, 31B dense |
| Gemini 3.1 Pro | $0.50 / $2.00 | 2M | Proprietary |
| DeepSeek V3.2 | $0.20 / $0.80 | 256K | Open-source, 236B MoE |
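For a rough sense of what these rates mean in practice, the sketch below prices a single request for a few of the models above. The 30K-input / 3K-output workload is an illustrative assumption, not a figure from this report.

```python
# Back-of-envelope cost for one request, using the card prices above
# (USD per 1M tokens). The 30K-in / 3K-out workload is an illustrative
# assumption, not a measured figure.
PRICES = {  # model: (input $ per 1M tokens, output $ per 1M tokens)
    "Opus 4.6": (5.00, 25.00),
    "Gemini 3.1 Pro": (0.50, 2.00),
    "GLM-5.1": (0.95, 3.15),
    "Gemma 4 31B": (0.13, 0.38),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request with the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for name in PRICES:
    print(f"{name}: ${request_cost(name, 30_000, 3_000):.4f} per request")
```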
Third-party SWE-bench Verified evaluation by Vals.ai, April 2026. Widely regarded as the reference benchmark for measuring AI performance on real-world software engineering tasks.
| Model | Accuracy | Cost/test | Latency |
|---|---|---|---|
| Gemini 3.1 Pro | 78.8% | $0.78 | 312s |
| GPT-5.4 | 78.2% | $0.80 | 307s |
| Opus 4.6 (Thinking) | 78.2% | $1.22 | 351s |
| GPT-5.3 Codex | 78.0% | $0.46 | 247s |
| Sonnet 4.6 | 77.4% | $1.30 | 512s |
| GLM-5.1 (Thinking) | 76.4% | $0.46 | 527s |
| MiniMax M2.5 Lightning | 74.2% | $0.46 | 403s |
| MiniMax M2.7 | 73.8% | $0.47 | 886s |
Note: Kimi K2.5, Qwen 3.5 397B, and Gemma 4 31B have no third-party SWE-bench Verified runs.
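Since both accuracy and cost per test are reported, dividing one by the other gives an expected cost per resolved task. A minimal sketch using a few rows from the Vals.ai table above:

```python
# Expected cost per *resolved* task = cost per test / accuracy, using the
# Vals.ai SWE-bench Verified figures above (pure arithmetic, no API calls).
runs = {  # model: (accuracy, cost per test in USD)
    "Gemini 3.1 Pro": (0.788, 0.78),
    "GPT-5.3 Codex": (0.780, 0.46),
    "GLM-5.1 (Thinking)": (0.764, 0.46),
    "Opus 4.6 (Thinking)": (0.782, 1.22),
}

# Sort by effective cost per resolved task, cheapest first.
for model, (acc, cost) in sorted(runs.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{model}: ${cost / acc:.2f} per resolved task")
```

On this measure GPT-5.3 Codex ($0.59) and GLM-5.1 ($0.60) come out well ahead of Opus 4.6 ($1.56) despite the similar accuracy figures.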
Manufacturer-reported SWE-bench Verified scores, included for transparency. The third-party numbers above take precedence when both exist.
| Model | Score | Notes |
|---|---|---|
| Opus 4.6 | 80.8% | Anthropic self-report |
| Gemini 3.1 Pro | 80.6% | Google self-report |
| MiniMax M2.5 | 80.2% | MiniMax self-report |
| Sonnet 4.6 | 79.6% | Anthropic self-report |
| Kimi K2.5 | 76.8% | Moonshot self-report |
| MiniMax M2.7 | ~78% | MiniMax claim; Vals.ai measured 73.8% |
| Qwen 3.5 397B | N/A | No SWE-bench Verified published |
| Gemma 4 31B | N/A | No SWE-bench Verified published |
Multi-language agentic coding benchmark, measuring software engineering capability across programming languages and complex, multi-step scenarios.
| Model | Score | Notes |
|---|---|---|
| GLM-5.1 | 58.4% | New #1 (April 2026) |
| GPT-5.4 | 57.7% | Previous #1 |
| Opus 4.6 | ~57.3% | |
| MiniMax M2.7 | 56.22% | Matches GPT-5.3 Codex |
| Gemini 3.1 Pro | 54.2% | |
| MiniMax M2.5 | 51.3% | Multi-SWE-Bench |
LiveCodeBench (LCB): real-time competitive programming benchmark that measures coding ability on novel problems under time pressure. Note the mixed units below: Gemini's figure is an LCB Pro Elo rating, while the other entries are pass rates.
| Model | Score | Notes |
|---|---|---|
| Gemini 3.1 Pro | 2887 Elo | #1 on LCB Pro |
| Kimi K2.5 | 85% | |
| DeepSeek V3.2 | 83.3% | Reference only |
| Gemma 4 31B | 80% | Massive jump from Gemma 3 27B's 29.1% |
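For readers unused to Elo, the standard Elo formula converts a rating gap into an expected head-to-head score. The opponent ratings in this sketch are arbitrary illustrations, not leaderboard data:

```python
# Standard Elo expected-score formula:
#   P(A beats B) = 1 / (1 + 10**((Rb - Ra) / 400))
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 2887-rated player versus hypothetical opponents 400 and 200 points below.
for opponent in (2487, 2687, 2887):
    print(f"2887 vs {opponent}: {elo_win_prob(2887, opponent):.1%} expected score")
```

A 400-point gap corresponds to roughly a 91% expected score, which is why Elo leads of even a few dozen points are meaningful.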
Measures AI assistants' ability to handle command-line operations, shell scripting, and DevOps tasks.
| Model | Score |
|---|---|
| GPT-5.4 | 75.1% |
| Gemini 3.1 Pro | 68.5% |
| Opus 4.6 | 65.4% |
| MiniMax M2.7 | 57.0% |
Current API pricing per 1M tokens, sorted by input price. The wide spread underscores the value proposition of the newer open models.
| Model | Input/1M | Output/1M | Context | Type |
|---|---|---|---|---|
| Gemma 4 31B | $0.13 | $0.38 | 262K | Open |
| MiniMax M2.7 | $0.30 | $1.20 | 196K | Open |
| Kimi K2.5 | $0.38 | $1.72 | 262K | Open |
| Qwen 3.5 397B | $0.39 | $2.34 | 262K | Open |
| Gemini 3.1 Pro | $0.50 | $2.00 | 2M | Proprietary |
| GLM-5.1 | $0.95 | $3.15 | 200K | Open |
| Sonnet 4.6 | $3.00 | $15.00 | 1M | Proprietary |
| Opus 4.6 | $5.00 | $25.00 | 1M | Proprietary |
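To compare these rates on a single axis, the sketch below computes a blended price per 1M tokens under an assumed 4:1 input:output split (a common but purely illustrative ratio), along with the multiple versus the most expensive model in the table:

```python
# Blended $ per 1M tokens at an assumed 4:1 input:output split
# (illustrative only; real agentic workloads vary widely).
pricing = {  # model: (input $/1M, output $/1M)
    "Gemma 4 31B": (0.13, 0.38),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.38, 1.72),
    "Qwen 3.5 397B": (0.39, 2.34),
    "Gemini 3.1 Pro": (0.50, 2.00),
    "GLM-5.1": (0.95, 3.15),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}

IN_SHARE = 0.8  # 4:1 input:output

blended = {m: IN_SHARE * i + (1 - IN_SHARE) * o for m, (i, o) in pricing.items()}
ceiling = max(blended.values())  # Opus 4.6 is the most expensive here
for model, price in sorted(blended.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${price:.2f}/1M blended ({ceiling / price:.0f}x cheaper than Opus 4.6)")
```

Under this assumption Gemma 4 31B lands around $0.18 per 1M blended tokens, roughly 50x cheaper than Opus 4.6 at $9.00.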