Kimi K2.6

CALIBRATING

Moonshot · moonshotai/kimi-k2.6 · verdict as of 2026-07-03

Daily test history (90 days, baseline band = trailing mean ± 2σ)

Structured Extraction: CALIBRATING
Web Design: CALIBRATING
Math & Reasoning: CALIBRATING
Code Generation: CALIBRATING
Instruction Following: CALIBRATING
Report Analysis: CALIBRATING
Summarization: CALIBRATING
Game Design: CALIBRATING
Customer Service: CALIBRATING
Creative Writing: CALIBRATING
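
The baseline band named in the daily test history heading (trailing mean ± 2σ) can be sketched roughly as below. This is an illustration only, not the leaderboard's actual code: the function names, the 30-run window, and the use of the population standard deviation are all assumptions, since the page does not state them.

```python
from statistics import mean, pstdev


def baseline_band(scores, window=30, k=2.0):
    """Return (low, high) = trailing mean ± k·σ over the last `window` scores.

    Assumptions (not stated on the page): a 30-run trailing window and the
    population standard deviation. Swap in statistics.stdev for the sample σ.
    """
    recent = list(scores)[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    mu = mean(recent)
    sigma = pstdev(recent)
    return mu - k * sigma, mu + k * sigma


def is_outside_band(score, band):
    """True when a daily score falls outside the (low, high) baseline band."""
    low, high = band
    return score < low or score > high


# Hypothetical daily scores, for illustration only.
history = [70, 72, 68, 70, 70]
band = baseline_band(history)
print(band, is_outside_band(75, band))
```

With too few runs the spread cannot be estimated reliably, which is one plausible reason a category would show CALIBRATING rather than a verdict.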

Serving providers (last 30 days)

Striped = multiple providers served this model that day

Public benchmarks (overall: 63.7)

MMLU-Pro: 82
GPQA Diamond: 70
SWE-bench Verified: 61
LMArena Elo: 1382
AIME 2025: 70

retrieved 2026-07-02 from public sources — see methodology

Recent samples (latest run, one per test case)