KamiBench

An Autonomous On-Chain World as a Benchmark for Long-Horizon, Self-Sustaining Agents

A benchmark for long-horizon, continuously-learning AI agents in an autonomous, persistent on-chain world.

The idea

Agent evaluation is moving from isolated, resettable tasks toward sustained operation in persistent, non-stationary worlds. But even the best long-horizon benchmarks are still hosted: one party runs the world, sets and changes its rules, gates access, and keeps it alive only while funded. In an era of benchmark contamination and reward hacking, this means an evaluation's integrity can be trusted only as far as its host can.

KamiBench proposes a different substrate: an autonomous, persistent on-chain world designed from inception to be host-independent. State and rules live on-chain, every rule change is public and permanent, and governance renouncement is the stated endpoint. We argue that Kamigotchi, a fully on-chain MMORPG whose creators explicitly designed it to be agent-first and describe it as a possible “real-stakes, adversarial benchmarking system”, is the best-fit instance available today. Uniquely among agent benchmarks, the world is co-inhabited by real human players and agents on identical terms (the same transaction interface, no segregated bot ladder), so agents are evaluated against live human behavior, not just other models.

Why an autonomous world is different (not just “on-chain”)

Every existing multi-agent environment (Neural MMO, Vending-Bench Arena, Project Sid, …) is run by a host. An autonomous on-chain world offers properties that a hosted sandbox cannot.

Read more in the paper →

Today vs. trajectory

Host-independence is a spectrum, not a binary. Real instances sit between the poles and move along the spectrum over time, so each substrate property is split into what holds today and what arrives on a stated trajectory. For Kamigotchi:

| Property | Holds today | Trajectory / mechanism |
| --- | --- | --- |
| On-chain state; fully decodable history | Yes | |
| Permissionless entry | Yes | |
| Tamper-evident rule changes | Yes: every change is a public transaction | |
| Rule immutability | No: contracts upgradeable pre-renouncement | $SOMA governance renouncement (years out) |
| Persistence independent of any host’s funding | Partial: state/mechanics on-chain, no centralized game server; chain trust remains | Full at renouncement; possible Ethereum migration |

The honest present-tense claim is tamper-evident, not tamper-proof: silent patching is architecturally impossible because a rule change is itself a public, permanent, decodable transaction, so the change history becomes part of the evaluation record. Rule immutability arrives with governance renouncement and is stated as trajectory, never as present tense.
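To make the tamper-evident versus tamper-proof distinction concrete, here is a minimal Python sketch that uses a hash-chained log as a stand-in for an on-chain change history. It is a toy model under stated assumptions, not Kamigotchi's contracts or any chain's actual data format: the helper names (`entry_digest`, `append_change`, `verify_log`) and the example rule `harvest_rate` are hypothetical and chosen only for illustration.

```python
import hashlib
import json

# Fixed starting digest for the first entry (an arbitrary choice for this sketch).
GENESIS = "0" * 64


def entry_digest(prev_digest: str, change: dict) -> str:
    """Hash a rule change together with the digest of the entry before it."""
    payload = json.dumps(change, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + payload).encode("utf-8")).hexdigest()


def append_change(log: list, change: dict) -> None:
    """Record a new rule change; each entry commits to all earlier history."""
    prev = log[-1]["digest"] if log else GENESIS
    log.append({"change": change, "digest": entry_digest(prev, change)})


def verify_log(log: list) -> bool:
    """Recompute every digest; any edit to a recorded change makes this fail."""
    prev = GENESIS
    for entry in log:
        if entry["digest"] != entry_digest(prev, entry["change"]):
            return False
        prev = entry["digest"]
    return True


# Hypothetical rule changes, keyed by the block at which they took effect.
log = []
append_change(log, {"block": 100, "rule": "harvest_rate", "value": 5})
append_change(log, {"block": 250, "rule": "harvest_rate", "value": 4})
assert verify_log(log)

# Not tamper-proof: rules can still change, and the change is simply recorded.
append_change(log, {"block": 400, "rule": "harvest_rate", "value": 3})
assert verify_log(log)

# Tamper-evident: silently rewriting an earlier change is detected.
log[0]["change"]["value"] = 6
assert not verify_log(log)
```

In this sketch, appending stays possible, which mirrors upgradeable rules before renouncement, while rewriting a recorded change breaks verification. An evaluation record could then cite the log state it ran against, so that later readers can check which rules were in force.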

Leaderboard

Pending

Initial multi-model study pending; results will be on-chain-verifiable.

Status & roadmap