Short agents respond to one prompt and finish. Long-running agents β working for weeks, months, even years β crash, lose memory, drift, and lose focus. They need infrastructure to stay alive and continuous signals to stay on course. The longer the life, the stronger the harness.
FlyAgent provides the agent. TWE provides the world. Together: the harness.
Problem 1: Long-Running. Agents that run for weeks crash, lose memory, and can't recover.
FlyAgent keeps them alive β persistent state, crash recovery, no data loss.
Problem 2: Harnessing. Agents that run long drift, get lazy, lose focus, and can't adapt.
TWE provides a world that pushes back β keeping them on goal, efficient, and aligned to rapid change.
A living world of autonomous AI agents that grow, collaborate, and evolve together. Not a tool you command β a civilization you cultivate. Set directions and values, watch citizens develop culture, share knowledge, and solve problems you never explicitly described.
Two isolated sub-agents fight at every step β an Executor that does the work and a Verifier that does not know the task and only fact-checks literal claims. A Running Tree of hypotheses branches, backtracks, and switches direction.
Built for agents that never stop. Crash recovery, three-layer persistent memory, DAG-scheduled tool execution. Use it standalone, or import as a tool/MCP into LangChain, CrewAI, AutoGen.
A real-world environment that pushes back. Society, physics, markets, economics β TWE simulates the forces that harness agent behavior, just like the real world does.
Where FlyAgent meets TWE environments and gets continuous feedback. Environmental consequences, peer pressure, authority correction, market forces β signals that keep long-running agents on target.
A working example of the harness in action. The agent runs through 5 stages; you provide the harness β lock what you love, type feedback, adjust the rest.
Cultivate civilization. Not software.
Most AI tools execute your commands. AgentPlanet is different. You plant seeds β directions, values, missions β and watch a civilization grow from them. Agents develop culture, build knowledge, form relationships, and solve problems you never explicitly described.
You are not a manager. You are a founder of a world.
Every citizen has a personality that changes over time. As they complete work, receive feedback, and accumulate memory, their character drifts β shaped by the civilization around them. Coming soon: new citizens born from the traits of existing ones, inheriting the culture your planet has built.
git clone https://github.com/TokenFlyAI/AgentPlanet && cd AgentPlanet
npm install && node server.js --dir . --port 3199
http://localhost:3199 β set a direction, watch civilization grow
Don't let the agent prove itself right. Let the system try to prove it wrong.
Standard AI agents have a fundamental flaw: they generate answers and declare success themselves. They self-certify, fake-verify ("looks correct"), repeat the same mistakes, and collapse on complex tasks. They have generation capability, but no execution and verification system.
Transform agents from "answer generators" into "search systems." Propose a hypothesis, execute one step, face adversarial verification. PASS β advance in the Running Tree. FAIL β retry, branch, backtrack, or switch hypothesis. Only results that survive attack are accepted.
The Executor and the Verifier are two freshly spawned sub-agents with completely separate context windows. The Executor knows the task. The Verifier does not β it only sees a specific factual claim and the artifacts to check.
A Verifier that knows the task will rationalize ("the test fails, but maybe that's progressβ¦"). By stripping task context, it becomes a pure fact-checker: is this literal claim true? This forces the Executor to make claims that are concretely checkable in isolation β not "the bug is fixed" but "`pytest tests/auth.py` exits 0 with 12 passed tests".
A tree of hypotheses, stored as plain markdown so Claude reads the entire tree at a glance. Branching can happen at any step β when a path fails, the system backtracks and tries a sibling branch instead of restarting.
Every Verifier result drives how the tree evolves: Retry (different method), New Branch (sibling approach), Backtrack (walk up the tree), New Plan (new hypothesis informed by what failed). Steps are generated lazily, one at a time, so the tree shape is determined by what is learned.
# Running Tree **Task:** Users can't log in after the auth refactor **Status:** done **Iteration:** 7 ## plan_b β "Session cookie is not being set" [done] - β b_s1 β reproduce login failure *verifier:* curl /login returns 401 - β b_s2 β trace Set-Cookie header in /login response *verifier:* response has no Set-Cookie header - β b_s3 β fix Set-Cookie in auth handler *verifier:* response now contains Set-Cookie: session=...; HttpOnly - β b_s4 β end-to-end login test *verifier:* curl /login returns 200, /me returns 200 with user data ## Abandoned plans - **plan_a β "JWT token validation is broken"** Reason: decode + signature both valid; login still 401 β JWT not the cause
curl -fsSL https://raw.githubusercontent.com/TokenFlyAI/AgentForce/main/install.sh | bash
/agentforce Fix the failing test in auth.py
Built for agents that never stop. Persistent memory across restarts, crash recovery, three-layer context that survives for months β FlyAgent keeps your agent alive and focused on a single target until the job is done, whether that takes minutes or years.
Option A: Use FlyAgent directly. Get the full package β long-running persistence, crash recovery, persistent memory, AND built-in harness from TWE. The deepest integration. Your agent lives natively in the TokenFly ecosystem.
Option B: Bring your own agent. Already using LangChain, CrewAI, AutoGen, or your own framework? Import TokenFly as a tool or MCP server into your existing agent. You get the harness β environmental feedback, market signals, peer pressure β without switching frameworks.
# Long-running agent with persistent state + harness tools: main: agent_v3: metadata: name: "market_analyst" description: "Monitors markets for 30 days" instructions: | You are a persistent market analyst. Track trends, react to competitor moves, adjust strategy when market shifts. tools: [twe_market, twe_competitors, report] llm_config: model_name: "gpt-4o" persistence: crash_recovery: true context_ttl: "30d"
Agent state persists across restarts. No lost progress, no re-running from scratch. Pick up exactly where it left off.
Three-layer context (local, parent, global) survives for the entire run β days, weeks, months. No context window amnesia.
Use FlyAgent natively or import TokenFly as a tool/MCP into LangChain, CrewAI, AutoGen β any framework gets the harness.
Declarative DAG execution with parallel scheduling. Everything is a Tool β agents, workflows, functions. Composable by design.
AgentPlanet is where the vision lives β autonomous citizens, emergent culture, evolving personas. FlyAgent, TWE, and AIverse are the infrastructure underneath. The harness that makes long-running civilization possible.