AgentForce runs two freshly spawned sub-agents in completely separate context windows — an Executor that does the work, and a Verifier that does not know the task and only fact-checks literal claims. A step passes only when the Verifier cannot break it. A Running Tree of hypotheses branches, backtracks, and switches direction based on what the Verifier finds.
AgentForce is a single Claude Code skill file. Re-run the script any time to update.
$ curl -fsSL https://raw.githubusercontent.com/TokenFlyAI/AgentForce/main/install.sh | bash
They generate answers and declare success themselves — no external check, no real verification, no course correction when they go wrong.
| Problem | Symptom |
|---|---|
| Self-certification | Agent announces "done" — no external check |
| Fake verification | "Looks correct", "should work" — no evidence |
| No course correction | Wrong direction → keeps going anyway |
| Repeated failures | Same mistake made again |
| Single-shot | Complex tasks cause the agent to collapse |
Root cause: agents have generation capability, but no execution and verification system.
Propose a hypothesis, execute one step, face adversarial verification. The Verifier's result drives how the Running Tree evolves.
Steps are generated lazily — one at a time after the previous step is verified — so the tree shape is determined by what is learned, not pre-committed. Wrong directions get pruned. Failed hypotheses become negative examples that steer future planning away from dead ends.
The Orchestrator manages the tree; two isolated sub-agents fight it out at every step.
The Executor and Verifier are two freshly spawned sub-agents with completely separate context windows. The Executor knows the task. The Verifier does not — it only sees a specific factual claim and the artifacts to check.
| Executor | Verifier | |
|---|---|---|
| Knows the task? | ✓ Yes | ✗ No |
| Knows the hypothesis? | ✓ Yes | ✗ No |
| Knows prior history? | Recent only | ✗ No |
| Job | Execute & claim | Falsify the claim |
A Verifier that knows the task will rationalize. Stripping task context turns it into a pure fact-checker: is this literal claim true, yes or no? This forces the Executor to make claims that are concretely checkable in isolation — not "the bug is fixed" but "`pytest tests/auth.py` exits 0 with 12 passed".
A tree of hypotheses, stored as plain markdown so Claude reads the entire tree at a glance. Branching can happen at any step — when a path fails, the system backtracks and tries a sibling branch instead of restarting.
Why markdown? JSON encodes the tree as parent/children pointers — Claude has to mentally walk references. Markdown shows the tree visually, with status icons inline, claims and verifier evidence right next to each step. The Orchestrator (Claude itself, using Claude Code's built-in planning) reads and rewrites this file every iteration.
Every 10 iterations, AgentForce spawns a Thinker. Unlike the Executor (does work) and the Verifier (attacks claims), the Thinker steps back and reviews the entire run with full context — task, goals, complete tree, all verifier evidence — and offers strategic perspective. Advisory, not authoritative. The Orchestrator may incorporate, defer, or dismiss — but should pay attention.
The Thinker reads the full execution history (including prior thinker notes, so it doesn't repeat itself) and produces a structured review:
The Thinker is constrained to align with stated goals — it doesn't go off-task. Its output is appended to .agentforce/thinker-notes.md and read by the Orchestrator at the next THINK phase.
The Orchestrator decides what to do with this — typically incorporates a strong suggestion as a new branch, or notes a roadmap item for after the run.
Three architectural choices keep AgentForce honest across iterations: a re-read protocol, persistent process handling, and a clean split between Claude Code's native task UI for live planning and the running tree for durable history.
Late in a long run, context bloat and convergence pressure can make the Orchestrator
shortcut the Verifier. The fix: hard rules live in .agentforce/protocol.md,
a small file the Orchestrator re-reads at the start of every iteration. Even if
the conversation context is compacted, a fresh file read restores the rules verbatim.
Plus a conditional final-gate before Status: done:
when ≥2 steps were verified or branches were tried, one last adversarial Verifier attacks the
overall outcome. Single-step tasks skip it — the per-step Verifier IS the final.
And a THINK phase before each Executor spawn: the Orchestrator deliberately
reflects on whether to continue or pivot. Built-in tools like TaskList()
are available but not required — most cycles just continue, occasional cycles reshape.
Agent() calls per step (Executor + Verifier)*verifier:* evidence lineSome steps need to start things that must outlive their iteration — a game server you'll iterate on, a watcher, a daemon. Plain background processes get killed when the Executor sub-agent ends. AgentForce's Executor uses two patterns and decides per-command:
| Pattern | When |
|---|---|
| One-shot (default) | Tests, builds, file ops, curls — runs then exits |
| Persistent | Servers, daemons, watchers — must outlive the iteration |
The persistent pattern uses setsid nohup ... &; disown plus a
manifest at .agentforce/processes/<name>.json.
The process survives the Executor, the Orchestrator, and the entire /agentforce run.
A future invocation reads the manifests, verifies the PIDs are still alive, and lets the next
Executor pick up exactly where the last one left off.
The Orchestrator uses Claude Code's built-in TaskCreate /
TaskUpdate for the current step — visible to the user as a live spinner
in the native task UI. running-tree.md is the durable verified history,
append-only, with branches and abandoned plans.
| TaskList | running-tree.md | |
|---|---|---|
| Role | Live planning | Durable record |
| Lifecycle | Per-session | Persistent on disk |
| Granularity | Current step | Full tree + history |
Exactly one task is in_progress at any time — that IS the current step.
On cross-session resume, the Orchestrator parses running-tree.md and
re-creates the task. No parallel custom UI; one source of truth for "what's happening now."
Task: "Users can't log in after the auth refactor" — final state of
.agentforce/running-tree.md
Plan A's evidence ("decode and signature both valid; login still 401") directly informed Plan B's hypothesis. The tree searched where it needed to, stopped when it found a verified path, and never touched Plan C.
AgentForce ships as a single Claude Code skill file. Pick whichever install path you prefer.
$ curl -fsSL https://raw.githubusercontent.com/TokenFlyAI/AgentForce/main/install.sh | bash
$ mkdir -p ~/.claude/commands && \
curl -fsSL https://raw.githubusercontent.com/TokenFlyAI/AgentForce/main/.claude/commands/agentforce.md \
-o ~/.claude/commands/agentforce.md.agentforce/running-tree.md in your working directory — fully resumable.
AgentForce is open source and ships as a single Claude Code skill file. Worker and Verifier work in genuine context isolation. Plans branch, backtrack, and switch direction based on real evidence — not the agent's self-report.