AI software delivery discipline

end-to-end-loop keeps coding agents working until the job is actually done.

Stop accepting “looks good” from coding agents. A portable Agent Skill for Codex, Hermes, Claude Code, Cursor, and AGENTS.md-compatible agents, end-to-end-loop turns vague software requests into a gated workflow: discover, plan, execute, verify, test, deliver, and report with real evidence.

What is it?

end-to-end-loop is a self-learning delivery loop for AI coding agents. It keeps the agent inside a disciplined workflow until a code component or application change is properly understood, implemented, verified, tested, and safely handed off.

Who is it for?

AI researchers, builders, and power users who want more reliable agentic software delivery: less skipped context, fewer fake test claims, clearer deploy gates, and better operational reports.

Why DevBoss?

DevBoss is the virtual office maintaining the skill: research, evaluation, release governance, CI, security review, documentation, and website work — with Tijmen as supervisory board chair.

Before

Agent work that feels done, until you check.

  • Edits happen before the repo is understood.
  • Tests are skipped, weakened, or claimed without proof.
  • Deploy becomes a reflex instead of a governed decision.
  • Final reports hide uncertainty, failed checks, and risk.

After

A loop that forces evidence before confidence.

  • Discovery happens before implementation.
  • Every green claim needs observed output.
  • Deploy requires opt-in, green CI, a rollback plan, and smoke and security checks.
  • Reports show changes, commands, risks, and next action.

Works with your agent stack

One core discipline, multiple agent adapters.

  • Codex
  • Hermes Agent
  • Claude Code
  • Cursor
  • AGENTS.md

The product

A delivery loop, not a prompt vibe.

Most coding-agent failures are boring and repeatable: edit too early, skip reproduction, forget tests, hide uncertainty, or deploy without a rollback story. end-to-end-loop makes those failure modes explicit gates.

01

Discover

Clarify outcome, constraints, repo state, side effects, credentials, and risks.

02

Plan

Define small steps, acceptance criteria, test strategy, and delivery target.

03

Execute

Make scoped changes through the required CAVEMAN/Cavekit lane for code-producing work.

04

Verify

Prove behavior with observed evidence: commands, tests, diff review, or manual checks.

05

Test & review

Run relevant automated checks, smoke paths, and security review proportional to risk.

06

Deliver / report

Commit, PR, artifact, readiness report, or approved deploy — with limitations named.
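The six stages above can be read as an ordered sequence of gates. The sketch below is one hypothetical way to model that, not the skill's actual implementation: the names `STAGES`, `StageResult`, and `run_loop` are invented for illustration, and the rule that a stage passes only with non-empty evidence is an assumption drawn from the "observed evidence" wording on this page.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage names follow the six steps listed above; everything else is hypothetical.
STAGES = ["discover", "plan", "execute", "verify", "test_review", "deliver_report"]


@dataclass
class StageResult:
    passed: bool
    # Observed output (command logs, CI links, review notes), not bare claims.
    evidence: list[str] = field(default_factory=list)


def run_loop(handlers: dict[str, Callable[[], StageResult]]) -> tuple[str, list[str]]:
    """Run stages in order and stop at the first gate that fails or lacks evidence."""
    completed: list[str] = []
    for stage in STAGES:
        result = handlers[stage]()
        if not result.passed or not result.evidence:
            return f"blocked at {stage}", completed
        completed.append(stage)
    return "done", completed
```

In this sketch a stage that reports success without evidence is treated the same as a failed stage, so the loop cannot reach "done" on claims alone.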

Evidence-backed reports

“Done” means the proof is visible.

Every completed task should leave an audit trail that a human can inspect: changed files, commands run, pass/fail results, known limitations, and the next recommended action.
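As a rough illustration of such an audit trail, the sketch below models a report as plain data. The class and field names (`DeliveryReport`, `CommandRun`, `is_inspectable`) are hypothetical and only mirror the items listed above; treating exit code 0 as a pass is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CommandRun:
    command: str
    exit_code: int  # 0 is treated as a pass in this sketch


@dataclass
class DeliveryReport:
    changed_files: list[str]
    commands: list[CommandRun]
    known_limitations: list[str] = field(default_factory=list)
    next_action: str = ""

    def is_inspectable(self) -> bool:
        """True only if a human has something concrete to check."""
        return bool(self.changed_files and self.commands and self.next_action)

    def render(self) -> str:
        """Produce a plain-text report that shows failures as clearly as passes."""
        lines = ["Changed files:"]
        lines += [f"  - {path}" for path in self.changed_files]
        lines.append("Commands run:")
        for run in self.commands:
            status = "PASS" if run.exit_code == 0 else "FAIL"
            lines.append(f"  - [{status}] {run.command}")
        lines.append("Known limitations:")
        lines += [f"  - {item}" for item in self.known_limitations] or ["  - none recorded"]
        lines.append(f"Next action: {self.next_action}")
        return "\n".join(lines)
```

Failed commands are rendered rather than filtered out, which matches the idea that a report should not hide failed checks or uncertainty.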

Safety model

Deploy is not the default ending.

  • CAVEMAN hard gate: Code-producing execution must use the configured CAVEMAN/Cavekit lane or stop for an explicit exception.
  • Observed evidence: No claims of green without command output, CI result, diff review, smoke test, or approval record.
  • Live deploy opt-in: Production deploy requires explicit approval, green/applicable CI, rollback, smoke/security checks, and credentials approval.
  • Risk-based ceremony: Small docs changes stay light. Auth, data, dependencies, and deploy paths get stronger gates.
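The deploy opt-in and risk-based rules above could be checked mechanically along these lines. This is a minimal sketch under stated assumptions: `DeployRequest`, `missing_deploy_gates`, `ceremony_level`, and the `HIGH_RISK_AREAS` labels are hypothetical names chosen to mirror the wording above, and are not part of the skill's documented interface.

```python
from dataclasses import dataclass

# Area labels are an assumption based on the categories named above.
HIGH_RISK_AREAS = {"auth", "data", "dependencies", "deploy"}


@dataclass
class DeployRequest:
    explicit_approval: bool
    ci_green_or_not_applicable: bool
    rollback_plan: bool
    smoke_and_security_checked: bool
    credentials_approved: bool


def missing_deploy_gates(req: DeployRequest) -> list[str]:
    """Return the names of unmet gates; an empty list means deploy may proceed."""
    return [name for name, ok in vars(req).items() if not ok]


def ceremony_level(touched_areas: set[str]) -> str:
    """Pick a stronger gate set when a change touches any high-risk area."""
    return "strong" if touched_areas & HIGH_RISK_AREAS else "light"
```

Returning the list of missing gates, rather than a single boolean, lets the final report say exactly why a deploy was withheld.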

Virtual office

DevBoss turns maintenance into an accountable team sport.

The office runs research, evaluation, release planning, website work, CI, and security review. Roles are explicit so one agent does not write, approve, and deploy its own work.

  • Jared Dunn: general manager, cadence, blockers
  • Richard Hendricks: skill architecture and adapter quality
  • Monica Hall: release governance and board decisions
  • Dinesh & Gilfoyle: CI, automation, security review
  • Susan, Jony & Mira: website, product story, Firebase readiness

Current status

Private development, evaluation-backed release path.

Repo work active

README, evaluation rubric, trigger cases, paper cleanup, and DevBoss handoff are under active improvement.

Public release later

The skill stays private until docs, metrics, evals, and release readiness are strong enough.

This site is the support layer

dev-boss.nl explains the product and will later link to install docs, research, changelog, and approved release notes.