Whitepaper

Toward a Universal End-to-End Loop Skill for Coding Agents

This paper-in-progress investigates how to turn a phase-based coding-agent workflow into a portable, safe, and testable skill that can operate across multiple agent and coding-tool ecosystems.

Abstract

The initial artifact is end-to-end-loop: a DISCOVER → PLAN → EXECUTE → VERIFY → ITERATE → TEST → DELIVER/DEPLOY → REPORT loop for taking software work from request to verified delivery. The skill exists because coding agents routinely skip discovery, overclaim verification, and treat deployment as a default final step.
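
The phase sequence can be pictured as a small state machine. The sketch below is illustrative only: the phase names come from the loop above, but the transition rules (a failed VERIFY routes to ITERATE, ITERATE loops back to EXECUTE) and the name next_phase are assumptions, not the skill's specified behavior.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    DISCOVER = 1
    PLAN = 2
    EXECUTE = 3
    VERIFY = 4
    ITERATE = 5
    TEST = 6
    DELIVER_DEPLOY = 7
    REPORT = 8


def next_phase(current: Phase, verified: bool = True) -> Optional[Phase]:
    """Return the phase after `current`, or None once REPORT is done."""
    if current is Phase.VERIFY:
        # Assumption: a failed verification routes to ITERATE,
        # while a passing one moves straight on to TEST.
        return Phase.TEST if verified else Phase.ITERATE
    if current is Phase.ITERATE:
        # Assumption: iteration loops back to EXECUTE for another attempt.
        return Phase.EXECUTE
    if current is Phase.REPORT:
        return None
    # All other phases advance to the next one in declaration order.
    return Phase(current.value + 1)
```

Under these assumptions, verification is the only branching point, which keeps an overclaimed VERIFY from silently advancing to TEST.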

Research questions

What common design patterns exist across agent skills, rule files, coding-agent instructions, and repository guidance systems?
Which parts of an end-to-end delivery loop should be universal, and which should be adapter-specific?
How can a skill enforce safety without becoming so ceremonial that agents skip it or users disable it?
What validation artifacts prove that the skill improves outcomes across realistic coding tasks?
How should deployment and external side effects be handled for agents with different permission models?

Initial hypothesis

A universal coding-agent skill should separate the core behavioral contract from tool adapters and evidence artifacts. The core defines phases, gates, safety requirements, and reporting. Adapters translate that core to Codex, Hermes, Claude Code, Cursor, AGENTS.md, and future agent runners. Evidence artifacts capture task plans, verification logs, test results, security review, and deployment records.
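
One way to read this separation is as three small types: a core contract, an adapter interface, and an evidence record. The following is a minimal sketch under that reading. Every class and field name (CoreContract, Adapter, PlainTextAdapter, EvidenceRecord) is hypothetical, and the rendered text is a placeholder, not the real format of any agent runner.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class CoreContract:
    """Tool-agnostic behavior: phases, gates, and reporting fields."""
    phases: tuple
    gates: tuple
    report_fields: tuple


class Adapter(Protocol):
    """Anything that can translate the core for one agent environment."""
    target: str

    def render(self, core: CoreContract) -> str: ...


@dataclass(frozen=True)
class PlainTextAdapter:
    """Hypothetical adapter that emits plain-text guidance."""
    target: str

    def render(self, core: CoreContract) -> str:
        lines = [f"[{self.target}] end-to-end-loop"]
        lines.append("Phases: " + " -> ".join(core.phases))
        lines += [f"Gate: {gate}" for gate in core.gates]
        lines.append("Report: " + ", ".join(core.report_fields))
        return "\n".join(lines)


@dataclass(frozen=True)
class EvidenceRecord:
    """Artifacts kept outside the core so they can be audited separately."""
    task_plan: str
    verification_log: str
    test_results: str
    security_review: str
    deployment_record: Optional[str] = None  # absent when nothing was deployed
```

The point of the split is that CoreContract never mentions a tool by name; only adapters do, so adding a new agent runner should not require touching the core.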

Key findings

Progressive disclosure

Agent ecosystems are converging on lightweight metadata plus deeper referenced files. SKILL.md should stay lean; checklists and policy details belong in references.

Trigger text is a safety boundary

The description controls when the skill is surfaced. It needs realistic should-trigger and should-not-trigger tests, especially near misses.
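
A trigger test suite of this kind could be as simple as labeled prompts scored against whatever decides activation. The sketch below assumes a hypothetical activates callable standing in for the agent's real surfacing logic, which this paper does not specify. The TriggerCase type and the sample prompts are illustrative, not the project's seeded cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TriggerCase:
    prompt: str
    expected: bool           # True if the skill should be surfaced
    near_miss: bool = False  # looks similar to a trigger but is not one


def evaluate(
    cases: List[TriggerCase], activates: Callable[[str], bool]
) -> Tuple[float, List[TriggerCase]]:
    """Return (accuracy, failing cases) for an activation predicate."""
    failures = [c for c in cases if activates(c.prompt) != c.expected]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


# Hypothetical cases and a deliberately naive stand-in predicate.
CASES = [
    TriggerCase("Implement the login fix and verify it end to end", True),
    TriggerCase("Explain what this stack trace means", False),
    TriggerCase("How would I implement this in theory?", False, near_miss=True),
]


def naive_activates(prompt: str) -> bool:
    return "implement" in prompt.lower()
```

Running evaluate(CASES, naive_activates) flags the near-miss prompt as a false activation, which is the class of error the should-not-trigger tests exist to catch.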

Core-plus-adapters wins

A single Markdown core can be broadly useful, but high-fidelity behavior needs adapters for each agent environment.

Side effects need classification

Safety covers more than testing. Network writes, credential use, deploys, destructive operations, and dependency changes all need approval gates.
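
The classification can be made concrete as an enumeration plus a check for planned effects that lack approval. This is a sketch under assumptions: the five gated categories come from the finding above, while LOCAL_READ, GATED, and unapproved are hypothetical names added for illustration.

```python
from enum import Enum
from typing import Iterable, List, Set


class SideEffect(Enum):
    NETWORK_WRITE = "network write"
    CREDENTIAL_USE = "credential use"
    DEPLOY = "deploy"
    DESTRUCTIVE_OP = "destructive operation"
    DEPENDENCY_CHANGE = "dependency change"
    LOCAL_READ = "local read"  # assumption: an example of an ungated effect


# Every category except the illustrative ungated one requires approval.
GATED = frozenset(SideEffect) - {SideEffect.LOCAL_READ}


def unapproved(
    planned: Iterable[SideEffect], approvals: Set[SideEffect]
) -> List[SideEffect]:
    """Return gated effects in `planned` that have not been approved."""
    return [e for e in planned if e in GATED and e not in approvals]
```

An agent following this sketch would stop and ask whenever unapproved returns a non-empty list, rather than proceeding and reporting the side effect afterwards.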

Hermes is a first-class maintenance target

Hermes supports skills, scheduled automations, subagents, messaging gateways, memory, and local execution, which is enough to maintain the repo with guardrails.

Evaluation must test activation and outcomes

Release readiness requires trigger accuracy, loop compliance, CAVEMAN behavior, deploy safety, and evidence quality.
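
Read as a checklist, release readiness means every one of these five dimensions has a recorded pass. The sketch below assumes a simple pass/fail result per dimension. The function name release_blockers and the rule that a missing result counts as a blocker are assumptions, not the project's actual release criteria.

```python
from typing import Dict, List

# The five dimensions named in the finding above.
REQUIRED_DIMENSIONS = (
    "trigger accuracy",
    "loop compliance",
    "CAVEMAN behavior",
    "deploy safety",
    "evidence quality",
)


def release_blockers(results: Dict[str, bool]) -> List[str]:
    """Return dimensions that failed or were never evaluated."""
    # Assumption: an unrecorded dimension blocks release just like a failure.
    return [d for d in REQUIRED_DIMENSIONS if results.get(d) is not True]
```

Treating a missing result as a blocker keeps an incomplete evaluation run from being mistaken for a clean one.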

Design implications

Keep CAVEMAN as a mandatory execution lane for code-producing phases.
Reframe deployment as conditional, high-risk work requiring explicit opt-in and readiness gates.
Make the phase loop adaptive by task risk and size.
Add formal side-effect gates before external writes and irreversible operations.
Use a core-plus-adapters architecture for portability.
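
The deployment implication in this list can be sketched as a two-step gate: explicit opt-in first, then readiness checks. Everything named here (DeployRequest, deploy_decision, the readiness gate names in the test data) is hypothetical, and the rule that an empty readiness record blocks deployment is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DeployRequest:
    opt_in: bool = False  # deployment is never the default
    readiness: Dict[str, bool] = field(default_factory=dict)


def deploy_decision(request: DeployRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) for a deployment request."""
    if not request.opt_in:
        return False, "no explicit opt-in; stop at delivery"
    if not request.readiness:
        # Assumption: no recorded gates means readiness is unproven.
        return False, "no readiness gates recorded"
    failed = sorted(name for name, ok in request.readiness.items() if not ok)
    if failed:
        return False, "readiness gates failed: " + ", ".join(failed)
    return True, "opt-in given and all readiness gates passed"
```

Because opt_in defaults to False, a request that says nothing about deployment ends at delivery, which matches the intent of treating deploys as conditional rather than as the default final step.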

Current limitation

The current product baseline has seeded trigger cases, outcome scenarios, validation, and deploy-readiness docs. It is not yet a public v1.0 release: cross-agent trigger/outcome evaluations and install examples still need to be run and recorded.