A2A idempotency policy
Covenant A2A is a durable, explicitly leased queue. It does not automatically redeliver leased work after restart. Operators can repair stale leases explicitly, and a disabled-by-default retry scan can requeue only stale tasks that declare idempotent duplicate safety and carry a non-empty key.
Terms
- Attempt. One lease and execution of a task.
- Duplicate execution. A task is executed more than once (for example, the receiver crashes after performing work but before posting a result).
- Retry. Requeueing a task for another attempt without changing the task id.
Policy goals
- Make duplicate-work risk explicit and machine-checkable.
- Prevent silent duplicate side effects when automation requeues.
- Keep retries visible via attempt counters and audit rows.
Task metadata
Tasks may carry explicit idempotency metadata in the A2ATask envelope. The daemon validates that a present key is non-empty, persists the metadata, and returns it through queue/status surfaces.
Idempotency class
- idempotent: executing the task multiple times with the same task id is safe. Any side effects must be keyed or conditional such that duplicates do not create new external effects.
- unsafe: duplicate execution may cause external effects. The system must not automatically requeue these tasks.
Manual repair still uses operator_accepted as an explicit human posture. The automated retry gate only accepts task metadata marked idempotent.
Idempotency key
The idempotency key is a stable, caller-chosen key for the logical work unit. For tasks that call external systems that support explicit keys, senders should provide the same key so receivers can forward it consistently.
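The metadata check described above can be illustrated with a minimal sketch. The `IdempotencyMetadata` type, its field names, and `validate_metadata` are hypothetical names chosen for illustration; they are not the actual A2ATask envelope schema. The sketch assumes that only the empty string counts as an empty key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdempotencyMetadata:
    # Hypothetical field names; the real envelope fields may differ.
    idempotency_class: str
    key: Optional[str] = None


def validate_metadata(meta: IdempotencyMetadata) -> None:
    """Reject a key that is present but empty; an absent key is allowed."""
    if meta.key is not None and meta.key == "":
        raise ValueError("idempotency key, when present, must be non-empty")


# A task with a key, and a task with no key, both pass validation.
validate_metadata(IdempotencyMetadata("idempotent", "order-42"))
validate_metadata(IdempotencyMetadata("unsafe"))
```

Note that a task without a key passes validation here but would still be skipped by the automated retry gate, which requires a non-empty key.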
Receiver-side result cache
When an idempotent task posts a result, the mailbox stores a cached payload keyed by sender, recipient, current task kind, and idempotency key. A later task with the same cache key receives a replayed result immediately instead of being leased again.
JSONL-backed mailboxes persist cache entries in the event log. Task compaction removes resolved task history but keeps cache entries, so future duplicates can still short-circuit after restart.
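As a rough illustration of the short-circuit behavior, the following sketch keeps an in-memory cache keyed by the four fields named above. `CacheKey`, `ResultCache`, and `submit` are hypothetical names, and the sketch omits the JSONL persistence and compaction that the real mailbox performs.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Tuple


class CacheKey(NamedTuple):
    sender: str
    recipient: str
    task_kind: str
    idempotency_key: str


class ResultCache:
    """In-memory stand-in for the mailbox result cache (illustrative only)."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, Any] = {}

    def store(self, key: CacheKey, payload: Any) -> None:
        self._entries[key] = payload

    def get(self, key: CacheKey) -> Optional[Any]:
        return self._entries.get(key)

    def __contains__(self, key: CacheKey) -> bool:
        return key in self._entries


def submit(cache: ResultCache, key: CacheKey,
           lease_queue: List[str], task_id: str) -> Tuple[str, Any]:
    """Replay a cached result if one exists; otherwise queue the task for lease."""
    if key in cache:
        return ("replayed", cache.get(key))
    lease_queue.append(task_id)
    return ("queued", None)
```

In this sketch, a second task with the same sender, recipient, kind, and key never reaches the lease queue once the first task has posted its result.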
Explicit retry gate
covenant a2a retry-stale is disabled by default and reports what it would do unless the operator passes --enable.
- Never synthesize a new task id. Retries requeue the same task id and increment the attempt counter on the next lease.
- Retry only tasks marked idempotent.
- Skip tasks without a non-empty idempotency key.
- Make retry decisions observable via auto_requeue audit rows and skipped-task report entries.
- Bound retry behavior with explicit maximum attempts, maximum requeues, minimum lease age, and scan limits.
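The rules above can be sketched as a single eligibility check plus a report-only scan. Everything here is an assumption made for illustration: the type names, field names, skip-reason strings, and the exact comparison semantics (for example, whether a limit is inclusive) are not taken from the Covenant implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StaleTask:
    task_id: str
    idempotency_class: Optional[str]
    idempotency_key: Optional[str]
    attempts: int
    requeues: int
    lease_age_ms: int


@dataclass(frozen=True)
class RetryLimits:
    max_attempts: int
    max_requeues: int
    min_lease_age_ms: int
    scan_limit: int


def skip_reason(task: StaleTask, limits: RetryLimits) -> Optional[str]:
    """Return why a task must be skipped, or None if it may be requeued."""
    if task.idempotency_class != "idempotent":
        return "not_idempotent"
    if not task.idempotency_key:
        return "missing_idempotency_key"
    if task.lease_age_ms < limits.min_lease_age_ms:
        return "lease_too_young"
    if task.attempts >= limits.max_attempts:
        return "max_attempts_reached"
    if task.requeues >= limits.max_requeues:
        return "max_requeues_reached"
    return None


def retry_scan(tasks: List[StaleTask], limits: RetryLimits,
               enable: bool = False) -> Dict[str, object]:
    """Scan up to scan_limit tasks; mutate only when enable is True."""
    requeue, skipped = [], []
    for task in tasks[: limits.scan_limit]:
        reason = skip_reason(task, limits)
        if reason is not None:
            skipped.append((task.task_id, reason))
            continue
        requeue.append(task.task_id)  # same task id, never a new one
        if enable:
            task.requeues += 1  # attempts increment later, on the next lease
    return {"dry_run": not enable, "requeue": requeue, "skipped": skipped}
```

With `enable=False` the scan only reports which tasks it would requeue and which it would skip, mirroring the disabled-by-default posture.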
Periodic scheduler
The daemon can run the same retry gate on a timer through an explicit environment opt-in. It does not bypass the a2a.repair.requeue capability gate.
COVENANT_A2A_AUTO_RETRY_SCHEDULER=1
COVENANT_A2A_AUTO_RETRY_INTERVAL_MS=60000
COVENANT_A2A_AUTO_RETRY_MIN_LEASE_AGE_MS=300000
COVENANT_A2A_AUTO_RETRY_MAX_ATTEMPTS=3
COVENANT_A2A_AUTO_RETRY_MAX_REQUEUES=1
COVENANT_A2A_AUTO_RETRY_SCAN_LIMIT=100
Every scheduler pass records an a2a_auto_retry_scheduler_scan audit summary. Actual mutations still produce per-task auto_requeue repair rows.
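For readers wiring these variables into tooling, here is a minimal sketch of how such settings might be parsed. The variable names come from the list above; the `SchedulerConfig` type, the rule that only the value `1` enables the scheduler, and the use of the listed values as fallbacks are assumptions, not documented daemon behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "COVENANT_A2A_AUTO_RETRY_"


@dataclass(frozen=True)
class SchedulerConfig:
    enabled: bool
    interval_ms: int
    min_lease_age_ms: int
    max_attempts: int
    max_requeues: int
    scan_limit: int


def load_scheduler_config(env: Mapping[str, str] = os.environ) -> SchedulerConfig:
    """Read scheduler settings; fallbacks mirror the example values (assumed)."""

    def int_var(suffix: str, fallback: int) -> int:
        return int(env.get(PREFIX + suffix, fallback))

    return SchedulerConfig(
        enabled=env.get(PREFIX + "SCHEDULER") == "1",  # explicit opt-in only
        interval_ms=int_var("INTERVAL_MS", 60000),
        min_lease_age_ms=int_var("MIN_LEASE_AGE_MS", 300000),
        max_attempts=int_var("MAX_ATTEMPTS", 3),
        max_requeues=int_var("MAX_REQUEUES", 1),
        scan_limit=int_var("SCAN_LIMIT", 100),
    )
```

An empty environment yields a disabled scheduler in this sketch, which matches the explicit opt-in described above.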
Receiver obligations
Receivers may claim idempotent only when:
- persistent writes are conditional on the task id (or explicit idempotency key) so replays do not create new records;
- external calls that support idempotency keys receive the key consistently across retries;
- results are safe to post multiple times (posting the same result twice must not corrupt mailbox state).
If any step cannot be made idempotent, classify the task as operator_accepted.
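One common way to satisfy the first obligation is a write that is conditional on the idempotency key. The sketch below uses Python's standard `sqlite3` module with a primary-key constraint and `INSERT OR IGNORE`; the table name, column names, and `apply_once` helper are hypothetical, and a real receiver may use a different store entirely.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store whose primary key is the idempotency key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE effects ("
        "idempotency_key TEXT PRIMARY KEY, "
        "payload TEXT NOT NULL)"
    )
    return conn


def apply_once(conn: sqlite3.Connection, idempotency_key: str, payload: str) -> bool:
    """Insert the record unless the key already exists.

    Returns True if this call created the record, False on a replay.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO effects (idempotency_key, payload) VALUES (?, ?)",
            (idempotency_key, payload),
        )
    return cur.rowcount == 1
```

A replayed task calling `apply_once` with the same key leaves the table unchanged, so a crash between performing the work and posting the result does not create a second record on retry.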
Relationship to manual repair
Manual lease repair already requires an explicit duplicate-risk posture (idempotent vs operator_accepted). The retry gate is effectively a daemon-initiated requeue, so it must use task metadata and must never bypass this classification.
Follow-up work
- Add an explicit typed task-kind field for cache scoping.
- Add periodic retry scheduling that reuses the existing retry gate.
Related
- Agent-to-agent — A2A envelopes and mailbox surface.
- Live coverage — boundary test inventory.