OpenClaw compaction failure analysis

Executive Summary

Most likely local failure mode

On Codex-OAuth sessions, normal turns can use the Codex runtime while compaction fallback can still resolve to the plain openai provider. If there is no direct OpenAI API key for that path, compaction fails at the moment the context is full.

Not a single root cause

Recent issues also show stale thread bindings, early preflight compactions, untracked Codex runtime overhead, event-loop stalls, and stuck locks.

Upstream is moving

origin/main contains several fixes after the stable branch, including direct Codex compaction routing and Codex boundary hardening.

Conclusion: for the current setup, updating blindly is risky. The local branch is not a simple ancestor of origin/main, and it has dirty (uncommitted) changes in the exact compaction runtime-context files. The dirty local patch is nevertheless in the right place: it maps the context-engine runtime context from openai to openai-codex when the harness runtime is Codex.

Failure Mechanism

The concrete failure chain reported in the newest issues looks like this:

Session grows

A long tool-heavy or chat-heavy session reaches preflight or provider context pressure.

Native compact attempted

Codex app-server compaction needs an existing thread/session binding.

Binding missing

If the binding is missing or stale, older paths fall back to the context engine.

Provider mismatch

The fallback can resolve openai/gpt-5.5 instead of openai-codex/gpt-5.5.

Compaction fails

The direct OpenAI path asks for a normal API key and fails, or the session stalls around locks/timeouts.
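
The provider-mismatch step in the chain above can be illustrated with a small model. This is only a sketch under stated assumptions: the type names, the resolveAuth function, and the "openai-codex:example" profile id are hypothetical and are not taken from the OpenClaw source. It shows why a Codex-OAuth-only setup authenticates normal turns but not a fallback that resolves to the plain openai provider.

```typescript
// Hypothetical model of the mismatch; none of these names come from OpenClaw.
type Provider = "openai" | "openai-codex";

interface Credentials {
  openaiApiKey?: string;      // direct OpenAI API key, if one is configured
  codexOAuthProfile?: string; // placeholder for an "openai-codex:..." profile id
}

type AuthResult =
  | { ok: true; via: "api-key" | "oauth" }
  | { ok: false; reason: string };

// Each provider accepts only its own credential type.
function resolveAuth(provider: Provider, creds: Credentials): AuthResult {
  if (provider === "openai") {
    return creds.openaiApiKey
      ? { ok: true, via: "api-key" }
      : { ok: false, reason: "openai provider has no API key" };
  }
  return creds.codexOAuthProfile
    ? { ok: true, via: "oauth" }
    : { ok: false, reason: "openai-codex provider has no OAuth profile" };
}

// A Codex-OAuth-only setup: an OAuth profile is present, a direct API key is not.
const oauthOnly: Credentials = { codexOAuthProfile: "openai-codex:example" };

// Normal turns route through the Codex provider and authenticate.
const normalTurn = resolveAuth("openai-codex", oauthOnly);

// The compaction fallback resolves to the plain provider and cannot authenticate.
const fallbackCompaction = resolveAuth("openai", oauthOnly);
```

In this model the session works until the first compaction, which matches the reported symptom of failure exactly when the context is full.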

Newest Relevant Issues

Issue	Signal	Why it matters here	Status read
#86820	primary Codex OAuth fallback tries direct OpenAI	Matches the observed symptom: compaction reaches fallback, then fails because the plain OpenAI provider has no API key.	Open, updated 2026-05-26.
#86373	routing embedded compaction fallback target mismatch	Describes the provider/auth split directly: `provider=openai` with `authProfileId=openai-codex:...`.	Recent; connected to the same fix train.
#86470	doctor rewrites Codex profiles	Explains how a valid `openai-codex/` route can be normalized into an apparently valid but compaction-breaking `openai/` setup.	Still relevant on stable v2026.5.22.
#86819	accounting untracked runtime overhead	`/context detail` can account for only a small fraction of reported context, leaving about 62k tokens as provider/runtime overhead.	Open; exact live Codex baseline still needs proof.
#86358	runtime event-loop starvation	Compaction can stall the Node event loop long enough that unrelated fetches time out, making a recovering session look broken.	Open, P1-class behavior.
#85712	preflight tiny context after compact-only route	Preflight can decide to compact only, then continue with a tiny assembled context and no user-facing warning.	Open; likely adjacent to repeated compaction reports.
#81178	stale state repeated early preflight compactions	After a successful compact, stale pre-compaction usage can trigger another premature compact.	Recent comments, regression-shaped.
#70334	stuck session processing lock remains	Older but still explains the user-visible “it went quiet” pattern after context overflow handling.	Historical but aligned.

Last Four Days of Relevant Code Changes

2026-05-22

`c08400ea7d` Fix context pressure preflight for tool-heavy sessions

Introduces stronger preflight pressure estimation and routes such as compact_only, truncate_tool_results_only, and compact_then_truncate. This is useful, but it means compaction can now be triggered before the model call by OpenClaw's own estimator.

2026-05-23

`46de078b2a` Bound embedded compaction write locks

Targets a known failure class where compaction/session locks can remain held too long and block later session progress.
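
One generic way to bound how long a lock can be held is a lease with an expiry. The sketch below is an assumption for illustration only and is not necessarily what this commit does; Lease, LockTable, and tryAcquire are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical lease-based lock table; not the OpenClaw implementation.
interface Lease {
  owner: string;
  expiresAt: number; // timestamp in milliseconds
}

type LockTable = Record<string, Lease | undefined>;

// Acquire the lock for a session unless another owner holds an unexpired lease.
function tryAcquire(
  table: LockTable,
  sessionId: string,
  owner: string,
  now: number,
  maxHoldMs: number
): boolean {
  const current = table[sessionId];
  if (current && current.expiresAt > now && current.owner !== owner) {
    return false; // still held by someone else within its bound
  }
  table[sessionId] = { owner, expiresAt: now + maxHoldMs };
  return true;
}

// Release only if the caller is the current owner.
function release(table: LockTable, sessionId: string, owner: string): void {
  const current = table[sessionId];
  if (current && current.owner === owner) {
    table[sessionId] = undefined;
  }
}
```

With a lease, a compaction that hangs blocks the session only until the lease expires. A real implementation would also need the original holder to re-check its lease before writing, since an expired holder may still be running.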

2026-05-24

`dd47e479ae` Fail Codex compaction at the Codex boundary

Important hardening: when the harness runtime is Codex, missing or stale native compaction binding should not silently fall through into the wrong context-engine path.

2026-05-24

`f0061ddc54` Preserve partial summary on mid-chain chunk failure

Improves recovery when chunked compaction fails partway through, reducing all-or-nothing loss.
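
The idea of keeping a partial summary can be sketched as a chain that returns whatever it has accumulated when one chunk fails. This is a simplified synchronous sketch, not the code from the commit: ChainResult, summarizeChain, and the summarize callback are hypothetical, and the real path presumably calls a model asynchronously.

```typescript
// Hypothetical result shape for a chunked summarization chain.
interface ChainResult {
  summary: string;         // summary accumulated so far
  completedChunks: number; // chunks folded into the summary
  failedAt?: number;       // index of the chunk that failed, if any
}

// `summarize` stands in for the real summarization call and may throw.
function summarizeChain(
  chunks: string[],
  summarize: (prior: string, chunk: string) => string
): ChainResult {
  let summary = "";
  let completed = 0;
  for (const chunk of chunks) {
    try {
      summary = summarize(summary, chunk);
    } catch {
      // Keep the partial summary instead of discarding earlier work.
      return { summary, completedChunks: completed, failedAt: completed };
    }
    completed++;
  }
  return { summary, completedChunks: chunks.length };
}
```

A caller can then decide whether to retry from failedAt or to continue the session with the partial summary.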

2026-05-25

`f4cfa012e1` Route compaction through Codex auth provider

The direct embedded compaction path now maps OpenAI + Codex runtime/auth to openai-codex before resolving model auth. This directly addresses the missing direct OpenAI API key failure for that path.

2026-05-26

`bcde7b138a` Handle preflight compaction no-op budgets

Targets repeated or no-op compaction when the preflight estimator believes compaction is needed but the effective budget situation has not improved.

Local Branch Read

Branch state

Local source is on stable-v2026.5.22-guest, not a clean fast-forward ancestor of origin/main. A blind merge would mix guest branch changes with the current upstream fix train.

Dirty files are relevant

Two dirty files are exactly in the compaction runtime-context area: compaction-runtime-context.ts and its test.

What the dirty patch appears to fix

It resolves the harness policy, maps openai + Codex runtime to openai-codex for context-engine runtime context, and preserves openai-codex:... auth profiles only when the provider is deliberately changed to the Codex provider. That covers a gap still visible in origin/main where buildEmbeddedCompactionRuntimeContext returns provider: resolved.provider.
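
The mapping described above can be sketched as a small pure function. The shapes and names here (HarnessRuntime, CompactionRoute, mapRouteForRuntime) are hypothetical and only approximate what the local diff appears to do; the real types in compaction-runtime-context.ts may differ, and "openai-codex:example" is a placeholder profile id.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
type HarnessRuntime = "codex" | "default";

interface CompactionRoute {
  provider: string;
  authProfileId?: string;
}

const CODEX_PROVIDER = "openai-codex";

function mapRouteForRuntime(
  resolved: CompactionRoute,
  runtime: HarnessRuntime
): CompactionRoute {
  // Remap only the plain openai provider, and only under the Codex runtime.
  const provider =
    runtime === "codex" && resolved.provider === "openai"
      ? CODEX_PROVIDER
      : resolved.provider;

  const profile = resolved.authProfileId;
  if (profile === undefined) return { provider };

  // Assumption: a Codex auth profile is only meaningful on the Codex provider,
  // so it is dropped when the route stays on any other provider.
  const isCodexProfile = profile.indexOf(CODEX_PROVIDER + ":") === 0;
  if (isCodexProfile && provider !== CODEX_PROVIDER) return { provider };

  return { provider, authProfileId: profile };
}
```

For example, a route of provider openai with a Codex auth profile becomes openai-codex with the profile intact under the Codex runtime, while the same route under another runtime stays on openai and loses the mismatched profile.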

Recommended Next Actions

1. Test the dirty runtime-context patch

Add a minimal Codex-OAuth compaction fallback repro and run the targeted test file. This patch is likely not cosmetic; it covers the remaining context-engine runtime-context mismatch.

2. Do not run doctor fixes blindly

Until #86470 is resolved, avoid automatic rewrites that turn openai-codex/* into openai/* without proving compaction still routes through Codex auth.
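
A pre-check along these lines could flag risky rewrites before they are applied. This is a hypothetical sketch: ModelRefRewrite, dropsCodexProvider, and splitRewrites are illustrative names, the path field is a made-up config location, and nothing here reflects how the doctor command is actually implemented.

```typescript
// Hypothetical description of one proposed config rewrite.
interface ModelRefRewrite {
  path: string;   // illustrative config location
  before: string; // model ref before the rewrite, e.g. "openai-codex/gpt-5.5"
  after: string;  // model ref after the rewrite
}

// Provider is the part of the model ref before the first slash.
function providerOf(modelRef: string): string {
  const slash = modelRef.indexOf("/");
  return slash === -1 ? modelRef : modelRef.slice(0, slash);
}

// True when a rewrite turns an openai-codex/* ref into an openai/* ref.
function dropsCodexProvider(rewrite: ModelRefRewrite): boolean {
  return (
    providerOf(rewrite.before) === "openai-codex" &&
    providerOf(rewrite.after) === "openai"
  );
}

// Separate rewrites that can be applied from those needing a manual check.
function splitRewrites(rewrites: ModelRefRewrite[]): {
  safe: ModelRefRewrite[];
  needsReview: ModelRefRewrite[];
} {
  return {
    safe: rewrites.filter((r) => !dropsCodexProvider(r)),
    needsReview: rewrites.filter(dropsCodexProvider),
  };
}
```

Anything in needsReview would be held back until a compaction test shows the route still authenticates through Codex.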

3. Fail visibly at the right boundary

If Codex native compaction has no valid thread binding, report that specific condition instead of falling into a misleading direct OpenAI API-key failure.
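
A decision function that names the binding problem explicitly might look like the following. It is a sketch under assumptions: ThreadBinding, CompactionDecision, decideCompaction, and the error codes are hypothetical and are not taken from OpenClaw.

```typescript
// Hypothetical binding shape; the real binding carries more state.
interface ThreadBinding {
  threadId: string;
  stale: boolean;
}

type CompactionDecision =
  | { kind: "codex-native"; threadId: string }
  | { kind: "context-engine" }
  | {
      kind: "error";
      code: "codex_binding_missing" | "codex_binding_stale";
      message: string;
    };

function decideCompaction(
  runtime: "codex" | "default",
  binding?: ThreadBinding
): CompactionDecision {
  // Non-Codex runtimes keep using the context engine.
  if (runtime !== "codex") return { kind: "context-engine" };

  // Under Codex, report the binding problem instead of falling through.
  if (!binding) {
    return {
      kind: "error",
      code: "codex_binding_missing",
      message: "Codex native compaction has no thread binding for this session.",
    };
  }
  if (binding.stale) {
    return {
      kind: "error",
      code: "codex_binding_stale",
      message: "Codex native compaction has a stale thread binding for this session.",
    };
  }
  return { kind: "codex-native", threadId: binding.threadId };
}
```

The point of the explicit error variants is that a Codex session never reaches the context-engine branch, so the user sees the binding condition rather than an API-key error.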

4. Separate runtime overhead in reports

/context detail should label Codex native/cache/runtime overhead separately so users do not chase AGENTS.md, tools, or memory ghosts for a 62k-token residual.
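
The separation can be expressed as simple arithmetic: whatever the reported total contains beyond the itemized entries is shown as its own line. The sketch below uses hypothetical names (ContextItem, breakDownContext), and the token figures are illustrative numbers chosen to produce a 62k residual, not measurements.

```typescript
interface ContextItem {
  label: string;
  tokens: number;
}

interface ContextBreakdown {
  accountedTokens: number;    // sum of the itemized entries
  unattributedTokens: number; // reported total minus itemized entries
  unattributedShare: number;  // fraction of the reported total, 0..1
}

function breakDownContext(
  reportedTotal: number,
  itemized: ContextItem[]
): ContextBreakdown {
  const accountedTokens = itemized.reduce((sum, item) => sum + item.tokens, 0);
  // Clamp at zero in case the itemized estimate exceeds the reported total.
  const unattributedTokens = Math.max(0, reportedTotal - accountedTokens);
  const unattributedShare =
    reportedTotal > 0 ? unattributedTokens / reportedTotal : 0;
  return { accountedTokens, unattributedTokens, unattributedShare };
}

// Illustrative figures only, not measured values.
const example = breakDownContext(70000, [
  { label: "AGENTS.md", tokens: 3000 },
  { label: "tools", tokens: 4000 },
  { label: "memory", tokens: 1000 },
]);
```

Reporting unattributedTokens as a labeled line makes it clear that the residual is not hidden inside any of the itemized sources.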

5. Guard preflight no-op loops

After compaction, the next preflight should use the post-compaction active replay, not stale transcript records. No-op compactions need an escalation path.
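
A guard of this kind could be sketched as follows. All names (CompactionOutcome, PreflightStep, wasNoOp, nextPreflightStep) and the 5% minimum-gain threshold are assumptions for illustration; the caller is expected to pass the token count of the post-compaction active replay rather than a pre-compaction usage record.

```typescript
interface CompactionOutcome {
  tokensBefore: number;
  tokensAfter: number;
}

type PreflightStep = "continue" | "compact" | "escalate";

// A compaction counts as a no-op when it freed less than minGain of the input.
function wasNoOp(outcome: CompactionOutcome, minGain: number): boolean {
  if (outcome.tokensBefore <= 0) return true;
  const gain =
    (outcome.tokensBefore - outcome.tokensAfter) / outcome.tokensBefore;
  return gain < minGain;
}

// activeTokens must be measured on the post-compaction active replay.
function nextPreflightStep(
  activeTokens: number,
  budget: number,
  last?: CompactionOutcome,
  minGain: number = 0.05
): PreflightStep {
  if (activeTokens <= budget) return "continue";
  // Still over budget after a compaction that barely helped: do not loop.
  if (last !== undefined && wasNoOp(last, minGain)) return "escalate";
  return "compact";
}
```

What "escalate" means is left open here; it could be truncating tool results or surfacing a warning to the user, as long as it is not another identical compaction.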

6. Watch event-loop stalls

Large compaction should not starve Telegram/API fetches. Keep timer-delay diagnostics and consider offloading CPU-heavy summary assembly or token accounting.

Evidence Used

GitHub issues queried with gh issue list and gh issue view: #86820, #86819, #86373, #86470, #86358, #85712, #81178, #70334.
Source read from origin/main: src/agents/pi-embedded-runner/compact.queued.ts:52-65, 140-167, 388-435.
Source read from origin/main: src/agents/pi-embedded-runner/compaction-runtime-context.ts:105-143.
Source read from origin/main: src/agents/pi-embedded-runner/compact.ts:494-524, 558-615.
Source read from origin/main: src/agents/openai-codex-routing.ts:217-258.
Local diff read from /home/hakalya/openclaw: src/agents/pi-embedded-runner/compaction-runtime-context.ts and compaction-runtime-context.test.ts.
