Compaction & Context Tuning
How to configure OpenClaw’s compaction, memory flush, context pruning, and session search so your agent doesn’t lose its mind (or personality) during long sessions.
The Problem
Long sessions kill context. Here’s the progression:
- Early in session: Agent follows all your AGENTS.md rules, maintains personality, remembers what you said 10 messages ago.
- Mid session: Context window fills up. Compaction kicks in and summarizes older exchanges. Some nuance gets lost.
- Late session: Behavioral rules from AGENTS.md get compacted away. Personality drifts. The agent goes generic. Critical decisions from earlier in the session are gone.
The default compaction settings are conservative. They protect the model from running out of context, but they don’t protect your context from being destroyed. These five config changes fix that.
Important: Use config.patch for all of these. Never config.apply unless you’re intentionally replacing your entire config file.
Pre-flight: Schema Validation
Before patching anything, confirm the fields exist in your OpenClaw version:
config.schema.lookup → agents.defaults.compaction
config.schema.lookup → memory
If a field doesn’t exist in the schema, your OpenClaw version may not support it yet. Check the docs or update.
Step 1: Compaction Tuning
What it does: Controls how aggressively OpenClaw summarizes older conversation history to free up context window space.
Path: agents.defaults.compaction
{
"mode": "safeguard",
"keepRecentTokens": 20000,
"recentTurnsPreserve": 4,
"maxHistoryShare": 0.7,
"reserveTokens": 30000
}
What each field does:
| Field | Value | Why |
|---|---|---|
mode | "safeguard" | Only compacts when context pressure is real, not preemptively |
keepRecentTokens | 20000 | Protects the last 20K tokens from summarization |
recentTurnsPreserve | 4 | Always keeps the last 4 full exchanges intact |
maxHistoryShare | 0.7 | History can use up to 70% of the context window |
reserveTokens | 30000 | Keeps 30K tokens free for the model’s response and tool use |
Tuning notes:
- If you use a model with a smaller context window (e.g., 32K), reduce
keepRecentTokensto 8000-10000 andreserveTokensto 15000. recentTurnsPreserve: 4is a minimum. If your workflow involves multi-step reasoning across exchanges, bump to 6-8.
Step 2: Memory Flush
What it does: Before compaction destroys context, memory flush tells the agent to write important information to disk. Think of it as an auto-save before the game overwrites your checkpoint.
Path: agents.defaults.compaction.memoryFlush
{
"enabled": true,
"softThresholdTokens": 32000,
"forceFlushTranscriptBytes": "2mb",
"prompt": "Write a durable session note to memory/YYYY-MM-DD.md (use today's date). Capture: decisions made, corrections given, constraints or preferences learned, open questions with owners, new projects or resources discovered, any state that would break the plan if forgotten. If nothing meaningful happened, write NO_FLUSH.",
"systemPrompt": "Be terse. Prefer bullet points. Do not rewrite the conversation."
}
What each field does:
| Field | Value | Why |
|---|---|---|
enabled | true | Turns on auto-flush |
softThresholdTokens | 32000 | Triggers flush when context exceeds 32K tokens |
forceFlushTranscriptBytes | "2mb" | Forces flush when raw transcript exceeds 2MB |
prompt | (see above) | Tells the agent exactly what to capture |
systemPrompt | (see above) | Keeps the flush output concise |
Tuning the prompt: The default prompt above captures the most critical session state. Customize it based on what matters to your workflow:
- Working on code? Add “file paths modified, test results, and unresolved errors.”
- Research heavy? Add “sources consulted, key findings, and remaining questions.”
- The
NO_FLUSHescape hatch prevents empty files from cluttering your memory directory.
Tuning the threshold:
softThresholdTokens: 32000works well for 200K context models. For smaller contexts, set this to roughly 40-50% of your model’s window.- The byte-based
forceFlushTranscriptBytescatches cases where many tool outputs bloat the transcript without proportionally increasing token count.
Step 3: Post-Compaction Rule Re-injection
What it does: After compaction summarizes old context, your behavioral rules from AGENTS.md may get compressed or lost. This setting re-injects specific sections from AGENTS.md after every compaction, so your agent’s personality and operating rules survive indefinitely.
Path: agents.defaults.compaction.postCompactionSections
First, check what sections exist in your AGENTS.md:
grep -n '^## \|^### ' AGENTS.md
Then pick the sections that contain critical behavioral rules:
["Every Session", "Memory", "Safety"]
How to choose sections:
- Always include: Startup behavior, safety rules, memory workflow
- Usually include: Communication style, delegation rules, tool preferences
- Skip: Project-specific sections, temporary sprint goals, large reference tables
Why this matters: Without re-injection, a 4-hour session will gradually lose your agent’s personality. It starts following instructions loosely, forgets writing rules (like “no em dashes”), and reverts to generic assistant behavior. Re-injection prevents this drift.
Sizing note: Each re-injected section costs tokens on every turn after compaction. Keep the total under ~2000 tokens. If a section is too large, extract the critical rules into a dedicated smaller section.
Step 4: Context Pruning
What it does: Trims old, bloated tool outputs (web fetches, file reads, large API responses) from your context window. These outputs served their purpose when they were fresh but are dead weight 30 minutes later.
Path: agents.defaults.contextPruning
{
"mode": "cache-ttl",
"ttl": "5m",
"keepLastAssistants": 3,
"softTrimRatio": 0.3,
"hardClearRatio": 0.5,
"minPrunableToolChars": 20000,
"softTrim": {
"maxChars": 3000,
"headChars": 1000,
"tailChars": 1000
},
"hardClear": {
"enabled": true,
"placeholder": "[tool output pruned]"
}
}
What each field does:
| Field | Value | Why |
|---|---|---|
mode | "cache-ttl" | Time-based pruning of stale content |
ttl | "5m" | Aggressive: tool outputs older than 5 min get pruned |
keepLastAssistants | 3 | Always keeps the last 3 assistant messages intact |
minPrunableToolChars | 20000 | Only prunes tool outputs larger than 20K chars |
softTrim.maxChars | 3000 | When soft-trimming, keep ~3K chars (1K head + 1K tail) |
hardClear.enabled | true | After TTL expires, replace content with placeholder |
Why cache-ttl:
A single web_fetch can dump 50K characters into your context. Multiply that by a few research queries and you’ve burned half your window on content you referenced once. Cache-TTL prunes these after the configured time while keeping recent tool outputs available for follow-up questions.
Tuning notes:
- The
ttl: "5m"above is aggressive - it’s what we run because we do lots of web research and file scanning. Use"1h"or"2h"if your workflow is quieter. minPrunableToolChars: 20000skips prunning anything under 20K chars. Smaller outputs cost less to keep around; big ones are the actual problem.keepLastAssistants: 3keeps the recent thread intact regardless of what the pruner does to tool outputs.
Known bug (2026-04, upstream ticket queued): Context pruning can split tool_use/tool_result pairs. When the conversation grows past the pruning threshold, the pruner may remove a tool_result while keeping its tool_use, causing a hard Anthropic API error (tool_use ids were found without tool_result blocks). The only workaround today is starting a fresh conversation. Upstream fix should enforce atomic pair handling.
Step 5: QMD Session Transcript Search
What it does: Enables semantic search over raw conversation transcripts from past sessions. Without this, memory_search only searches your MEMORY.md and knowledge card files. With it, your agent can find specific things said in previous conversations.
Path: memory.qmd.sessions
{
"enabled": true,
"retentionDays": 90
}
Also confirm this is set:
// memory.qmd.includeDefaultMemory should be true
Why 90 days: Session transcripts are large. Retaining them indefinitely bloats your search index and slows queries. 90 days gives you three months of searchable history, which covers most “what did we decide about X last month?” questions. Increase to 180 if you have long-running projects with infrequent check-ins.
Verification
After applying all patches, verify with config.get:
config.get → agents.defaults.compaction
config.get → memory
Confirm:
compaction.modeis"safeguard"compaction.memoryFlush.enabledistruecompaction.postCompactionSectionscontains your chosen section namescontextPruning.modeis"cache-ttl"memory.qmd.sessions.enabledistrue
How These Work Together
Here’s the lifecycle during a long session:
- Early session (< 32K tokens): Everything works normally. Full context available.
- Approaching threshold (32K tokens): Memory flush triggers. Agent writes key decisions and state to
memory/YYYY-MM-DD.md. - Context pressure builds: Context pruning removes stale tool outputs older than 2 hours.
- Compaction triggers: Safeguard mode compresses older exchanges but protects the last 20K tokens and 4 exchanges.
- Post-compaction: Behavioral rules from AGENTS.md are re-injected. Agent retains personality and operating rules.
- Next session: Agent searches QMD transcripts and memory files to recover context from previous sessions.
The result: your agent maintains personality across multi-hour sessions, preserves critical decisions to disk before they’re lost, and can recall past conversations through semantic search.
Common Mistakes
- Setting
keepRecentTokenstoo high on small-context models. If your model only has 32K tokens, keeping 20K recent means compaction has almost no room to work. Scale proportionally. - Forgetting to customize
postCompactionSectionsfor your AGENTS.md structure. The default example["Every Session", "Memory", "Safety"]only works if those exact section names exist in your file. - Setting
softThresholdTokenstoo low. If flush triggers on every other message, you’ll get dozens of near-empty memory files. Set it high enough that flush only fires when there’s real content to preserve. - Not verifying after patching.
config.patchmerges silently. If you typo a field name, it creates a new (ignored) field instead of failing. Always verify withconfig.get. - Changing bootstrap files to shift compaction behavior. Any edit to SOUL.md, AGENTS.md, TOOLS.md, IDENTITY.md, or MEMORY.md invalidates the whole prefix cache. Don’t treat bootstrap edits as free. See prompt caching.
- Not setting
keepLastAssistantslow enough. Older guides recommended10. We run3in production - compaction has more room to work without sacrificing recent conversational coherence.