Why Context Engineering Is the New Prompt Engineering for AI Agents

The Growing Challenge of Context Management in AI Agents

Modern AI agents face a critical challenge: context management. When you start a fresh session in Claude Code, the context allocation reveals the scope of this problem. A typical system prompt consumes approximately 2,000 tokens, with system tools taking another 6% of available space. This leaves roughly 70% of the context window available for the agent to perform its work.

However, the situation becomes significantly more constrained when you enable MCP (Model Context Protocol) tools. After activating commonly used servers such as Context7, Playwright, and the Chrome DevTools MCP server, the context allocation shifts dramatically. Slash commands occupy about 900 tokens, while MCP tools alone now consume approximately 16% of the context window before any of them has even been used. That percentage covers only the space required for the tool descriptions and their capabilities.

The result is that usable context shrinks to around 50%. Even more concerning, much of this remaining context isn't effectively usable either, because attention in the transformer architecture becomes less reliable over long contexts: information buried deep in the window is easy for the model to overlook. When building agentic systems, extreme care in context management becomes non-negotiable.

From Prompt Engineering to Context Engineering

Context management has become increasingly critical as we move toward more sophisticated agentic systems. Anthropic’s blog post “Effective Context Engineering for AI Agents” offers valuable insights into this evolution, introducing a key distinction between prompt engineering and context engineering.

The Pre-Agentic Era

In the traditional pre-agentic era, prompt engineering was paramount. The system prompt controlled LLM behavior, determining how the model would respond to user messages. This approach worked well for single-turn queries where prompt engineering was the critical tuning mechanism.

The Shift to Multi-Turn Systems

As we build more complex agentic systems with multi-turn interactions involving users, tools, and environments, prompt engineering has evolved into context engineering. Modern systems must manage access to multiple tools, internal knowledge bases, domain knowledge, system instructions, and even memory—all of which feed into the agent’s context.

Simply stuffing all available information into the context leads to context rot. The solution lies in actively managing context to preserve only the information needed at specific points in time. For example, when an agent makes a tool call and receives results, those results enter the overall context. However, irrelevant information should be pruned to maintain context quality.
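As an illustration of that pruning step, here is a minimal sketch that walks a message history and strips tool results the agent no longer needs. The message shape and the `is_still_relevant` callback are assumptions made for illustration, not any particular framework's API.

```python
# A minimal sketch of pruning: drop tool results that are no longer relevant,
# leaving only a short stub so the agent still knows the call happened.

def prune_tool_results(messages, is_still_relevant):
    pruned = []
    for msg in messages:
        if msg.get("role") == "tool" and not is_still_relevant(msg):
            pruned.append({"role": "tool",
                           "name": msg.get("name", "tool"),
                           "content": "[result pruned as no longer relevant]"})
        else:
            pruned.append(msg)
    return pruned
```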

Calibrating the System Prompt

Even within context engineering frameworks, the system prompt remains critical as it controls the overall behavior of your agent. Anthropic recommends that system prompts use extremely clear, simple, direct language that presents ideas at the “right altitude” for the agent.

Avoiding Common Failure Modes

This “right altitude” represents a Goldilocks zone between two common failure modes:

Over-specification: Engineers sometimes hardcode complex, brittle logic into prompts to elicit exact behavior from the agent. This approach creates fragility and increases maintenance complexity over time.

Under-specification: On the opposite extreme, engineers provide vague high-level guidance that fails to give the model concrete signals for desired outputs or falsely assumes shared context.

Best Practices for System Prompts

Effective system prompts should be divided into distinct sections using XML tags or Markdown headings to create clear delineation between the different parts. A useful rule of thumb: if a human engineer cannot understand the instructions, the LLM will most likely struggle with them as well.

Teams should avoid providing a laundry list of edge cases encountered during development. Instead of attempting to articulate every possible rule for a particular task, provide sufficient information and rely on the model’s intelligence. When using few-shot examples, choose diverse canonical examples that effectively portray expected behavior without forcing the agent to focus exclusively on edge cases.
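As a rough illustration of these practices, here is a hypothetical system prompt skeleton with distinct XML-tagged sections, direct language, and a single canonical example rather than a catalogue of edge cases. The section names and the task are invented for illustration, not a prescribed template.

```python
# A hypothetical system prompt skeleton following the guidance above.

SYSTEM_PROMPT = """
<role>
You are a coding agent that fixes failing tests in a Python repository.
</role>

<instructions>
- Reproduce the failure before changing any code.
- Prefer the smallest change that makes the test pass.
- Summarize each change in one sentence.
</instructions>

<example>
Task: "test_parse_date fails on ISO 8601 strings"
Expected behavior: run the test, locate the parser, add ISO 8601 handling,
then re-run the test to confirm it passes.
</example>
""".strip()
```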

Three Core Context Management Techniques

1. Compacting and Summarization

LLMs tend to lose focus as they approach context window limits. Compacting addresses this by summarizing the most important content the agent has encountered, including specific rules to follow and failure points observed.

Claude Code implements this through the /compact command, which clears the conversation history while keeping a summary in context. The system sometimes runs this automatically as the context window approaches its limit. However, aggressive compacting can lose important information, so careful calibration is necessary.
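The sketch below shows the general shape of such a compaction step; it is a simplified illustration, not Claude Code's actual implementation. `count_tokens` stands in for a real tokenizer and `summarize` for an LLM call that distills key decisions, rules to follow, and observed failure points.

```python
# A simplified sketch of compaction: once the conversation nears the window
# limit, replace the older turns with a summary and keep only the recent tail.

def compact(messages, count_tokens, summarize,
            limit=200_000, threshold=0.8, keep_last=10):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < limit * threshold:
        return messages  # still comfortably inside the window

    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(head)
    return [{"role": "system",
             "content": f"Summary of earlier work:\n{summary}"}] + tail
```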

Model-Specific Considerations

An interesting development is that Claude Sonnet 4.5 is aware of its own context window, a distinctive capability that shapes its behavior as it approaches context limits. The model proactively summarizes its progress and becomes more decisive about implementing fixes to close out tasks.

This native behavior is so distinct that Cognition had to rebuild their Devin system from scratch specifically for Claude Sonnet 4.5, because their previous implementation did not account for this context awareness. The results demonstrated significant improvements: roughly 2x faster performance and 12% better results on their evaluations. This highlights that prompt engineering and context engineering strategies cannot simply be transferred from one model to another.

2. Structured Note-Taking and Agentic Memory

Structured note-taking, also known as agentic memory, involves the agent regularly writing notes that persist to memory outside the context window. The agent can then refer to this memory whenever needed.

Claude Code implements this through files like notes.md. This approach requires designing memory systems as complementary mechanisms that provide useful information alongside the actual context window, with information retrieved as needed rather than occupying precious context space continuously.
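Here is a minimal sketch of this pattern, assuming a plain notes.md file on disk as described above. The two helpers are illustrative only; a production memory system would likely be richer than this.

```python
# A minimal sketch of structured note-taking: notes persist outside the
# context window and are read back only when the agent needs them.

from pathlib import Path

NOTES = Path("notes.md")

def append_note(note: str) -> None:
    """Persist a note outside the context window."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def recall_notes() -> str:
    """Load the notes back into context only when requested."""
    return NOTES.read_text(encoding="utf-8") if NOTES.exists() else ""
```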

3. Sub-Agent Architecture

The sub-agent architecture assigns specific tasks to dedicated sub-agents, each with its own purpose. Each sub-agent can conduct research and take actions using available tools, with a crucial benefit: the sub-agent’s context window remains separate from the main context window.

When sub-agents take actions, those actions do not rot or pollute the main context window. Only essential information returns to the main agent: the input given to the sub-agent, a (likely summarized) list of the actions it took, and its output. This preserves the main context window while sub-agents handle the bulk of the work.
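The sketch below captures the shape of this pattern. `run_agent` stands in for whatever loop drives the model and its tools; the key point is that the sub-agent works in its own fresh message list, and only a compact record flows back to the caller.

```python
# A minimal sketch of the sub-agent pattern: isolated context, compact return.

def run_subagent(task: str, tools, run_agent):
    sub_messages = [{"role": "user", "content": task}]  # isolated context
    result = run_agent(sub_messages, tools)              # may involve many tool calls

    # Only the task, a short account of what was done, and the output
    # flow back into the main agent's context window.
    return {
        "task": task,
        "actions_summary": result["actions_summary"],
        "output": result["output"],
    }
```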

Maintaining Contextual History

An important distinction exists in how different systems handle tool actions over time. While Anthropic’s Claude Code discards tool actions after a certain time if the agent deems them irrelevant, some practitioners recommend masking them instead of complete disposal.

The masking approach maintains a contextual history of actions taken by the agent. While you don’t need to preserve every detail, providing the agent with historical context about actions it has taken can be valuable for decision-making and consistency.
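A minimal sketch of masking, assuming a simple role-tagged message list: older tool results are replaced with short placeholders, so the record of which actions were taken survives without the bulky payloads.

```python
# A minimal sketch of masking rather than deleting old tool results.

def mask_old_tool_results(messages, keep_recent=5):
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    to_mask = set(tool_indices[:-keep_recent]) if keep_recent else set(tool_indices)

    masked = []
    for i, msg in enumerate(messages):
        if i in to_mask:
            name = msg.get("name", "tool")
            masked.append({**msg, "content": f"[{name} was called; result elided]"})
        else:
            masked.append(msg)
    return masked
```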

The Evolving Landscape

Context engineering represents an exciting and rapidly evolving field. As agentic systems become more sophisticated and widespread, the techniques and strategies for managing context windows will continue to develop. The shift from simple prompt engineering to comprehensive context engineering reflects the growing complexity of AI systems and the need for more nuanced approaches to managing their capabilities and limitations.