Mastering the Context Window: A Practical Guide for Software Engineers

Have you ever been mid-sprint, building out a sleek new feature, say a 7-day forecast screen for a Weather App, when suddenly your AI coding agent starts losing the plot? It forgets the WeatherDTO you defined ten minutes ago, or worse, it starts hallucinating methods that don't exist in your WeatherRepository.

We've all been there. What you're experiencing isn't a "broken" AI; it's a saturated context window.

In today's post, we're going to look at how to manage that "working memory" so your AI stays sharp from the first git init to the final PR.

1. What Exactly Is a Context Window?

Think of the context window as the Working RAM for your AI agent. Everything the model "knows" during your session lives here: your prompts, the code you've pasted, the AI's previous suggestions, and even the hidden system instructions baked in by the tool provider.

Unlike real RAM, you can't go to Best Buy and upgrade it. Every model has a hard ceiling, and when you're deep in a feature, you'll be surprised how fast you hit it.

Tokens: The Currency of Context

AI doesn't read words; it reads tokens. A token is roughly a syllable or a small chunk of text, and every single one counts against your budget. In our Weather App, that means a small data class might cost a few dozen tokens, while the raw JSON forecast payload it models can easily cost thousands.

The practical takeaway: code is expensive, data payloads are even more expensive. Treat your context budget like an engineer treats heap allocations: be intentional about what you put in it.
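To make that budget intuition concrete, here's a minimal sketch using the common rule of thumb of roughly four characters per token for English-like text. The heuristic, the sample strings, and the `estimateTokens` helper are illustrative assumptions, not real tokenizer output; actual counts vary by model.

```kotlin
// Rough token estimate via the ~4-characters-per-token heuristic.
// A budgeting aid, not a measurement; real tokenizers differ per model.
fun estimateTokens(text: String): Int = (text.length + 3) / 4

fun main() {
    val dataClass = """
        data class WeatherDTO(
            val temperatureCelsius: Double,
            val humidityPercent: Int,
            val condition: String
        )
    """.trimIndent()

    // A raw API payload is usually far larger than the code that models it
    // (this sample is heavily truncated; real payloads run to kilobytes).
    val jsonPayload =
        """{"temperature_celsius":21.5,"humidity_percent":64,"condition":"partly_cloudy","hourly":[]}"""

    println("data class: ~${estimateTokens(dataClass)} tokens")
    println("JSON payload: ~${estimateTokens(jsonPayload)} tokens")
}
```

Running a mental estimate like this before pasting a payload into your session is often enough to make you reach for a trimmed sample instead of the full response.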

What Counts Against Your Budget?

This trips up a lot of developers. It's not just your messages that consume tokens. The full context includes:

- The hidden system instructions baked in by your tool provider
- Every prompt you've written and every file you've pasted
- The AI's own previous responses and suggestions
- Any compiler errors, logs, or test output you've fed back in

By the time you're an hour into a feature, you may have already consumed a significant chunk of your budget before the AI writes a single line of code.
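A back-of-the-envelope ledger shows how quickly that happens. Every number and category name below is an illustrative assumption, not a measurement; real costs depend entirely on your tool, model, and session.

```kotlin
// Illustrative context budget ledger. The limit, categories, and token
// counts are assumptions for demonstration only.
val contextLimit = 200_000
val spent = mapOf(
    "hidden system instructions" to 5_000,
    "pasted source files" to 40_000,
    "conversation history so far" to 30_000,
    "compiler/test output fed back in" to 15_000,
)

fun tokensUsed(): Int = spent.values.sum()

fun main() {
    spent.forEach { (item, tokens) -> println("$item: ~$tokens tokens") }
    val used = tokensUsed()
    println("~$used of $contextLimit tokens (${used * 100 / contextLimit}%) gone before any new code")
}
```

Even with made-up numbers, the shape of the problem is clear: the overhead categories grow every turn, while your remaining budget for actual code only shrinks.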

2. When the Ship Starts Sinking: Detecting Saturation

As we add more features, the context window fills up. When it hits the limit, the model doesn't just stop; it starts "forgetting" the oldest information to make room for the new. The degradation is gradual, and that's what makes it dangerous. You won't get an error; you'll get subtly wrong code.

Here's the "Logic Rot" progression to watch for in your session:

Stage 1: Repetition. The AI re-asks for details you already provided, or restates decisions you made earlier in the session.

Stage 2: Drift. It forgets types you defined, like the WeatherDTO from ten minutes ago, and its suggestions stop fitting your architecture.

Stage 3: Hallucination. It confidently invents methods that don't exist in your WeatherRepository.

The golden rule: When you notice Stage 2 or 3 symptoms, don't try to "remind" the AI inline. That just burns more tokens and delays the inevitable. Instead, treat it like a memory leak: acknowledge it, and plan your reset.

3. The Strategy: Managing Context with Subagents

In Android development, we love Clean Architecture because it separates concerns. A ViewModel doesn't talk directly to a database; a Repository doesn't know about UI state. We should apply that same logic to our AI sessions.

Instead of asking one AI session to build the entire Weather App from start to finish, we treat each session as a specialized Subagent. Each agent has a clearly scoped role, a defined set of inputs it needs, and a well-defined output it produces.

By keeping these sessions separate, you ensure the UI Specialist isn't wasting precious tokens on your Retrofit interceptor logic. More importantly, you start each session with a clean slate and a clear mission.

The Specialized Roles

1. The Data Architect

Owns the data layer: DTOs, networking, and the WeatherRepository implementation. Input: the API contract. Output: a clean repository interface the other layers can depend on.

2. The Logic Lead

Owns the domain layer: use cases and ViewModel logic. Input: the repository interface. Output: observable state the UI can render.

3. The UI Specialist

Owns the presentation layer: screens, components, and state rendering. Input: the ViewModel's state. Output: the finished 7-day forecast screen.

Think of spinning up a new subagent the way you'd think about spinning up a new coroutine: scoped, purposeful, and cleaned up when the task is done.
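The boundaries each subagent owns can be sketched in Clean Architecture terms. The WeatherDTO and WeatherRepository names come from earlier in the post; GetForecastUseCase and ForecastViewModel are hypothetical names for illustration:

```kotlin
// Data layer: what the Data Architect session owns (DTOs, repository contract).
data class WeatherDTO(val temperatureCelsius: Double, val condition: String)

interface WeatherRepository {
    suspend fun fetchForecast(days: Int): List<WeatherDTO>
}

// Domain layer: what the Logic Lead session owns. It sees the repository
// contract, never the networking details or UI state.
class GetForecastUseCase(private val repository: WeatherRepository) {
    suspend operator fun invoke(): List<WeatherDTO> = repository.fetchForecast(days = 7)
}

// UI layer: what the UI Specialist session owns. It sees the use case,
// never the repository or the data layer underneath it.
class ForecastViewModel(private val getForecast: GetForecastUseCase) {
    suspend fun loadForecast(): List<WeatherDTO> = getForecast()
}
```

Each interface in this sketch is also a natural hand-off point: the Data Architect session only needs to produce WeatherRepository, and the Logic Lead session only needs to consume it.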

4. Saving Your Game: The Hand-Off Document

You've been working with your Data Architect agent for 90 minutes. The context is getting long, the responses are slowing down, and you've noticed a hint of Stage 2 drift. Time to wrap up and hand things off.

The Hand-Off Document is your "save game." This is a temporary HANDOFF.md file kept in your project root to capture the "live state" before the context window saturates. Think of it as State Hydration for your agent: a high-density Instruction Pointer that restores the mission and tells the LLM exactly where to resume execution.
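A hand-off file might look something like this. The sections and their contents are an illustrative sketch, not a required format; adapt them to your project:

```markdown
# HANDOFF.md — Data Layer (Weather App)

## Mission
Data Architect session: DTOs, networking, and repository for the 7-day forecast.

## Done
- WeatherDTO defined and serializing correctly
- WeatherRepository interface and implementation in place

## In Progress
- Error mapping for network failures (timeout case still unhandled)

## Next Steps
1. Finish the error mapping
2. Hand the repository interface off to the Logic Lead session

## Decisions / Constraints
- Repository exposes suspend functions; no callbacks
```

The point is density: a few dozen lines that restore the mission, the current state, and the open questions, instead of thousands of lines of stale conversation history.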

How to Ask the AI for a Hand-Off

Don't write the HANDOFF.md yourself. Let the current agent generate it; it has the full context right now. Use this prompt when a session gets "stale" or you finish a layer of architecture: