
AI for coding that survives context resets: harness patterns for long-running agents

AI for coding breaks across sessions unless you use a harness. Learn initializer and incremental loops, testing discipline, and how to persist state with a backend that ships fast.


If you have ever used AI for coding to build a real product, you have seen the same pattern repeat. Session one looks great. Session two starts strong. Then the agent runs out of context mid-change, comes back with no memory, and your repo slowly turns into a half-finished maze of TODOs, broken flows, and “I think this already works” claims.

This is not a model quality problem as much as it is a workflow problem. Long-running agentic work is closer to shift work in an engineering team than a single chat. Each new session needs a reliable handoff, a shared definition of done, and a way to quickly verify the system is still healthy before adding the next feature.

In this post we will show a practical harness that makes AI-assisted programming predictable across many context windows. Then we will connect it to the backend pieces you actually need when you want the demo to survive real users: persisted state, auth, files, jobs, and push. That is exactly why we built SashiDo - Backend for Modern Builders. It gives you a production-ready Parse based backend with MongoDB, APIs, Auth, Storage, Functions, Jobs, Realtime, and Push in minutes, without turning your solo build into a DevOps project.

The long-running agent failure modes you keep hitting

When an agent works across hours or days, you will almost always see two failure modes.

First, the agent tries to one-shot too much. It starts implementing three features at once, touches ten files, hits the context limit, and leaves the next session with a half-migrated schema and no explanation. The new session spends its budget reconstructing intent instead of moving forward.

Second, the agent declares victory once it sees a bunch of code in place. It opens the repo, notices endpoints and UI screens, and confidently says the feature is done. In practice, the feature has not been verified end-to-end, the edge cases are missing, and the last-mile glue is broken.

The fix is not “prompt harder”. The fix is to turn the work into a loop with guardrails. A harness that forces small increments, writes durable artifacts, and makes “done” mean “tested, committed, and marked passing”.

AI for coding across sessions: the harness that keeps agents honest

A reliable long-running setup has two distinct phases. Think of them like setting up a new repo for a team, then running daily work.

The first phase is an initializer run. Its job is to prepare the environment so every later session can start from a stable baseline. The second phase is the repeated coding run. Its job is to move one feature from failing to passing, and leave a clean handoff.

This split matters because you do not want to pay for repeated “figuring out how to run the app” tokens. You want every session to begin with the same fast routine. Get bearings. Verify health. Pick the next failing feature. Implement. Test. Commit. Update progress.

When we help builders ship on SashiDo - Backend for Modern Builders, this is the same mental model we encourage. Treat your backend and your agent loop as a system. The system needs state, repeatability, and recoverability.

The initializer run: make the repo self-explanatory

The initializer run creates three artifacts that do most of the heavy lifting for long-horizon work.

A feature list that defines “done”

Start by expanding the vague user prompt into a structured feature list. The important part is not the number of features. It is the shape.

We recommend writing each feature as an end-to-end behavior with steps, and tracking a single boolean such as “passes”. Keep everything initially failing.

Two things happen when you do this. You stop the agent from prematurely claiming completion, and you create a stable to-do list that survives context resets. It also gives you a clean way to prioritize. You can always pick the highest-impact failing item next.

A detail that matters in practice is format discipline. JSON tends to reduce accidental rewrites compared to freeform Markdown because it is easier to validate and harder to edit creatively. The agent should only flip “passes” from false to true after verification, not rewrite descriptions.
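As a sketch, entries in that feature list might look like the following, with only the boolean ever changing between sessions (the ids, steps, and the file name feature_list.json are illustrative, not a fixed convention):

```json
[
  {
    "id": "auth-sign-in",
    "description": "A user can sign in and land on their dashboard",
    "steps": [
      "Open the app while logged out",
      "Sign in with a test account",
      "Verify the dashboard lists the user's projects"
    ],
    "passes": false
  },
  {
    "id": "project-resume",
    "description": "A returning user can reopen a project and continue where they left off",
    "steps": [
      "Sign in with an account that owns a project",
      "Open the most recent project",
      "Verify the last saved state is loaded"
    ],
    "passes": false
  }
]
```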

A progress log that explains intent

A plain text progress file sounds too simple, but it is one of the best cost hacks for long-running AI software development. Every session ends by writing what changed, what was tested, what is still failing, and what to do next.

This is different from a changelog. It is a handoff note. Keep it short and operational. Mention commands to rerun, URLs to hit, fixtures that were added, and gotchas the next session should know.
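A sketch of what a single entry might contain; the paths, commands, and gotchas here are placeholders standing in for your own project:

```
Session 14
Changed: project resume flow (src/routes/projects.ts), seeded fixture "demo-project".
Tested: init script green, smoke flow green, resume flow verified manually in the browser.
Still failing: file export, push notification on export completion.
Next: implement file export; watch the upload size limit in the dev config.
Rerun with: npm run init && npm run smoke
```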

A single “init script” that can always bring the project up

The initializer run should also create one script that standardizes the basic routine. Start the app. Run the minimum health checks. Run the quickest end-to-end test you can.

For a web app, the fastest baseline is often “start server, load a page, complete one core flow”. For an API service, it may be “start server, hit a health endpoint, run one smoke test”. The goal is not perfect coverage. The goal is to quickly detect if the last session left the system broken.
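As a concrete sketch, here is what that init script could look like as a small Node/TypeScript program. The start command, port, and endpoints are assumptions you would swap for your own project:

```typescript
// init.ts - baseline routine: start the app, check health, run one smoke flow.
// Assumed: "npm start" boots the app on port 3000 with /health and /api/projects.
import { spawn } from "node:child_process";

async function waitForHealth(url: string, timeoutMs = 30_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(url);
      if (res.ok) return;
    } catch {
      // Server not listening yet; keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, 1_000));
  }
  throw new Error(`Health check at ${url} did not pass within ${timeoutMs}ms`);
}

async function main(): Promise<void> {
  // 1. Start the app from a cold start.
  const app = spawn("npm", ["start"], { stdio: "inherit" });
  try {
    // 2. Minimum health check.
    await waitForHealth("http://localhost:3000/health");

    // 3. Quickest end-to-end smoke flow: one core request must succeed.
    const res = await fetch("http://localhost:3000/api/projects");
    if (!res.ok) throw new Error(`Smoke flow failed with status ${res.status}`);

    console.log("Baseline healthy. Safe to start the next increment.");
  } finally {
    app.kill();
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```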

The coding run: one feature, fully verified, clean handoff

Once initialization is done, every later session should follow the same loop.

First, get bearings. Confirm the working directory. Read the progress log. Read the feature list. Skim recent git history.

Second, run the init script and perform the baseline verification. If the baseline is broken, fix that first. This is the part most agents skip when you do not enforce it, and it is why later sessions spiral.

Third, pick exactly one failing feature and implement it. The harness should explicitly forbid mixing multiple features. That sounds slower, but it ends up faster because you stop paying the “reconstruct and refactor” tax.

Fourth, test like a human would. For UIs, that usually means some form of browser automation or manual reproduction steps. For APIs, it means exercising the full request path, not just unit tests.

Finally, write a git commit, update the progress log, and flip exactly the relevant “passes” flag.

This is where a lot of AI programming workflows go wrong. The agent writes code, runs one narrow test, and marks the feature done. Your harness should make “done” conditional on a checklist.

Here is a lightweight end-of-session checklist we see work well for solo builders:

  • The app starts cleanly from a cold start using the init script.
  • The baseline smoke flow works.
  • The new feature works end-to-end at least once.
  • The feature list is updated only by flipping the pass state for that feature.
  • The progress log says what changed, how it was tested, and what to do next.
  • A descriptive git commit exists for the change.

If you want a canonical reference for writing commits that stay readable over time, Git’s own guidance is a good baseline, especially its advice on concise subject lines and commit bodies that explain why a change was made. See Git’s SubmittingPatches documentation.
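One way to make the “update only by flipping the pass state” rule mechanical is to route that change through a tiny script instead of letting the agent hand-edit the file. A minimal sketch, assuming the feature_list.json shape from earlier:

```typescript
// mark-passing.ts - flips exactly one "passes" flag; nothing else in the file changes.
import { readFileSync, writeFileSync } from "node:fs";

interface Feature {
  id: string;
  description: string;
  steps: string[];
  passes: boolean;
}

const id = process.argv[2];
if (!id) {
  console.error("Usage: node mark-passing.js <feature-id>");
  process.exit(1);
}

const path = "feature_list.json";
const features: Feature[] = JSON.parse(readFileSync(path, "utf8"));

const feature = features.find((f) => f.id === id);
if (!feature) {
  throw new Error(`Unknown feature id: ${id}`);
}

// Only the boolean moves; descriptions and steps stay exactly as written.
feature.passes = true;
writeFileSync(path, JSON.stringify(features, null, 2) + "\n");
console.log(`Marked ${id} as passing.`);
```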

Persisting agent state without turning it into an infra project

Long-running agents do not just need text artifacts. Real apps need durable state that survives redeploys, laptops, and new environments.

The common solo-founder trap is wiring state persistence late, after the agent already generated a lot of local-only assumptions. Then you spend days migrating.

A better approach is to decide early where state lives, and make your harness write to it deliberately.

What to persist, in practice

For most agent-driven product builds, you need to persist three categories of state.

You need product state. Users, sessions, projects, and whatever domain objects your app actually manages.

You need agent state. Progress snapshots, task queues, feature status, and traces that help you resume work without guessing.

You need operational state. Logs, retry queues, and background work that should continue even when the client is gone.

This is where SashiDo - Backend for Modern Builders fits naturally into a long-running harness. Every app comes with a MongoDB database and CRUD APIs, plus a complete user management system with social logins. That means you can persist the objects your agent needs, and secure them behind Auth, without first building your own backend scaffolding.

If you want to go deeper on the underlying platform, our stack is based on Parse. Our Developer Docs & Quickstarts explain how to model data, use the SDKs, and structure Cloud Code functions.

A simple pattern: agent checkpoints as first-class records

Instead of keeping all “agent memory” in a single text file, store checkpoint records in your backend as well. For example, a checkpoint can include the feature currently being worked on, the git commit hash it maps to, the last known good build status, and a short summary.

This gives you two benefits. First, you can resume work across machines and teams. Second, you can build a small dashboard view to see where things stand without opening the repo.
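Here is a minimal sketch of writing such a checkpoint with the Parse JS SDK. The AgentCheckpoint class and its fields are illustrative choices, and the app keys and server URL come from your app’s dashboard:

```typescript
// checkpoint.ts - persist an agent checkpoint as a first-class backend record.
import Parse from "parse/node";

// Placeholders: copy the real values from your app's dashboard.
Parse.initialize("YOUR_APP_ID", "YOUR_JAVASCRIPT_KEY");
Parse.serverURL = "https://YOUR_SERVER_URL/1";

async function saveCheckpoint(): Promise<void> {
  const Checkpoint = Parse.Object.extend("AgentCheckpoint");
  const checkpoint = new Checkpoint();

  checkpoint.set("feature", "project-resume");   // matches an id in feature_list.json
  checkpoint.set("commitHash", "abc1234");       // ties the record back to git history
  checkpoint.set("buildStatus", "passing");      // last known good baseline
  checkpoint.set("summary", "Resume flow works end-to-end; next: file export.");

  await checkpoint.save();
  console.log("Checkpoint stored with id:", checkpoint.id);
}

saveCheckpoint().catch(console.error);
```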

When you store checkpoints, decide early how long they should live. Some checkpoint data should expire. MongoDB TTL indexes are a clean way to auto-delete documents after a time window, which is useful for temporary sessions or ephemeral agent traces. The official reference is MongoDB’s TTL index documentation.
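As a sketch, creating such an index with the Node.js MongoDB driver could look like the snippet below, assuming driver-level access to the database and a createdAt date field on each document; on a managed backend you may prefer to manage indexes from the dashboard instead:

```typescript
// ttl-index.ts - auto-expire ephemeral agent traces roughly 24 hours after creation.
import { MongoClient } from "mongodb";

async function ensureTtlIndex(): Promise<void> {
  const client = new MongoClient(process.env.MONGODB_URI ?? "mongodb://localhost:27017");
  await client.connect();
  try {
    await client
      .db("myapp")                            // assumed database name
      .collection("ephemeral_agent_traces")   // assumed collection for short-lived traces
      .createIndex({ createdAt: 1 }, { expireAfterSeconds: 86_400 });
  } finally {
    await client.close();
  }
}

ensureTtlIndex().catch(console.error);
```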

Testing that matches reality, not vibes

Agents are surprisingly good at producing code that looks correct. They are less reliable at proving it works end-to-end unless you demand it.

In long-running work, tests have an extra job. They are not only about catching regressions. They are about preventing context-window amnesia from rewriting reality.

A practical approach is layered. Keep a fast smoke test that runs every session. Add end-to-end tests for key user journeys. Then use minimal unit tests for tricky logic, not as your only safety net.
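A minimal sketch of that fast smoke layer, using Node’s built-in test runner; the endpoints and test credentials are assumptions standing in for your own core journey:

```typescript
// smoke.test.ts - run every session with `node --test` against a running app.
import test from "node:test";
import assert from "node:assert/strict";

const BASE = "http://localhost:3000"; // assumed local dev URL

test("baseline: health endpoint answers", async () => {
  const res = await fetch(`${BASE}/health`);
  assert.equal(res.ok, true);
});

test("core journey: a user can sign in and list projects", async () => {
  const login = await fetch(`${BASE}/api/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ username: "smoke-user", password: "smoke-pass" }),
  });
  assert.equal(login.ok, true);

  const { token } = await login.json();
  const projects = await fetch(`${BASE}/api/projects`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  assert.equal(projects.ok, true);
});
```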

Security testing matters too, especially once you add auth and file uploads. For a structured way to think about verification depth, we often point builders to a standard like OWASP ASVS because it turns vague “is it secure?” questions into concrete checks you can apply as you grow. The project page is OWASP Application Security Verification Standard.

Backend building blocks that keep the harness moving

Once your harness is stable, the next bottleneck is usually not code generation. It is the glue work. Auth, file handling, background jobs, and notifications.

This is where a backend platform can either accelerate you, or slowly add friction through integrations.

Auth that does not derail your sprint

If your app needs accounts, wire it early. Without auth, your agent will tend to build features that assume a single user, then you will retrofit permissions later.

On SashiDo we ship user management out of the box, including social logins like Google and GitHub, so your harness can test real user flows. This is especially useful when your feature list includes “sign in, create project, resume later”, which is the heart of long-running agent apps.
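For the basic username-and-password flow, a minimal sketch with the Parse JS SDK looks like this; social providers layer their authData on top of the same built-in user class. Initialization is omitted here and shown in the checkpoint example above:

```typescript
// auth.ts - sign up once, then log in and reuse the session token across sessions.
import Parse from "parse/node";

async function signUpAndLogIn(): Promise<void> {
  // Create the account (one-time).
  const user = new Parse.User();
  user.set("username", "indie-builder");
  user.set("password", "use-a-real-secret");
  user.set("email", "builder@example.com");
  await user.signUp();

  // Later sessions authenticate and get a session token for API calls.
  const session = await Parse.User.logIn("indie-builder", "use-a-real-secret");
  console.log("Session token:", session.getSessionToken());
}

signUpAndLogIn().catch(console.error);
```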

Files and artifacts that are safe to store and fast to serve

Agents produce artifacts. Generated images, exports, logs, attachments, and user content.

Our Files feature is backed by an AWS S3 object store with a built-in CDN, so you can store and serve any file type without designing your own storage layer. If you want to understand what S3 guarantees and what it does not, the canonical reference is the Amazon S3 Developer Guide.
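A minimal sketch of storing an agent-produced artifact with Parse.File and attaching it to a record so it stays queryable; the class and field names are illustrative:

```typescript
// files.ts - upload an artifact and link it to a domain object.
import Parse from "parse/node";

async function saveArtifact(reportJson: string): Promise<void> {
  // Parse.File accepts byte arrays, base64 payloads, or browser File objects.
  const bytes = Array.from(Buffer.from(reportJson, "utf8"));
  const file = new Parse.File("session-report.json", bytes, "application/json");
  await file.save();

  // Attach the uploaded file to an object so it can be found later.
  const Artifact = Parse.Object.extend("AgentArtifact");
  const artifact = new Artifact();
  artifact.set("kind", "session-report");
  artifact.set("file", file);
  await artifact.save();

  console.log("Artifact served from:", file.url());
}

saveArtifact(JSON.stringify({ status: "ok" })).catch(console.error);
```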

Background jobs for the work that should not block the UI

Long-running agent workflows often need background steps. Embedding generation, retries, periodic cleanups, and scheduled sync.

In SashiDo you can schedule and manage recurring jobs from the dashboard. This pairs well with a harness that writes “next steps” into a job queue so work continues after the interactive session ends.
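In Cloud Code, a background job is a named function that you then schedule from the dashboard. A minimal cleanup sketch, with illustrative class and job names (Parse is available as a global inside Cloud Code):

```typescript
// cloud/main.ts - a recurring job that prunes old agent checkpoints.
Parse.Cloud.job("cleanupStaleCheckpoints", async (request) => {
  const { message } = request;
  const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);

  const query = new Parse.Query("AgentCheckpoint");
  query.lessThan("createdAt", sevenDaysAgo);

  const stale = await query.find({ useMasterKey: true });
  await Parse.Object.destroyAll(stale, { useMasterKey: true });

  message(`Removed ${stale.length} stale checkpoints`);
});
```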

Push notifications that make async flows feel instant

As soon as you have “wait for the agent to finish” flows, you need a way to bring the user back when something is ready. Push is the obvious lever.

We deliver cross-platform push notifications for iOS and Android, and we run large volumes daily. That means you can keep your app loop tight even if tasks take minutes or hours.
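Pushes are sent from Cloud Code or the REST API with the master key, never directly from clients. A minimal sketch, using an illustrative per-user channel scheme:

```typescript
// cloud/main.ts - notify a user when their long-running task finishes.
Parse.Cloud.define("notifyTaskDone", async (request) => {
  const { userId, taskName } = request.params;

  await Parse.Push.send(
    {
      channels: [`user_${userId}`],               // clients subscribe to their own channel
      data: { alert: `${taskName} is ready`, badge: 1 },
    },
    { useMasterKey: true }
  );
});
```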

Cost control for solo builders: avoid surprise bills while you iterate

The harness improves reliability. The platform choice improves iteration speed. But solo builders also need predictability.

A good rule is to treat costs like a feature. Track what drives requests, storage, and background compute. Keep a small baseline smoke test. Avoid rerunning expensive flows when the baseline fails.

If you are evaluating SashiDo, we recommend checking the live numbers on our SashiDo pricing page because they can change over time. Today we offer a 10-day free trial with no credit card required, and the entry plan is priced per app per month with generous included requests, storage, and unlimited collaborators.

Scaling is another place where founders get surprised. If you need more compute, our Engines let you scale predictably by choosing the right instance type. Our breakdown and cost model is explained in Power up with SashiDo’s Engine feature.

When to use a backend platform vs rolling your own

If your project is purely local, a harness plus files in a repo might be enough. But if your goal is “ship a demo that real users can log into”, you will want hosted persistence, auth, and a way to deploy functions.

Some builders reach for alternatives early. If you are comparing, we have direct breakdowns of trade-offs in SashiDo vs Firebase, SashiDo vs Supabase, and SashiDo vs AWS Amplify. The main thing to optimize for as an indie hacker is not theoretical flexibility. It is how fast you can get a stable, testable loop that persists state and scales without rework.

A harness you can actually stick with

The best harness is boring. It is a routine that never changes. Read the artifacts, verify baseline health, do one increment, test like a user, commit, update the pass status.

Once you have that, debates about the best AI code generator become less important, because you have a system that can absorb model variance. The harness catches overconfidence, context loss, and incomplete testing before it hits production.

If you want to keep your long-running agent loop simple while still shipping a real backend, you can explore SashiDo’s platform and stand up MongoDB APIs, Auth, Files, Jobs, and Serverless functions in minutes.

Conclusion: reliable AI for coding is mostly about the handoff

Long-running agents fail in predictable ways. They overreach, forget what they did, and mark work complete without proving it. A good harness fixes this by separating initialization from execution, using durable artifacts like a feature list and progress log, forcing incremental progress, and treating end-to-end testing as the definition of done.

Once you add persistence, auth, files, and async work, your harness needs infrastructure that will not slow you down. That is why we built SashiDo - Backend for Modern Builders. It lets you keep the AI for coding loop tight while your backend is ready for real users, not just a local demo.

Sources and further reading

  • Git, SubmittingPatches documentation
  • MongoDB Manual, TTL Indexes
  • OWASP Application Security Verification Standard (ASVS)
  • Amazon S3 Developer Guide