Over a single weekend, Andrej Karpathy hacked together LLM Council, a small app where multiple large language models debate each other before producing a final answer. The architecture is simple: a thin backend, a React frontend, and an API router sitting in the middle.
It’s a neat demo, and a sharp reminder for founders: the logic of AI orchestration can be weekend-hack simple, but the AI infrastructure that runs it safely, in real time, and at scale is not.
If you’re an AI-first startup founder, indie dev, or non-technical founder, you probably don’t want to become an expert in Kubernetes, GDPR, or multi-region databases just to ship a product. You want:
- An AI-ready backend that you don’t have to babysit
- No vendor lock-in, especially at the data layer
- Real-time apps that feel instant to users
- A stack that lets you ship in days, not quarters
This article walks through what “AI infrastructure” actually means in practice, how middleware like Karpathy’s weekend project fits in, and how technologies like Parse Server and NodeJS can give you a production-ready foundation without hiring a DevOps team.
Understanding AI Infrastructure
In 2025, you don’t “build AI” by spinning up your own models. You compose APIs, data, and orchestration into a product. That composition layer is your AI infrastructure.
At a high level, AI infrastructure is everything between your users and the AI models you call:
- The backend APIs that receive requests
- The data stores that hold user state, history, and permissions
- The orchestration logic that routes prompts and aggregates responses
- The governance and observability that keep you compliant and online
A good AI-ready backend makes it easy to add or swap models without rewriting your whole system.
Key Components of AI Infrastructure
You don’t need a huge platform team to get this right. You do need to be intentional about a few layers.
1. Data & identity layer
This is where Parse Server shines as a backend-for-frontend layer:
- User authentication and sessions
- Class-level permissions and roles
- Object storage (documents, chat histories, preferences)
- Files (uploads, model attachments, logs)
Open-source Parse Server is battle-tested and can run on top of MongoDB, giving you structured data with flexible schemas and relations. See the Parse Server project on GitHub for details.
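To make the data and identity layer concrete, here is a minimal sketch of saving a chat message with a per-user ACL through Parse Server's REST API. The server URL, app ID, the `ChatMessage` class, and the `text` and `owner` fields are all hypothetical, and the `/parse` mount path is only a common convention. The function builds the request without sending it, so you can inspect exactly what would go over the wire.

```python
import json
from urllib.request import Request

# Hypothetical deployment values; replace with your own.
PARSE_URL = "https://example.com/parse"
APP_ID = "myAppId"


def build_save_message_request(session_token: str, user_id: str, text: str) -> Request:
    """Build (but do not send) a REST request that creates a ChatMessage object.

    The ACL restricts read and write access to the owning user.
    """
    body = {
        "text": text,
        "owner": {"__type": "Pointer", "className": "_User", "objectId": user_id},
        "ACL": {user_id: {"read": True, "write": True}},
    }
    return Request(
        url=f"{PARSE_URL}/classes/ChatMessage",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Parse-Application-Id": APP_ID,
            "X-Parse-Session-Token": session_token,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the save. The session token identifies the logged-in user, so the backend, not your orchestration code, decides whether the write is allowed.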
2. Orchestration & agent layer
This is where Karpathy’s LLM Council lives: a thin service that:
- Calls one or more model providers (OpenAI, Anthropic, Google, etc.)
- Aggregates or “votes” over responses
- Applies business rules: safety filters, formatting, routing
You can build this in NodeJS, Python/FastAPI, or any language you like. The important bit is that it stays stateless and simple, while state (users, permissions, conversations) lives in your backend.
Modern orchestration tools like LangChain and LlamaIndex can help structure more complex agent workflows, but they still need a reliable backend behind them.
3. Model access layer
Very few teams talk directly to every model provider’s API. Karpathy used OpenRouter to normalize calls to different models. This is a growing pattern:
- Use an API gateway for LLMs to normalize auth and request formats
- Keep a configuration file or database table that lists your models
- Swap providers without changing your entire codebase
This is where a no vendor lock-in strategy begins: treat models as interchangeable parts rather than foundations.
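One way to treat models as interchangeable parts is a small registry that maps tasks to models, so application code asks for a task rather than a provider. The sketch below is illustrative only: the task names, provider labels, and model IDs are placeholders, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str    # hypothetical provider label
    model_id: str    # provider-specific identifier (placeholder values below)
    max_tokens: int


# Hypothetical registry; in production this could live in a database table.
MODEL_REGISTRY: dict[str, ModelConfig] = {
    "drafting": ModelConfig("provider-a", "model-small", 1024),
    "synthesis": ModelConfig("provider-b", "model-large", 4096),
}


def resolve_model(task: str, registry: dict[str, ModelConfig] = MODEL_REGISTRY) -> ModelConfig:
    """Look up which model serves a task; callers never hard-code a provider."""
    try:
        return registry[task]
    except KeyError:
        raise ValueError(f"No model configured for task {task!r}") from None
```

Swapping providers then means editing one registry entry, and every caller of `resolve_model("drafting")` picks up the change.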
4. Security, compliance, and governance
This is the part most weekend hacks skip, and it is where production AI infrastructure becomes non-trivial, especially in Europe:
- Authentication & authorization for every request
- PII detection and redaction before prompts leave your region
- Audit logs of who asked what, and which models saw it
- Data residency to satisfy GDPR and local regulators
These aren’t “enterprise extras.” They’re table stakes if you’re handling real customers, especially in regulated or EU markets.
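As a rough illustration of redaction and audit logging, the sketch below masks e-mail addresses and phone-like numbers before a prompt leaves your system, then builds an audit entry. The two regular expressions are deliberately simple assumptions; real PII detection needs far broader coverage, and the record fields are hypothetical.

```python
import re
from datetime import datetime, timezone

# Deliberately simple patterns for illustration; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(prompt: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    prompt = EMAIL_RE.sub("[EMAIL]", prompt)
    return PHONE_RE.sub("[PHONE]", prompt)


def audit_record(user_id: str, model: str, redacted_prompt: str) -> dict:
    """Build an audit entry recording who asked what and which model saw it."""
    return {
        "user_id": user_id,
        "model": model,
        "prompt": redacted_prompt,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Storing only the redacted prompt in the audit log keeps the log itself from becoming a second copy of the personal data.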
The Role of Middleware in AI Solutions
Karpathy’s LLM Council repo is tiny, but it quietly answers one big question: how complicated does AI middleware really need to be?
Under the hood, the pattern is straightforward:
- Receive a user request.
- Fan it out to several models in parallel.
- Let the models critique each other.
- Have a final model synthesize a single answer.
The code is simple. What’s missing is everything around it: auth, permissions, rate limiting, logging, monitoring, retry logic, and data governance.
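The four steps above can be sketched in a few lines of `asyncio`. This is not Karpathy's code: the `Model` type, the stub models, the prompt wording, and the name `chairman` for the final model are assumptions made for illustration, and real provider calls would replace the stubs.

```python
import asyncio
from typing import Awaitable, Callable

# A "model" is any async function from prompt to text. Real provider calls
# would go here; the stubs below are hypothetical stand-ins.
Model = Callable[[str], Awaitable[str]]


def make_stub(name: str) -> Model:
    async def call(prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"{name}: {prompt[:40]}"
    return call


async def run_council(question: str, members: list[Model], chairman: Model) -> str:
    # 1. Fan the question out to every member in parallel.
    answers = await asyncio.gather(*(m(question) for m in members))

    # 2. Each member critiques the full set of answers.
    review_prompt = "Rank these answers:\n" + "\n".join(answers)
    reviews = await asyncio.gather(*(m(review_prompt) for m in members))

    # 3. A final model synthesizes one response from answers and reviews.
    final_prompt = "\n".join(["Synthesize:", *answers, *reviews])
    return await chairman(final_prompt)
```

For example, `asyncio.run(run_council("What is 2+2?", [make_stub("a"), make_stub("b")], make_stub("chair")))` runs the whole flow against the stubs. Notice that nothing here handles auth, limits, or logging, which is exactly the gap described above.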
Importance of Middleware in AI Systems
Even for a small product, an orchestration or middleware layer is essential because it lets you:
- Avoid tight coupling to any single provider
  - Use an abstraction (like OpenRouter or your own gateway) so switching models is a config change, not a rewrite.
- Centralize business rules
  - Safety filters, custom system prompts, text formatting, localization, and content policies live in one place.
- Enforce policies in one layer
  - Restrict which roles can use which models.
  - Apply cost controls and rate limits.
- Experiment faster
  - A/B test different models or prompt strategies.
  - Log prompts/responses for later evaluation.
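To show what enforcing policies in one layer can look like, here is a minimal sketch that combines a role-to-model allowlist with a token-bucket rate limiter. The roles, model names, and limits are hypothetical, and a production limiter would usually keep its state in a shared store rather than in process memory.

```python
import time

# Hypothetical role-to-model policy; in practice this would be configuration.
ALLOWED_MODELS = {
    "free": {"model-small"},
    "pro": {"model-small", "model-large"},
}


class TokenBucket:
    """Allow up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.now = now  # injectable clock, which makes the limiter testable
        self.tokens = float(capacity)
        self.updated = now()

    def allow(self) -> bool:
        current = self.now()
        elapsed = current - self.updated
        self.updated = current
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def authorize(role: str, model: str, bucket: TokenBucket) -> bool:
    """A request passes only if the role may use the model and quota remains."""
    return model in ALLOWED_MODELS.get(role, set()) and bucket.allow()
```

Because the role check runs first, a request for a model the role cannot use is rejected without consuming any of the user's quota.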
Karpathy himself called his project “99% vibe-coded,” meaning much of it was generated by AI and not designed to live for years. That’s actually a useful mindset for this layer:
- Treat orchestration code as disposable scaffolding.
- Keep it small, readable, and easy to regenerate.
- Let your persistent value live in data, configuration, and a managed backend.
If you build your AI middleware on top of a stable, open backend (like Parse Server on NodeJS), you get the best of both worlds: fast iteration at the orchestration layer, and a durable, compliant substrate beneath it.
Useful reference architectures and discussions of AI gateways and orchestration can be found in resources like LangSmith and various open-source gateways on GitHub.
Building Real-Time Applications on an AI-Ready Backend
For AI-first products, real-time apps are often non-negotiable:
- Token-by-token streaming from LLMs
- Collaborative editors and canvases
- Live dashboards driven by model outputs
- Notifications that feel instant, not batch-processed
This is where your choice of backend matters at least as much as your choice of model.
Techniques for Implementing Real-Time Features
You have a few core patterns to choose from:
- WebSockets
  - Bi-directional, low-latency communication.
  - Great for chats, games, collaborative tools.
- Server-Sent Events (SSE)
  - One-way streaming from server to client.
  - Works well for streaming model responses token-by-token.
- Pub/Sub and change streams
  - Backend services publish events; clients subscribe to relevant channels.
  - Useful for analytics dashboards, notifications, and background jobs.
- Real-time database subscriptions (Live Queries)
  - The backend automatically notifies clients when data changes.
  - In the Parse ecosystem, this pattern is implemented as LiveQuery.
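Of these patterns, SSE is the simplest to show in code, because the wire format is just text. The sketch below wraps a stream of tokens in that format. The closing `done` event is a convention assumed here for illustration; it is not part of the SSE format itself.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in the Server-Sent Events wire format.

    Every event is one or more `data:` lines followed by a blank line.
    """
    for token in tokens:
        # A token containing newlines must be split across several data lines.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker; a convention, not part of SSE.
    yield "event: done\ndata: {}\n\n"
```

A web framework would send these strings in a response with the `text/event-stream` content type, and the browser's `EventSource` would deliver each token to the page as it arrives.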
With Parse Server, you can expose LiveQuery channels tied to collections/classes. A client subscribing to Messages or Tasks gets updates as soon as objects are created or modified. This is a strong fit for:
- AI copilots updating suggestions as user state changes
- Shared workspaces where assistants annotate or summarize content live
- Multi-agent tools where several “bots” act on the same dataset
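Conceptually, a live subscription is a publish/subscribe feed keyed by class name. The in-memory sketch below illustrates that idea only; it is not the LiveQuery API, and the `ChangeFeed` class and its rule for telling creates from updates are assumptions.

```python
from collections import defaultdict
from typing import Callable

# A listener receives the event name and the changed object.
Listener = Callable[[str, dict], None]


class ChangeFeed:
    """In-memory stand-in for a real-time subscription service."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, class_name: str, listener: Listener) -> None:
        self._listeners[class_name].append(listener)

    def save(self, class_name: str, obj: dict) -> None:
        # A real backend would persist the object first, then notify.
        # Assumption: an object that already has an ID is being updated.
        event = "update" if obj.get("objectId") else "create"
        for listener in self._listeners[class_name]:
            listener(event, obj)
```

A client that subscribes to `Messages` is called back on every save to that class and never sees changes to `Tasks`, which mirrors the subscription behavior described above.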
A typical setup for an AI-first product might look like:
- Frontend: React, Vue, or native mobile
- Backend: Parse Server (NodeJS) providing auth, data, Cloud Code
- Real-time: LiveQuery for data changes + SSE or WebSockets for LLM streams
- AI orchestration: Small NodeJS or Python service talking to LLM APIs
Because Parse abstracts the database access and real-time layer, you don’t need to design your own event system from scratch.
For more details on building real-time backends, resources like the Socket.IO documentation and official Parse community docs are helpful starting points.
Why Choose NodeJS for AI Development
A lot of AI infrastructure examples are written in Python, but NodeJS is an excellent choice for orchestration and backend APIs, especially when combined with Parse Server.
Benefits of NodeJS in AI Projects
- Event-driven, non-blocking I/O
  - Calling LLM APIs is mostly network I/O, not heavy CPU.
  - NodeJS excels at handling many concurrent requests efficiently.
- Same language across stack
  - Frontend (React, Next.js) and backend (Parse Server, custom services) can all be JavaScript/TypeScript.
  - Easier hiring and onboarding; fewer context switches.
- Rich ecosystem
  - Tons of mature SDKs for OpenAI, Anthropic, and other providers.
  - Libraries for WebSockets, SSE, queues, metrics, and more.
- Streaming-by-default mindset
  - Node streams and async iterators make token streaming from LLMs natural to implement.
- Good fit with serverless and microservices
  - Easy to deploy small, focused functions that perform a single orchestration task.
The trade-off: if you’re doing heavy local model inference (like running a big transformer on your own GPUs), Python’s ML ecosystem is still stronger. In that case, a common pattern is:
- Keep orchestration, APIs, and product logic in NodeJS.
- Run any heavy ML pipelines in separate Python microservices.
- Communicate over HTTP, gRPC, or a message bus.
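The Python side of that split can be very small. Below is a minimal sketch of an HTTP microservice using only the standard library. `run_inference` is a placeholder that counts words where a real ML pipeline would run, and the port and JSON shape are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_inference(text: str) -> dict:
    """Placeholder for a heavy ML pipeline; here it only counts words."""
    return {"word_count": len(text.split())}


def handle_json(raw_body: bytes) -> tuple[int, bytes]:
    """Pure request logic: parse JSON, run the pipeline, return status and body."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        return 400, json.dumps({"error": "expected JSON with a 'text' string"}).encode()
    return 200, json.dumps(run_inference(text)).encode()


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_json(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8001) -> None:
    """Start the service; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

Calling `serve()` starts the service, and the NodeJS orchestration layer would then POST `{"text": "..."}` to it. Keeping the logic in `handle_json`, separate from the HTTP handler, makes it testable without opening a socket.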
The official NodeJS documentation is a good reference if you’re designing for high concurrency and streaming.
From Weekend Hack to Production-Grade AI Infrastructure
Karpathy’s weekend “vibe code” experiment is a perfect reminder: the core orchestration logic for multi-model AI can fit in a few hundred lines. What doesn’t fit in a weekend is everything you need for production.
Before you ship your AI app to real users, sanity-check your stack against this list.
Production Readiness Checklist for AI-First Startups
Security & access control
- Authentication for every app and API
- Role-based access control (RBAC) for models and features
- Rate limiting and abuse protection per user/tenant
Data governance & compliance
- Clear data residency strategy (especially for EU users)
- PII detection and redaction before prompts hit external APIs
- Audit logs for prompt/response flows
- Ability to delete user data and comply with GDPR requests
Reliability & operations
- Retries, backoff, and timeouts for model calls
- Circuit breakers and fallbacks when one provider fails
- Centralized logging and tracing across services
- Metrics for latency, cost per request, model error rates
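The first two reliability items can be sketched as a pair of small wrappers: one adds a per-attempt timeout and exponential backoff with jitter, the other falls back to the next provider when one exhausts its retries. The default values are illustrative assumptions, and a full circuit breaker, which would also remember recent failures, is not shown.

```python
import asyncio
import random
from typing import Awaitable, Callable

Model = Callable[[str], Awaitable[str]]


async def call_with_retry(
    model: Model,
    prompt: str,
    attempts: int = 3,
    timeout: float = 10.0,
    base_delay: float = 0.5,
) -> str:
    """Call a model with a per-attempt timeout and exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(model(prompt), timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each attempt (0.5s, 1s, 2s, ...) plus jitter.
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("attempts must be at least 1")


async def call_with_fallback(providers: list[Model], prompt: str, **kwargs) -> str:
    """Try each provider in order; move on when one exhausts its retries."""
    last_error = None
    for model in providers:
        try:
            return await call_with_retry(model, prompt, **kwargs)
        except Exception as error:
            last_error = error
    raise RuntimeError("all providers failed") from last_error
```

In a real system you would likely retry only on errors known to be transient, such as timeouts and rate-limit responses, rather than on every exception as this sketch does.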
Developer experience
- Easy way to deploy backend changes (no manual servers)
- Staging environment that mirrors production
- Automated tests for prompts and flows where possible
Many of these capabilities are built “around” your AI middleware, not inside it. This is why a strong, managed backend often creates more leverage than a new orchestration library, especially for small teams.
A Practical Blueprint for AI-First Founders (Without DevOps)
If you’re a solo founder or a small team, you can absolutely build a robust AI stack without standing up your own Kubernetes cluster.
Here’s a pragmatic architecture that keeps complexity under control:
- Frontend
  - Web: React, Next.js, or similar
  - Mobile: React Native, Flutter, or native
- Backend (BaaS / MBaaS)
  - Use an open, managed backend based on Parse Server and NodeJS.
  - Let it handle:
    - Authentication and user management
    - Database with class-level permissions
    - Real-time subscriptions (LiveQuery) for core objects
    - File storage and uploads
    - Background jobs and scheduled tasks
- AI orchestration service
  - A thin NodeJS (or Python) service that:
    - Talks to one or more LLM providers (possibly via an API gateway like OpenRouter)
    - Implements multi-model flows (councils, judges, tool-using agents)
    - Exposes a simple REST or WebSocket interface to your backend
- Observability & governance
  - Log prompts and responses to your backend database with proper access controls.
  - Track cost, latency, and error rates per model.
  - Add admin views to inspect traffic and troubleshoot issues.
- AI-ready backend hosting
  - Choose hosting that gives you:
    - Auto-scaling without hard request limits
    - EU data centers for GDPR-native compliance
    - Direct MongoDB access when you need it
    - Web hosting with SSL for your admin tools
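For the observability item in this blueprint, tracking cost, latency, and error rates per model can start as a small in-memory aggregator like the sketch below. The class and field names are hypothetical, and in production you would persist these numbers to your backend database rather than hold them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ModelStats:
    calls: int = 0
    errors: int = 0
    total_latency_s: float = 0.0
    total_cost_usd: float = 0.0

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.calls if self.calls else 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


class MetricsRecorder:
    """Aggregate per-model call statistics in memory."""

    def __init__(self) -> None:
        self.stats: dict[str, ModelStats] = defaultdict(ModelStats)

    def record(self, model: str, latency_s: float, cost_usd: float, ok: bool = True) -> None:
        entry = self.stats[model]
        entry.calls += 1
        entry.errors += 0 if ok else 1
        entry.total_latency_s += latency_s
        entry.total_cost_usd += cost_usd
```

Calling `record` after every model call is enough to answer which model is slowest, which fails most often, and which costs the most.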
If you want to keep your team focused on product and AI flows instead of infrastructure, you can build this entire stack on a managed Parse Server platform. That gives you a compliant, AI-ready backend with real-time capabilities, background jobs, and Cloud Code, so you can plug in your orchestration layer and ship quickly. You can explore SashiDo’s platform as one way to get this foundation without hiring a dedicated DevOps team.
Designing AI Infrastructure That Won’t Slow You Down
The lesson from projects like LLM Council isn’t that you should rebuild your entire AI platform every weekend. It’s that the hard part of AI infrastructure isn’t routing prompts; it’s everything that makes those routes safe, observable, compliant, and fast.
For AI-first startups, the winning pattern looks like this:
- A stable, open backend (e.g., Parse Server on NodeJS) that owns users, data, and permissions
- A thin, flexible orchestration layer that you can “vibe code” and replace as models evolve
- Real-time capabilities baked in from day one so your product feels alive
- A deliberate no vendor lock-in stance, especially at the data and model layers
Get those foundations right and you can iterate quickly on agents, workflows, and multi-model strategies without fighting your infrastructure every time a new model drops.
Your users don’t care how clever your middleware is. They care that your app is fast, reliable, and trustworthy. The right AI infrastructure lets you deliver exactly that, without turning yourself into a full-time DevOps engineer in the process.