Andrej Karpathy · YC AI Startup School · Jun 2025

Agentic Engineering

Agentic engineering designs the agentic loops where models plan, act, observe, and refine — and engineers the context that steers every turn of the loop.


Andrej Karpathy

“In industrial-strength LLM apps, filling the context window is a delicate art and science.”

LLM = CPU

Context Window = RAM

LLMs are a new kind of operating system. The context window is the model's working memory, where every token must be carefully placed.

Science: Task descriptions, few-shot examples, RAG, multimodal data, tools, state, and history
Art: Guiding intuition around the LLM psychology of "people spirits"
Balance: Too little context = poor performance; too much = high cost and performance degradation

Why agentic engineering, now

Championed by Tobi Lütke and Andrej Karpathy, the work moves from one-shot prompts to engineering the loops that let agents run real, industrial-strength tasks end to end.

Previously: Short task descriptions → Thick software layer: complex systems coordinating LLM calls
Previously: One-shot instructions → Closed agentic loops: plan → act → observe → refine on every step
Previously: Static, unchanging information → System prompt learning: LLMs learning by taking their own notes

On the “ChatGPT wrapper” framing, Karpathy is blunt: “This term is tired and really, really wrong.”

Software in the age of AI

Adapted from the Software 1.0 / 2.0 / 3.0 framing Karpathy presented at YC AI Startup School, 2025, where 3.0 means programming LLMs through prompts in natural language.

1.0

Classical programming

Explicit instructions
Deterministic behavior
Human-written code

2.0

Neural network era

Data-driven
Learned behaviors
Weight optimization

3.0

Agentic engineering era

Agentic loops
Multi-agent systems
Dynamic adaptation

The “jagged intelligence” aside

LLMs can solve complex math problems yet fail at simple tasks. They are strong at complex reasoning, creative problem solving, and language understanding, but weak at simple arithmetic and consistency, and prone to context drift. That profile is exactly why the loop matters: observe, verify, and correct before the next step.
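A minimal sketch of such a loop, assuming the planner, actor, and verifier are plain callables standing in for model and tool calls. The names run_loop, toy_plan, and toy_act are illustrative and do not come from the talk or any library.

```python
from typing import Callable, Optional


def run_loop(
    goal: str,
    plan: Callable[[str], str],
    act: Callable[[str], str],
    verify: Callable[[str], bool],
    max_steps: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Plan, act, observe, and refine until a result verifies or steps run out."""
    notes: list[str] = []  # observations carried into the next planning turn
    for step in range(max_steps):
        context = goal + "".join(f"\nnote: {n}" for n in notes)
        action = plan(context)   # plan
        result = act(action)     # act
        if verify(result):       # observe
            return result, notes
        # refine: record what went wrong so the next plan can differ
        notes.append(f"step {step} produced {result!r}, which failed verification")
    return None, notes


# Toy stand-ins for model calls: the planner tries a larger number
# each time it sees another failure note in its context.
def toy_plan(context: str) -> str:
    return str(2 + context.count("note:"))


def toy_act(action: str) -> str:
    return str(int(action) * 2)


answer, notes = run_loop("reach 8", toy_plan, toy_act, lambda r: r == "8")
```

The verifier is the part that guards against jagged failures: nothing leaves the loop until it passes the check, and every failed attempt becomes context for the next plan.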

1.0 Plan

Filling the context window effectively. Distribute the token budget, then assemble it deliberately for every request.

Context budget

System prompt: 10–20%
Examples: 20–30%
RAG content: 30–40%
History: 10–20%
Buffer: 10%
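As a worked example of the split above, the sketch below takes the midpoint of each range (15 / 25 / 35 / 15 / 10 percent) and turns a window size into per-section token allowances. The midpoints and the allocate_budget helper are assumptions for illustration. The right shares depend on the application.

```python
# Midpoints of the ranges listed above, in percent (an assumption).
BUDGET_PERCENT = {
    "system_prompt": 15,
    "examples": 25,
    "rag": 35,
    "history": 15,
    "buffer": 10,
}


def allocate_budget(window_tokens: int, percent: dict[str, int] = BUDGET_PERCENT) -> dict[str, int]:
    """Split a context window into per-section token allowances."""
    if sum(percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    budget = {name: window_tokens * p // 100 for name, p in percent.items()}
    # Integer division can leave a few tokens unassigned; park them in the buffer.
    budget["buffer"] += window_tokens - sum(budget.values())
    return budget


budget = allocate_budget(8000)
```

For an 8,000-token window this gives 1,200 tokens to the system prompt, 2,000 to examples, 2,800 to RAG content, 1,200 to history, and 800 to the buffer.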

Context Window Planning

Strategically distribute your token budget


Dynamic Context Assembly

Create custom context for each request

Task analysis → Relevant retrieval → Priority sorting → Token optimization → Context injection
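The five stages can be sketched in a few lines. This is a toy version under stated assumptions: keyword overlap stands in for real task analysis and retrieval, whitespace-separated words stand in for tokens, and the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


def normalize(word: str) -> str:
    return word.lower().strip(".,?!")


def assemble_context(task: str, snippets: list[str], token_budget: int) -> str:
    # 1. Task analysis: pull out the longer words as keywords.
    keywords = {normalize(w) for w in task.split() if len(w) > 3}
    # 2. Relevant retrieval: keep snippets that share at least one keyword.
    scored = []
    for snippet in snippets:
        overlap = len(keywords & {normalize(w) for w in snippet.split()})
        if overlap:
            scored.append((overlap, snippet))
    # 3. Priority sorting: most overlapping snippets first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # 4. Token optimization: take snippets while they fit the budget.
    chosen, used = [], 0
    for _, snippet in scored:
        cost = count_tokens(snippet)
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    # 5. Context injection: place the selected snippets ahead of the task.
    context = "\n".join(f"- {s}" for s in chosen)
    return f"Context:\n{context}\n\nTask: {task}"


prompt = assemble_context(
    "Summarize the refund policy for annual plans",
    [
        "Annual plans have a 30 day refund policy.",
        "Monthly plans renew automatically.",
        "Our office is closed on holidays.",
    ],
    token_budget=10,
)
```

Here the office-hours snippet is dropped as irrelevant, and the monthly-plans snippet is relevant but cut because it would exceed the ten-token budget.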

Cascading Context Strategy

Break down and chain complex tasks

Decompose large tasks into subtasks, use optimized context for each, merge results

Context Decay & Refresh

Clean old information, add new

Temporal relevance scoring, sliding window approach, importance-based retention
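One way to combine the three techniques, sketched under assumptions: each memory item carries an importance score assigned upstream, relevance halves every few turns, and the most recent turns are always kept. MemoryItem, relevance, and refresh are hypothetical names, and the half-life of four turns is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float  # 0..1, assumed to be scored upstream
    age_turns: int     # how many turns ago the item was added


def relevance(item: MemoryItem, half_life_turns: float = 4.0) -> float:
    """Temporal relevance: importance decays by half every half_life_turns."""
    return item.importance * 0.5 ** (item.age_turns / half_life_turns)


def refresh(items: list[MemoryItem], keep: int, window: int = 2) -> list[MemoryItem]:
    """Keep the sliding window of recent turns, then fill by decayed importance."""
    recent = [i for i in items if i.age_turns < window]
    older = sorted(
        (i for i in items if i.age_turns >= window), key=relevance, reverse=True
    )
    return recent + older[: max(0, keep - len(recent))]


items = [
    MemoryItem("user name is Ada", importance=1.0, age_turns=12),
    MemoryItem("small talk about weather", importance=0.1, age_turns=3),
    MemoryItem("previous answer", importance=0.5, age_turns=1),
    MemoryItem("current question", importance=0.5, age_turns=0),
]
kept = [i.text for i in refresh(items, keep=3)]
```

The user's name survives even though it is the oldest item, because its importance outweighs the decay. The weather small talk is the one that gets cleaned out.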

Multi-Agent Orchestration

Specialized agents with different contexts

Each agent keeps its own context, a coordinator agent manages the agents, and shared memory systems connect them.
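A minimal sketch of that arrangement, assuming each agent's model call is replaced by a plain function. Agent, Coordinator, and the two toy handlers are illustrative names, not an existing framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    system_prompt: str
    handle: Callable[[str, dict], str]  # stand-in for a model call
    context: list[str] = field(default_factory=list)  # private to this agent


class Coordinator:
    def __init__(self, agents: list[Agent]) -> None:
        self.agents = {a.name: a for a in agents}
        self.shared_memory: dict[str, str] = {}  # visible to every agent

    def dispatch(self, agent_name: str, task: str) -> str:
        agent = self.agents[agent_name]
        agent.context.append(task)  # only this agent sees the task text
        result = agent.handle(task, self.shared_memory)
        self.shared_memory[agent_name] = result  # publish the result for others
        return result


def research(task: str, shared: dict) -> str:
    return f"notes on {task}"


def write(task: str, shared: dict) -> str:
    return f"{task} using {shared['researcher']}"


coordinator = Coordinator([
    Agent("researcher", "You gather facts.", research),
    Agent("writer", "You draft prose.", write),
])
coordinator.dispatch("researcher", "context windows")
draft = coordinator.dispatch("writer", "draft intro")
```

The writer never sees the researcher's task history, only the result published to shared memory, which keeps each agent's context small and specialized.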

2.0 Act

The building blocks the loop reaches for at each step — retrieval, memory, tools, and compaction.

RAG (Retrieval-Augmented Generation)

Dynamic information retrieval enables LLMs to access current and accurate information through vector databases and semantic search.
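The retrieval step can be illustrated without a vector database. In the sketch below, word-count vectors and cosine similarity stand in for learned embeddings and semantic search. That substitution is an assumption made to keep the example self-contained, and a production system would use an embedding model and a vector store instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count vector stands in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "shipping takes five business days",
    "refunds are issued within ten days",
    "our store sells handmade ceramics",
]
top = retrieve("how long does shipping take", docs)
```

The retrieved passage is then placed in the context window alongside the question, so the model answers from current data rather than from memory alone.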

State & History Management

Intelligent management of conversation history, user preferences, and application state. Critical for efficient context window usage.

Few-Shot Examples

Carefully selected examples for the task. They steer LLMs toward output in the desired format and quality.
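A small sketch of how selected examples are usually laid out in a prompt. The Input/Output labels and the build_few_shot_prompt helper are assumptions, and any consistent format works.

```python
def build_few_shot_prompt(
    instruction: str, examples: list[tuple[str, str]], query: str
) -> str:
    """Lay out an instruction, worked examples, and the new input in one format."""
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # End on an open "Output:" so the model continues in the demonstrated format.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


few_shot = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Loved it, works great", "positive"), ("Broke after a day", "negative")],
    "Arrived late but does the job",
)
```

Because every example follows the same layout as the final query, the model is nudged toward answering with a single label rather than a paragraph.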

Tool Use & Function Calling

LLM interaction with external systems. Required for API calls, database queries, and computations.
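On the application side, function calling comes down to parsing a structured request from the model, running the matching function, and returning the result as text. The sketch below assumes a hypothetical JSON shape with "name" and "arguments" keys. It is not any vendor's wire format, and TOOLS, tool, and execute_tool_call are illustrative names.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}


def tool(fn: Callable) -> Callable:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


def execute_tool_call(raw: str) -> str:
    """Run a call the model emitted as JSON and return a JSON result string.

    Assumed shape: {"name": "<tool>", "arguments": {...}}.
    """
    try:
        call = json.loads(raw)
        fn = TOOLS[call["name"]]
        result = fn(**call["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Errors go back to the model as data so it can correct the call.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})


good = execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
bad = execute_tool_call('{"name": "delete_everything", "arguments": {}}')
```

Returning failures as structured data rather than raising lets the loop observe the error and retry with a corrected call.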

Multimodal Context

Combining text, images, audio, and other data types. Critical for rich context creation.

Context Compaction

Maximum information density without exceeding token limits. Summarization, filtering, and prioritization techniques.
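A sketch of the three techniques working together, under assumptions: chunks arrive with a priority score, whitespace-separated words approximate tokens, and "summarization" is reduced to keeping the first sentence. A real system would call a model to summarize, and compact and first_sentence are hypothetical helpers.

```python
from typing import Callable


def first_sentence(text: str) -> str:
    # Assumption: a crude stand-in for a model-written summary.
    return text.split(". ")[0].rstrip(".") + "."


def compact(
    chunks: list[tuple[float, str]],
    token_budget: int,
    summarize: Callable[[str], str] = first_sentence,
) -> list[str]:
    """Deduplicate, order by priority, and summarize chunks that do not fit."""
    seen: set[str] = set()
    kept: list[str] = []
    used = 0
    # Prioritization: highest-priority chunks get first claim on the budget.
    for _, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        key = " ".join(text.lower().split())
        if key in seen:  # filtering: drop repeats that differ only in case or spacing
            continue
        seen.add(key)
        # Summarization: try the full text first, then the shorter version.
        for candidate in (text, summarize(text)):
            cost = len(candidate.split())
            if used + cost <= token_budget:
                kept.append(candidate)
                used += cost
                break
    return kept


compacted = compact(
    [
        (0.9, "Refunds take five days."),
        (0.5, "Order 1042 shipped Monday. The carrier was DHL and tracking is enabled for it."),
        (0.7, "refunds take  five days."),
    ],
    token_budget=8,
)
```

The duplicate refund line is filtered out, and the long order update is reduced to its first sentence so that both facts fit in eight tokens.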

3.0 Observe

Watch what the step produced, name the failure mode, and apply the engineering response before the next turn.

Failure mode → Engineering response

Lost in the Middle → Strategic positioning: Place critical information at the beginning and end, prevent mid-context loss

Context Window Limits → Smart compression: Token efficiency through semantic chunking, summarization, and prioritization

Hallucination Risk → Grounding techniques: Accuracy control with RAG, fact-checking, and validation gates

Context Switching → State management: Maintain continuity with session persistence and memory systems

Performance Degradation → Selective loading: Performance optimization with relevance scoring and lazy loading

Cost Explosion → Token economy: Cost control through caching, reuse strategies, and efficient encoding
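As one concrete instance, the strategic positioning response to Lost in the Middle can be sketched as a reordering step: given items ranked from most to least important, alternate them between the front and the back so the weakest material lands in the middle. The position_for_recall name and the alternating scheme are assumptions, one simple way to do it among several.

```python
def position_for_recall(ranked: list[str]) -> list[str]:
    """Reorder items (most important first) so the best sit at the edges.

    Even-ranked items fill the front in order. Odd-ranked items fill the
    back in reverse, so the second most important item ends up last.
    """
    front: list[str] = []
    back: list[str] = []
    for index, item in enumerate(ranked):
        (front if index % 2 == 0 else back).append(item)
    return front + back[::-1]


ordered = position_for_recall(["A", "B", "C", "D", "E"])
```

With five items ranked A through E, the result is A, C, E, D, B: the two most important items open and close the context, and the least important sits in the middle.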

4.0 Refine

The loop's output: systems that improve as they run, with autonomy you can dial up over time.

Code Generation Systems

Systems like GitHub Copilot and Cursor use agentic engineering to understand entire codebases and generate consistent code.

Understanding project structure
Maintaining code style
Import and dependency management

Enterprise AI Assistants

Corporate AI assistants use agentic engineering to understand organizational knowledge and processes.

Autonomous Agents

Systems like AutoGPT run long-horizon agentic loops to execute multi-step tasks independently.

Educational Systems

Personalized learning platforms use student context to provide adaptive learning experiences.

Autonomy

Manual → Autonomous

A continuous spectrum, not a switch. Users dial the autonomy level of a system from full manual control to fully autonomous operation.
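One way to implement such a dial is to compare the risk of each proposed action against the level the user has selected. The four levels and three risk tiers below are hypothetical. A real product would define its own, possibly on a finer scale.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    # Hypothetical dial positions, from full manual control to fully autonomous.
    MANUAL = 0      # every action needs human approval
    ASSISTED = 1    # read-only actions run unattended
    SUPERVISED = 2  # reversible writes also run unattended
    AUTONOMOUS = 3  # everything runs unattended


def needs_approval(level: Autonomy, action_risk: int) -> bool:
    """Decide whether a human must approve an action.

    action_risk tiers (an assumption): 1 = read-only, 2 = reversible write,
    3 = irreversible change.
    """
    return action_risk > int(level)


dial = Autonomy.SUPERVISED
decisions = [needs_approval(dial, risk) for risk in (1, 2, 3)]
```

At the SUPERVISED setting the agent reads and makes reversible edits on its own but still pauses for sign-off before anything irreversible.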

Self-improving systems

System prompt learning lets LLMs learn from their own experience. Each interaction becomes a data point that improves the system's context strategy, much as humans take notes and learn.
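The note-taking idea can be sketched as a system prompt that carries a short list of lessons forward between runs. This is an illustration under assumptions, not a description of any shipped mechanism: NotedPrompt is a hypothetical class, the lessons are supplied by the caller, and old notes are simply dropped once the cap is reached.

```python
class NotedPrompt:
    """A system prompt that accumulates a capped list of lessons learned."""

    def __init__(self, base_prompt: str, max_notes: int = 3) -> None:
        self.base_prompt = base_prompt
        self.max_notes = max_notes
        self.notes: list[str] = []

    def record(self, lesson: str) -> None:
        if lesson in self.notes:  # skip lessons already written down
            return
        self.notes.append(lesson)
        self.notes = self.notes[-self.max_notes:]  # keep only the newest notes

    def render(self) -> str:
        if not self.notes:
            return self.base_prompt
        lines = "\n".join(f"- {note}" for note in self.notes)
        return f"{self.base_prompt}\n\nNotes from earlier runs:\n{lines}"


system = NotedPrompt("You are a support agent.", max_notes=2)
system.record("Ask for the order number first.")
system.record("Quote prices in the store currency.")
system.record("Ask for the order number first.")  # duplicate, ignored
system.record("Confirm the shipping address before refunds.")
rendered = system.render()
```

The cap keeps the notes from crowding out the rest of the context budget, so the oldest lesson falls away once a third one arrives.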

AI-native architecture

Systems designed from the ground up for agents. Human interfaces become secondary as API- and context-first approaches take priority: build for agents, adapt for humans.

4.0 → 1.0 · the loop closes

What the loop changes

No vanity metrics — just the mechanisms that move the work.

Grounding: RAG and validation gates reduce hallucination.
Compaction: Summarization and prioritization tighten token budgets.
Closed loops: Observe-and-correct steps enable autonomous, long-horizon throughput.

In practice: Shopify

At Shopify, led by CEO Tobi Lütke, Shopify Magic and Sidekick apply agentic engineering principles to provide AI support to millions of merchants: grounding responses in store context and product catalogs, analyzing merchant behavior history, and injecting e-commerce best practices into the loop.
