Andrej Karpathy · YC AI Startup School · Jun 2025

Agentic Engineering

Agentic engineering designs the agentic loops where models plan, act, observe, and refine — and engineers the context that steers every turn of the loop.

1.0 Plan → 2.0 Act → 3.0 Observe → 4.0 Refine
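The four steps form one turn of a loop. Here is a minimal sketch that uses a toy numeric task in place of real model calls. `plan`, `act`, and `run_loop` are hypothetical names, and the planner is a deterministic stand-in for an LLM. What matters is the shape of the loop: the refine step edits the context that the next plan reads.

```python
def plan(value: int, goal: int, ctx: dict) -> int:
    """1.0 Plan: propose the next step (a deterministic stand-in for a model call)."""
    return ctx["max_step"]


def act(value: int, step: int) -> int:
    """2.0 Act: apply the proposed step."""
    return value + step


def run_loop(goal: int, max_step: int = 8, max_turns: int = 20) -> tuple[int, list[str]]:
    ctx = {"max_step": max_step}  # the context that steers every turn
    value, log = 0, []
    for turn in range(max_turns):
        if value == goal:
            break
        candidate = act(value, plan(value, goal, ctx))
        if candidate > goal:
            # 3.0 Observe: the step overshot the goal, so reject it.
            # 4.0 Refine: shrink the step in the context before the next turn.
            ctx["max_step"] = max(1, ctx["max_step"] // 2)
            log.append(f"turn {turn}: {candidate} overshoots; max_step -> {ctx['max_step']}")
        else:
            value = candidate
            log.append(f"turn {turn}: accepted {value}")
    return value, log


value, log = run_loop(goal=13)
print(value)  # 13
```

Each rejected step shrinks `max_step`, so the loop converges instead of repeating the same mistake.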

Andrej Karpathy

“In industrial-strength LLM apps, filling the context window is a delicate art and science.”

LLM = CPU

Context Window = RAM

LLMs are a new kind of operating system. The context window is the model's working memory, where every token must be carefully placed.

Science: Task descriptions, few-shot examples, RAG, multimodal data, tools, state, and history
Art: Guiding intuition around LLM psychology and the “people spirits” Karpathy describes
Balance: Too little context means poor performance; too much means high cost and degraded performance

Why agentic engineering, now

Tobi Lütke and Andrej Karpathy popularized the shift: from one-shot prompts to engineering the loops that let agents run real, industrial-strength tasks end to end.

  1. Previously: Short task descriptions
     Now: Thick software layer, with complex systems coordinating LLM calls

  2. Previously: One-shot instructions
     Now: Closed agentic loops, running plan → act → observe → refine on every step

  3. Previously: Static, unchanging information
     Now: System prompt learning, with LLMs learning by taking their own notes

On the “ChatGPT wrapper” framing, Karpathy is blunt: “This term is tired and really, really wrong.”

Software in the age of AI

The new software paradigm Karpathy defined at YC AI Startup School, 2025.

1.0 Classical programming

  • Explicit instructions
  • Deterministic behavior
  • Human-written code

2.0 Neural network era

  • Data-driven
  • Learned behaviors
  • Weight optimization

3.0 Agentic engineering era

  • Agentic loops
  • Multi-agent systems
  • Dynamic adaptation

The “jagged intelligence” aside

LLMs can solve complex math problems yet fail at simple tasks. They are strong at complex reasoning, creative problem solving, and language understanding, but weak at simple arithmetic, prone to context drift, and inconsistent. That profile is exactly why the loop matters: observe, verify, and correct before the next step.
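Here is a minimal sketch of an observe-and-verify gate. It assumes the model reports its work as a simple integer expression plus a claimed answer. `safe_eval` and `verify_claim` are hypothetical helpers. They recompute the result exactly with Python's `ast` module rather than trusting the model's arithmetic.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected rather than executed.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def safe_eval(expr: str) -> int:
    """Evaluate +, -, * over integer literals without calling eval()."""
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def verify_claim(expr: str, claimed: int) -> tuple[bool, int]:
    """Observe step: check the model's claimed result before the loop continues."""
    actual = safe_eval(expr)
    return actual == claimed, actual
```

For example, `verify_claim("17 * 24", 418)` returns `(False, 408)`. The loop would correct the value before taking the next step.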

1.0 Plan

Filling the context window effectively. Distribute the token budget, then assemble it deliberately for every request.

Context budget

System prompt: 10–20%
Examples: 20–30%
RAG content: 30–40%
History: 10–20%
Buffer: 10%

01

Context Window Planning

Strategically distribute your token budget

System prompt (10–20%), Examples (20–30%), RAG content (30–40%), History (10–20%), Buffer (10%)
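The midpoint of each range gives shares that sum to exactly 100%: 15 + 25 + 35 + 15 + 10. Here is a minimal sketch that turns those shares into token counts. The midpoint choice, the `allocate` name, and the example window sizes are assumptions for illustration. Integer arithmetic avoids float rounding, and any remainder goes to the buffer.

```python
# Midpoints of the ranges above (percent of the window); they sum to 100.
BUDGET_PERCENT = {"system_prompt": 15, "examples": 25, "rag": 35, "history": 15, "buffer": 10}


def allocate(window_tokens: int, percents: dict[str, int] = BUDGET_PERCENT) -> dict[str, int]:
    """Split a context window into per-section token budgets (expects a 'buffer' key)."""
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")
    alloc = {name: window_tokens * pct // 100 for name, pct in percents.items()}
    # Floor division can leave a few tokens unassigned; give them to the buffer.
    alloc["buffer"] += window_tokens - sum(alloc.values())
    return alloc


print(allocate(128_000))
# {'system_prompt': 19200, 'examples': 32000, 'rag': 44800, 'history': 19200, 'buffer': 12800}
```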

02

Dynamic Context Assembly

Create custom context for each request

Task analysis → Relevant retrieval → Priority sorting → Token optimization → Context injection
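Here is a minimal sketch of the five-stage pipeline. Crude stand-ins replace the real components: word overlap substitutes for task analysis and retrieval, and a whitespace word count substitutes for the model's tokenizer. `Chunk`, `relevance`, and `assemble_context` are hypothetical names, not a library API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    priority: int = 0  # > 0 pins the chunk (e.g. standing instructions)


def count_tokens(text: str) -> int:
    # Crude stand-in: real systems count with the model's own tokenizer.
    return len(text.split())


def relevance(query: str, chunk: Chunk) -> int:
    # Task analysis + retrieval, approximated by shared lowercase words.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))


def assemble_context(query: str, chunks: list[Chunk], budget: int) -> str:
    # Relevant retrieval: keep pinned chunks and anything that overlaps the query.
    candidates = [c for c in chunks if c.priority > 0 or relevance(query, c) > 0]
    # Priority sorting: pinned first, then by relevance (ties keep input order).
    candidates.sort(key=lambda c: (c.priority, relevance(query, c)), reverse=True)
    # Token optimization: greedily keep chunks that still fit the budget.
    selected, used = [], 0
    for c in candidates:
        cost = count_tokens(c.text)
        if used + cost <= budget:
            selected.append(c.text)
            used += cost
    # Context injection: the task goes last (its tokens are not counted against `budget`).
    return "\n".join(selected + [f"Task: {query}"])
```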

03

Cascading Context Strategy

Break down and chain complex tasks

Decompose large tasks into subtasks, use optimized context for each, merge results
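Here is a minimal map-then-merge sketch of the cascade. `solve` stands in for a model call, and `chunk_words`, `cascade`, and the stub `sum_numbers` are hypothetical. Each subtask sees only its own slice of the input, and the merge call sees only the partial results.

```python
from typing import Callable


def chunk_words(text: str, max_words: int) -> list[str]:
    """Decompose: split the input into slices of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cascade(task: str, document: str, solve: Callable[[str, str], str], max_words: int = 50) -> str:
    # Each subtask gets the task plus its own slice: a small, focused context.
    partials = [solve(task, piece) for piece in chunk_words(document, max_words)]
    # Merge: one final call over the partial results instead of the raw document.
    return solve(task, "\n".join(partials))


def sum_numbers(task: str, context: str) -> str:
    """Stub solver standing in for a model call: totals the integers in its context."""
    return str(sum(int(tok) for tok in context.split() if tok.isdigit()))


print(cascade("total the order quantities", "orders 3 and 4 then 10 more and 5", sum_numbers, max_words=3))  # 22
```

If the partial results themselves outgrow the window, the same merge can be applied recursively.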

04

Context Decay & Refresh

Clean old information, add new

Temporal relevance scoring, sliding window approach, importance-based retention
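Here is a minimal sketch that combines the three techniques. The scoring rule is an assumption: importance multiplied by an exponential decay, `importance * 0.5 ** (age / half_life)`. The names `Memory`, `decay_score`, and `refresh` are hypothetical. The latest turns always survive, and older items compete on their decayed score.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    turn: int          # turn at which the item entered the context
    importance: float  # 0..1, assumed to be set by a heuristic or a model


def decay_score(m: Memory, now: int, half_life: float = 10.0) -> float:
    # Temporal relevance: an item's weight halves every `half_life` turns.
    return m.importance * 0.5 ** ((now - m.turn) / half_life)


def refresh(memories: list[Memory], now: int, window: int = 3, keep_old: int = 2) -> list[Memory]:
    ordered = sorted(memories, key=lambda m: m.turn)
    # Sliding window: the latest `window` items always stay.
    recent = ordered[-window:] if window > 0 else []
    older = ordered[:-window] if window > 0 else ordered
    # Importance-based retention: only the best-scoring older items survive.
    kept = sorted(older, key=lambda m: decay_score(m, now), reverse=True)[:keep_old]
    return sorted(kept, key=lambda m: m.turn) + recent
```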

05

Multi-Agent Orchestration

Specialized agents with different contexts

Each agent has its own context, a coordinator agent manages them, and a shared memory system lets them exchange results.
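Here is a minimal sketch of the pattern. Each agent keeps a private context, and a coordinator routes tasks and publishes results to a shared memory. `Agent`, `Coordinator`, and the `handle` callables are hypothetical. `handle` stands in for a model call that receives only that agent's view of the context.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    system_prompt: str
    handle: Callable[[list[str], str], str]  # stand-in for a model call
    context: list[str] = field(default_factory=list)  # private to this agent

    def run(self, task: str, shared: dict[str, str]) -> str:
        # The agent sees its own prompt and history plus the shared memory, nothing else.
        view = [self.system_prompt, *self.context, *(f"{k}: {v}" for k, v in shared.items())]
        result = self.handle(view, task)
        self.context.append(f"{task} -> {result}")
        return result


class Coordinator:
    def __init__(self, agents: list[Agent]) -> None:
        self.agents = {a.name: a for a in agents}
        self.shared: dict[str, str] = {}  # shared memory visible to every agent

    def dispatch(self, plan: list[tuple[str, str]]) -> dict[str, str]:
        for agent_name, task in plan:
            result = self.agents[agent_name].run(task, self.shared)
            self.shared[agent_name] = result  # publish the result for later agents
        return self.shared
```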

2.0 Act

The building blocks the loop reaches for at each step — retrieval, memory, tools, and compaction.

RAG (Retrieval-Augmented Generation)

Dynamic information retrieval enables LLMs to access current and accurate information through vector databases and semantic search.
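Here is a minimal retrieval sketch that uses toy bag-of-words vectors in place of a real embedding model and a linear scan in place of a vector database. The ranking logic is the same cosine-similarity idea that semantic search relies on. `embed`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Linear scan; a vector database would index these vectors instead.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```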

State & History Management

Intelligent management of conversation history, user preferences, and application state. Critical for efficient context window usage.

Few-Shot Examples

Carefully selected examples for the task. They show the LLM the desired output format and quality.

Tool Use & Function Calling

LLM interaction with external systems. Required for API calls, database queries, and computations.
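Here is a minimal sketch of the dispatch side of function calling. The model emits a JSON call, the application runs the matching Python function, and the result is returned as JSON for the next turn. The `{"name": ..., "arguments": ...}` shape, the `tool` registry decorator, and the tools themselves are assumptions for illustration. This is not any particular provider's schema.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def lookup_price(sku: str) -> float:
    prices = {"A1": 9.5, "B2": 20.0}  # hypothetical catalog
    return prices[sku]


def dispatch(call_json: str) -> str:
    """Run one model-issued tool call and return a JSON result or error."""
    call = json.loads(call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool {name}"})
    try:
        result = TOOLS[name](**args)
    except (TypeError, KeyError) as exc:  # bad arguments or missing record
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})
```

Returning errors as data, rather than raising, lets the observe step show the failure to the model on the next turn.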

Multimodal Context

Combining text, images, audio, and other data types. Critical for rich context creation.

Context Compaction

Maximum information density without exceeding token limits. Summarization, filtering, and prioritization techniques.

3.0 Observe

Watch what the step produced, name the failure mode, and apply the engineering response before the next turn.

Lost in the Middle → Strategic positioning: Place critical information at the beginning and end to prevent mid-context loss
Context Window Limits → Smart compression: Token efficiency through semantic chunking, summarization, and prioritization
Hallucination Risk → Grounding techniques: Accuracy control with RAG, fact-checking, and validation gates
Context Switching → State management: Continuity through session persistence and memory systems
Performance Degradation → Selective loading: Performance optimization with relevance scoring and lazy loading
Cost Explosion → Token economy: Cost control through caching, reuse strategies, and efficient encoding
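For the first row, here is a minimal sketch of strategic positioning. It takes items already ranked best-first and alternates them between the front and the back of the context, so the strongest items sit at both edges and the weakest land in the middle. `edge_order` is a hypothetical helper name.

```python
def edge_order(ranked: list[str]) -> list[str]:
    """Reorder best-first items so the most relevant sit at the start and end."""
    front, back = [], []
    for i, item in enumerate(ranked):
        # Even ranks fill from the front, odd ranks fill from the back.
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


print(edge_order(["d1", "d2", "d3", "d4", "d5"]))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The top document opens the context, the runner-up closes it, and the least relevant (`d5`) falls in the middle, where recall is weakest.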

4.0 Refine

The loop's output: systems that improve as they run, with autonomy you can dial up over time.

Code Generation Systems

Systems like GitHub Copilot and Cursor use agentic engineering to understand entire codebases and generate consistent code.

  • Understanding project structure
  • Maintaining code style
  • Import and dependency management

Enterprise AI Assistants

Corporate AI assistants use agentic engineering to understand organizational knowledge and processes.

Autonomous Agents

Systems like AutoGPT run long-horizon agentic loops to execute multi-step tasks independently.

Educational Systems

Personalized learning platforms use student context to provide adaptive learning experiences.

Autonomy

Manual ↔ Autonomous

A continuous spectrum, not a switch. Users dial the autonomy level of a system from full manual control to fully autonomous operation.
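One way to implement the dial is to compare each action's risk against the chosen autonomy level and ask a human only when the risk exceeds it. Here is a minimal sketch. The four levels, the 1 (read) to 3 (destructive) risk scale, and the `gate` helper are all hypothetical.

```python
from enum import IntEnum
from typing import Callable


class Autonomy(IntEnum):
    MANUAL = 0      # every action needs human approval
    READ_ONLY = 1   # reads run automatically
    EDIT = 2        # reads and edits run automatically
    AUTONOMOUS = 3  # nothing needs approval


def gate(actions: list[tuple[str, int]], level: Autonomy, approve: Callable[[str], bool]) -> list[str]:
    """Return the actions that ran; each has a risk from 1 (read) to 3 (destructive)."""
    executed = []
    for name, risk in actions:
        # Auto-run when the risk is within the dialed level; otherwise ask a human.
        if risk <= level or approve(name):
            executed.append(name)
    return executed
```

Raising `level` over time is the "dial up" step: the same agent code runs, with fewer human checkpoints.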

Self-improving systems

System prompt learning lets LLMs learn from their own experience. Each interaction becomes a data point that improves the system's context strategy — mechanisms similar to the way humans take notes and learn.
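Here is a minimal sketch of the note-taking idea. Lessons accumulate in a bounded notebook that is rendered into the system prompt on every call. `NotebookPrompt` is a hypothetical class. How lessons are produced, for example by a reflection step after each run, is left as an assumption.

```python
class NotebookPrompt:
    """A system prompt that carries a bounded list of learned lessons."""

    def __init__(self, base: str, max_notes: int = 5) -> None:
        self.base = base
        self.max_notes = max_notes
        self.notes: list[str] = []

    def learn(self, lesson: str) -> None:
        if lesson in self.notes:
            return  # skip duplicates so repeated lessons don't crowd the prompt
        self.notes.append(lesson)
        self.notes = self.notes[-self.max_notes:]  # keep only the most recent lessons

    def render(self) -> str:
        if not self.notes:
            return self.base
        lessons = "\n".join(f"- {n}" for n in self.notes)
        return f"{self.base}\n\nLessons from past runs:\n{lessons}"
```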

AI-native architecture

Systems designed from the ground up for agents. Human interfaces become secondary as API- and context-first approaches take priority: build for agents, adapt for humans.

4.0 → 1.0 · the loop closes

What the loop changes

No vanity metrics — just the mechanisms that move the work.

Grounding: RAG and validation gates reduce hallucination.
Compaction: Summarization and prioritization tighten token budgets.
Closed loops: Observe-and-correct steps enable autonomous, long-horizon throughput.

In practice: Shopify

Led by CEO Tobi Lütke, Shopify Magic and Sidekick apply agentic engineering principles to provide AI support to millions of merchants — grounding responses in store context and product catalogs, analyzing merchant behavior history, and injecting e-commerce best practices into the loop.