Andrej Karpathy's CLAUDE.md is a 4-rule behavioral file that stops Claude Code from over-engineering. Here's what each rule does, plus 8 more rules from six weeks of testing that cut mistake rates from 41% to 3%.
Andrej Karpathy's CLAUDE.md is a short behavioral file you drop into a Claude Code project to stop the agent over-engineering your code. Four rules. About fifty lines. It's the closest thing the AI coding world has to a Strunk and White.

To be precise about what we're talking about: there is no official "Karpathy CLAUDE.md" published by Anthropic. The file is a community-shared distillation of Karpathy's observations on how LLMs fail at coding, mirrored on GitHub at forrestchang/andrej-karpathy-skills. The original repo has 97.8k stars on the back of those four rules alone.
I'm Tom. I've been running Claude Code on every part of my business since the day it shipped. I've also written my own CLAUDE.md for an AI-operated business that is mostly content and operations, not code. So I read Karpathy's file as both a user and an author. Here's what's working in it, what doesn't translate, and how to write yours.

A CLAUDE.md file is a behavioral memory document that Claude Code automatically loads at the start of every session. It lives in your project root, or in ~/.claude/ for global rules. Claude reads it before it does anything else and treats it as a higher-priority instruction than the default system prompt.
Think of it as the instruction page you'd hand a new contractor before letting them touch the codebase. Style preferences. Conventions you've decided on. Mistakes you've watched the agent make twice. Anything you don't want to say out loud every single session.
The file is plain markdown. No special syntax. Claude follows what's written. The more specific and load-bearing your rules, the better the agent behaves.
Karpathy's CLAUDE.md is a four-rule behavioral spec for Claude Code. It's not aimed at any one project. It's a general posture for the agent. The four rules are:
Each rule has half a dozen bullet expansions and a "test" sentence you can apply to any diff. The file ends with one line that describes when it's working: "fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes."
That's the entire file. It does more for an agent than 500 lines of style guide.

When you run claude in a project, the agent walks up the directory tree looking for a CLAUDE.md file. It loads the closest one it finds, then layers your global ~/.claude/CLAUDE.md on top of that. Both feed into the system prompt as project-level memory.

You don't need to invoke the file. You don't need to pass a flag. Claude reads it automatically and treats it as a binding contract. If you say "no em dashes," it stops writing em dashes. If you say "ask before installing dependencies," it asks. If you say "match existing file structure," it stops inventing new patterns.
The technical reason it works so well: Claude Code prepends the contents of CLAUDE.md into the system prompt with strong instruction priority. Anthropic's official memory documentation confirms it. The file isn't a suggestion. It's a directive.
Each rule sits at a specific failure mode of LLM coding assistants. Read the rule, then read what it's actually catching.
The full text is short. Don't assume. Don't hide confusion. Surface tradeoffs. If something is unclear, stop. Name what's confusing. Ask.
The failure mode this catches is the agent guessing at intent. You say "add a search bar" and the model invents an entire search architecture, picks a library you don't use, and writes 800 lines before you can intervene. Karpathy's rule forces a beat. Restate the assumption. Offer the two interpretations. Wait.
Practically, this rule pulls clarifying questions to the front of the session. You spend 30 seconds confirming what you actually want and save twenty minutes of rolling back the agent's first guess.
Minimum code that solves the problem. Nothing speculative. No features beyond what was asked. No abstractions for single-use code. No "flexibility" or "configurability" that wasn't requested. No error handling for impossible scenarios.
The test Karpathy ships with this rule is the one you'll use most: "Would a senior engineer say this is overcomplicated?" If yes, simplify. Claude reads that line and you can feel the difference in the next diff. The function gets shorter. The interface gets concrete. The mock factory disappears.
This rule alone is worth installing the file. It's the most common LLM coding tax there is.
Touch only what you must. Don't "improve" adjacent code, comments, or formatting. Don't refactor things that aren't broken. Match existing style, even if you'd do it differently. The test: every changed line should trace directly to the user's request.
I cannot count the number of times I've watched Claude reformat an entire file because it touched two lines in it. This rule fixes that. The agent edits what it needs to and leaves the rest of the file alone. PR reviews get readable. Diffs get small. The signal-to-noise ratio of every change goes up.
There is a softer second clause inside the rule that matters too. "If you notice unrelated dead code, mention it, don't delete it." The agent surfaces the observation, doesn't act on it. That's the difference between a useful collaborator and a chaotic one.
Transform tasks into verifiable goals. "Add validation" becomes "write tests for invalid inputs, then make them pass." "Fix the bug" becomes "write a test that reproduces it, then make it pass." For multi-step tasks, state a brief plan with checks attached to each step.
The point of this rule isn't TDD. It's verifiability. Strong success criteria let the agent loop independently. Weak criteria force you to babysit every iteration. The rule transfers the verification burden from you to the agent, where it belongs.
This is also the rule that makes Anthropic's new /goal command useful. The command runs Claude in a loop until the goal is met. Without a clean success criterion, the loop never terminates well. With one, it just works.
Karpathy's spine was written in late January 2026 to fix code-writing mistakes. It does that job well. The Claude Code ecosystem in May 2026 is wider than code editing alone. There are multi-step agent loops, hook cascades, skill loading conflicts, and token budgets that blow past unsupervised. The original 4 rules don't reach those.
A separate operator who goes by Mnimiy tested the 4-rule template across 50 representative tasks in 30 codebases over six weeks, then re-ran the same tasks with 8 additional rules layered on top. The mistake rate dropped from 41% (no CLAUDE.md) to 11% (Karpathy's 4) to 3% (his expanded 12). Compliance barely moved between 4 rules and 12, which is the interesting part. The new rules cover failure modes the original 4 weren't aimed at, so they don't compete for the same attention budget.
I tested all 8 against the work I actually do. Six earned a permanent spot in my own file. Here is the full set with the moment each one catches.
Use Claude for classification, drafting, summarisation, and extraction from unstructured text. Don't use it for routing, retries, status-code handling, or deterministic transforms. If code can answer the question, code answers the question.
The failure mode: a snippet that asks Claude "should we retry on a 503?" runs cleanly for two weeks, then starts flaking because the model has begun reading the request body as context for the decision. The retry policy goes random because the prompt has gone random. Pull the call out of the model and into code.
Per-task budget: 4,000 tokens. Per-session: 30,000. If a task is approaching the budget, summarise and start fresh. Surface the breach. Don't silently overrun.
The failure mode: a debugging session runs for 90 minutes. Claude is happily iterating on the same 8KB error message, gradually losing track of which fixes it has already tried. By the end it's re-suggesting fixes you rejected forty messages ago. A hard budget kills the loop at minute 12 before it spirals.
If two existing patterns in the codebase contradict, pick one (the more recent or more tested), explain why, and flag the other for cleanup. Don't blend them.
The failure mode: a codebase has two error-handling patterns. One uses async/await with explicit try/catch. The other uses a global error boundary. Claude writes new code that does both. Errors get caught twice and swallowed silently. The averaging is the bug.
Before adding code in a file, read the file's exports, the immediate caller, and any obvious shared utilities. If you don't understand why existing code is structured the way it is, ask before adding to it. "Looks orthogonal to me" is the most dangerous phrase in any codebase.
The failure mode: Claude adds a function next to an existing identical function it never read. Both functions do the same thing. The new one wins on import order and the old one (which had been the source of truth for six months) goes quietly defunct.
Every test must encode why the behaviour matters, not just what it does. A test like expect(getUserName()).toBe('John') is worthless if the function returns a hardcoded ID. If you can't write a test that would fail when business logic changes, the function is wrong.
The failure mode: twelve passing tests on an auth function. All green. Auth is broken in production. The tests were checking the function returned something, not whether it returned the right thing. The function passes because it returns a constant.
After each step in a multi-step task, summarise what was done, what is verified, and what remains. Don't continue from a state you can't describe back. If you lose track, stop and restate.
The failure mode: a 6-step refactor goes wrong on step 4. By the time you notice, Claude has also done steps 5 and 6 on top of the broken state. Untangling takes longer than redoing the work. Checkpoints catch it at step 4.
If the codebase is snake_case and you'd prefer camelCase, the answer is snake_case. If it uses class-based components and you'd prefer hooks, the answer is class-based. Conformance beats taste inside the file. If you genuinely think a convention is harmful, surface it. Don't fork silently.
The failure mode: Claude introduces hooks into a class-component codebase. They work. They also break every testing pattern that assumed componentDidMount. Half a day to remove and rewrite.
If you can't be sure something worked, say so. "Migration completed" is wrong if 30 records were skipped silently. "Tests pass" is wrong if any were skipped. "Feature works" is wrong if you didn't verify the edge case I asked about. Default to surfacing uncertainty, not hiding it.
The failure mode: Claude reports a database migration completed successfully. It silently skipped 14% of records that hit a constraint violation. The skip was logged but never surfaced. The error shows up 11 days later when downstream reports start looking wrong.
Going from 4 rules to 12 added almost no compliance overhead (78% down to 76%) and cut the mistake rate by another 8 points. The new rules cover failure modes the original 4 didn't address, so they don't compete for the same attention budget.
A useful mental model: Karpathy's 4 fix what happens when Claude is writing code in a single moment. Rules 5 to 12 fix what happens when Claude is running across time, files, sessions, and pipelines. The first set is a posture. The second set is a contract for long-running work.
Three steps. Each takes five minutes. You can copy Karpathy's file verbatim and you'll already be ahead of 90% of projects. The real wins come from layering your project specifics on top.
Drop his file in your project root as-is. Don't edit it yet. Run Claude Code on a real task for a session or two. Notice which rules fire. Notice which ones never come up because they don't match your work.
Here's the copy-paste block. Save it as CLAUDE.md in your project root.
# CLAUDE.md
## 1. Think Before Coding
- Don't assume. If something is unclear, stop and ask.
- Don't hide confusion. Name what's confusing.
- Surface tradeoffs before you make the call.
- Restate the assumption back so we know we agree.
## 2. Simplicity First
- Minimum code that solves the problem.
- No speculative features. No flexibility nobody asked for.
- No abstractions for single-use code.
- No error handling for impossible scenarios.
- Test: would a senior engineer call this overcomplicated? If yes, simplify.
## 3. Surgical Changes
- Touch only what you must to satisfy the request.
- Don't reformat, refactor, or "improve" adjacent code.
- Match the existing style even if you would do it differently.
- If you notice unrelated dead code, mention it. Do not delete it.
- Test: every changed line should trace directly to the user's request.
## 4. Goal-Driven Execution
- Transform tasks into verifiable goals before you write any code.
- "Add validation" becomes "write tests for invalid inputs, then make them pass."
- "Fix the bug" becomes "write a test that reproduces it, then make it pass."
- For multi-step tasks, write the plan first. Attach a check to each step.
- Loop until the goal is verified. Do not declare done until it is.
When this is working: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.If your work is mostly multi-step agent runs, hook chains, or anything that survives across sessions, paste rules 5 to 12 from the section above directly below the 4. The 12-rule version is the floor for that kind of work. Keep the combined file under 200 lines or compliance starts to drop.
One paragraph. Name the project, the stack, the audience, the things you've already decided. Claude wastes the most tokens re-deriving context that should be a one-liner. Pre-empt it.
Every time you correct Claude on something twice, add it to CLAUDE.md. "No em dashes." "Match the existing test file structure." "Don't add a README unless I ask." These are the highest-leverage lines because they encode a real correction you watched the agent need.
My personal CLAUDE.md for the business behind this site has about 60 lines of these. They aren't clever. They aren't even well-written. They are scar tissue from sessions where the agent made the same mistake twice and I decided I'd rather write one line of memory than fix the same mess a third time.
For coding projects, yes. Copy it verbatim and add a project header above it. The four rules are well-tuned for the way most engineers actually want to work with an LLM. There is very little to argue with.
The honest caveat is that Karpathy's CLAUDE.md is written for one kind of work, which is code. If your project is content, ops, or anything where the agent isn't editing source files, the rules need to be adapted. "Surgical Changes" doesn't mean the same thing when the agent is drafting an email as when it's editing a function. "Simplicity First" reads differently when the deliverable is a 2,000 word blog post.
I've adapted mine. Tom's CLAUDE.md for AI Operator OS has 11 sections, almost none of which are about source code. The Karpathy spine is there in the background. The body of the file is operating rules for an agent that runs a business.
The fastest way to show the difference is to put them side by side.
Karpathy's file has four sections:
Mine has eleven:
Karpathy's rules are about how to code. Mine are about what work to do in an AI-run business and which constraints are non-negotiable. They are doing different jobs. Use Karpathy's for engineering work and write something heavier when the agent is running your operations.
The longer you sit with the file, the more it teaches you about how you want to work with an agent. Less about syntax, more about taste. Keven Ellison is one of my mentorship students who pushed this further than most. He had no coding background, started with a basic CLAUDE.md, and now runs a 180-plus node content management system, a personal memory layer, and his own autonomous agent on a cron job, all of it shaped by the rules he iterated into his own file over six months. His story is a useful reference point for what good CLAUDE.md discipline looks like outside of a pure coding project.
Two things to be honest about.
CLAUDE.md is read-only memory from Claude's side. The agent obeys the rules in the file but cannot update them mid-session. That's a feature for stability and a bug for learning. If you watched Claude make a new mistake today, you have to be the one who writes the rule to prevent it tomorrow.
The other limit is sprawl. A 500-line CLAUDE.md will get partially ignored. Long files eat context and dilute the priority signal. Karpathy's 50 lines is closer to the sweet spot than the kitchen-sink documents people end up writing six months in. Cut ruthlessly. Every rule that isn't pulling weight is making the rest weaker.
Yes. For any coding project, drop it in today and you will see fewer over-engineered diffs by the end of the week. The rules are short, the wording is tight, and the philosophy ("verifiable goals, surgical changes, simplicity first") matches the way good engineers already prefer to work.
If your project isn't code, treat Karpathy's file as the spine and write your own body. Same posture, different content. Either way, the right move is to stop arguing with Claude in chat and put the argument in CLAUDE.md once.
It can be both. A CLAUDE.md in your project root applies to that project. A CLAUDE.md in ~/.claude/ applies globally to every Claude Code session on your machine. Claude loads both. Project-level rules override global ones when they conflict.
CLAUDE.md is the Anthropic-specific memory file for Claude Code. AGENTS.md is a cross-vendor proposal aimed at being readable by any coding agent, including Cursor, Codex, and Gemini CLI. They overlap. In a Claude-only stack you only need CLAUDE.md. If you switch between agents, AGENTS.md gives you portable rules.
Yes. The file is model-agnostic. It works with Sonnet 4.6, Opus 4.7, and any future Claude Code model. The four rules describe behavior, not implementation, so model upgrades only make the file more effective.
Yes, with one rename. Save the file as AGENTS.md instead of CLAUDE.md and most coding agents will read it. The wording works across tools because the failure modes Karpathy is describing (assumption, overcomplication, scope drift) are common to all LLM coding assistants, not just Claude.
Mostly yes, especially for hard rules. Phrasings like "Never do X" and "Always do Y" hold up well across long sessions. Subjective rules like "be concise" drift over time and need to be reinforced with specifics. The fix is to write your rules as testable directives, not vibes.
Use the 4-rule version if your work is mostly Claude writing code in single sessions. Use the 12-rule version if you're running multi-step agents, hook chains, or anything that survives across sessions. The extra 8 rules (budgets, checkpoints, fail-loud, read-before-write, conventions, intent-aware tests, conflict surfacing, judgment-only model use) catch failure modes the original 4 weren't aimed at. Across tested codebases, mistake rates moved from 41% with no CLAUDE.md, to 11% with 4 rules, to 3% with 12.
Karpathy's is about 50 lines and that's a good ceiling for the behavioral spine. Add a short project header on top (10 to 20 lines) and you're done. If your file is creeping past 200 lines, you're documenting things that should live in code, not memory.
The original Karpathy mirror at github.com/forrestchang/andrej-karpathy-skills is the canonical reference. Anthropic's official Claude Code memory docs cover the loading rules. For broader patterns, the curated awesome-claude-code list on GitHub has dozens of community CLAUDE.md files worth reading.
The 60-minute Claude Code Blueprint is the fastest path from zero to a working Claude Code setup with a CLAUDE.md that actually behaves. You'll have the file written, the agent installed, and a first working build by the end of the hour. Free. No coding required.
If you want to go further than the blueprint, the 30 Day Claude Code Challenge takes the four-rule posture and applies it across 30 days of structured builds with live calls every Thursday.
Karpathy's file fixes the agent. The Blueprint fixes the operator.
No. The four rules (Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution) are the spine, not a script. Drop the file in your project root verbatim, run Claude Code for a session or two, then layer your own project context and repeat corrections on top. Most of the real wins come from the lines you add after watching the agent make the same mistake twice.
Claude Code prepends the contents of CLAUDE.md into the system prompt with strong instruction priority at the start of every session. The agent walks up the directory tree, loads the closest CLAUDE.md it finds, then layers your global ~/.claude/CLAUDE.md on top. You do not invoke it or pass a flag. It is read automatically and treated as a binding contract.
Use the 4-rule version if your work is mostly Claude writing code in single sessions. Use the 12-rule version if you are running multi-step agents, hook chains, or anything that survives across sessions. In Mnimiy's testing across 50 tasks in 30 codebases, mistake rates moved from 41% with no CLAUDE.md, to 11% with the 4 rules, to 3% with 12.
Yes. The four rules describe behaviour, not implementation, so they are model-agnostic. Karpathy's CLAUDE.md works with Sonnet 4.6, Opus 4.7, and any future Claude Code model. Model upgrades only tend to make the file more effective, because better models follow load-bearing instructions more reliably.
Karpathy's is about 50 lines and that is a sensible ceiling for the behavioural spine. Add a short project header on top (10 to 20 lines) and you are done. If your file is creeping past 200 lines, you are documenting things that should live in code, not memory. Long files dilute the priority signal and the agent starts ignoring sections.
Yes, with one rename. Save the file as AGENTS.md instead of CLAUDE.md and most coding agents will read it. The wording transfers because the failure modes Karpathy is describing (assumption, overcomplication, scope drift) are common to every LLM coding assistant, not just Claude.
Hard rules hold up well across long sessions. Phrasings like 'Never do X' and 'Always do Y' stick. Subjective rules like 'be concise' drift over time and need to be reinforced with specifics. The fix is to write your rules as testable directives, not vibes. Treat every rule as scar tissue from a real correction.
Five interactive lessons. Install Claude Code, build your first automation, and deploy it live on the internet — all in under an hour. Free, no coding required.
Grab the Blueprint →