Chapter IX · Philosophy

Why Dogfood Everything

Ship it. Use it. Break it. Fix it. Write it down. The loop is short on purpose. Anything in this cookbook that did not survive that loop is not in this cookbook.

What this is

A position piece on a working rule the rest of the cookbook obeys: every tool, pattern, and recipe in here has been deployed against my own actual workload and broken at least once. That is the gate for inclusion. There are no “future-proof” patterns, no “best-practice” ideas borrowed from blog posts, no “this would probably work” suggestions. If it has not run in service on this machine, it is not here.

Why this way

Three things drive the rule:

1. Most “best practices” decay on contact with reality

The lifecycle of a technical idea on the internet:

  1. Someone solves a real problem with a real constraint.
  2. They write a blog post.
  3. Other people read the post.
  4. They cargo-cult the answer into a different problem with different constraints.
  5. The answer is now folklore. The constraints are gone.

The pattern repeats indefinitely. Half of “best practice” is one engineer’s specific tradeoff, generalized past where it was true. The only defense is to run a thing yourself against a real workload before you write about it. Dogfooding is the cheapest filter.

2. Real workloads expose real failure modes that synthetic tests do not

The agent stack in this cookbook has been broken by:

None of those show up in a unit test, an integration test, or a “best practice” article. They show up at 11 pm on a Sunday when a real publish queue stalls. The fix is in the cookbook because it had to be. The reason for the fix is in the Gotchas section because the fix alone is not useful without the cause.

3. The cookbook is for me first, others second

The primary reader of every guide is me, six months later, after I have forgotten which knob I turned. The secondary reader is any engineer who runs a similar stack. Writing for the primary reader keeps the writing honest: a guide that does not survive my own future self reading it is a bad guide. Marketing tone, hype, and “you will love this” prose all fail that test, and they are absent from the cookbook on purpose.

The loop

Ship → Use → Break → Fix → Write → (back to Use)

Each step is small. The loop only stays useful if it stays short.
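The shape of the loop can be sketched as a tiny transition table. This is only an illustration, and the names `Step`, `NEXT`, and `walk` are hypothetical, not part of any tool in this cookbook. The one thing it is meant to show is that Write hands back to Use, so Ship happens once per tool while the rest repeats.

```python
from enum import Enum


class Step(Enum):
    SHIP = "ship"
    USE = "use"
    BREAK = "break"
    FIX = "fix"
    WRITE = "write"


# Hypothetical transition table for the loop described above.
# Write returns to Use, not to Ship: the tool is shipped once and then lived with.
NEXT = {
    Step.SHIP: Step.USE,
    Step.USE: Step.BREAK,
    Step.BREAK: Step.FIX,
    Step.FIX: Step.WRITE,
    Step.WRITE: Step.USE,
}


def walk(start: Step, transitions: int) -> list[Step]:
    """Return the steps visited from `start` after following `transitions` arrows."""
    steps = [start]
    for _ in range(transitions):
        steps.append(NEXT[steps[-1]])
    return steps


if __name__ == "__main__":
    print(" -> ".join(step.value for step in walk(Step.SHIP, 6)))
```

Following six arrows from Ship passes through Use twice and never returns to Ship.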

Ship

Build the minimum thing that solves the problem you have right now. Not the abstraction you would build “if you had time,” not the generalization you would do “for the team,” not the framework you would write “to avoid this problem in the future.” The smallest thing that solves this problem this week.

Examples from this cookbook:

Each became its current shape only after being used. They were never designed in advance.

Use

Run the thing in service. Not a synthetic test, not a tutorial run-through. Actually use it for the workload it was built for, every day, for at least a week. Use is the only thing that produces real signal. Reading does not produce signal, planning does not produce signal, design reviews do not produce signal.

This step is where most projects stop. People build a tool, mark it “done,” and never use it themselves. The tool atrophies because it was never load-bearing.

Break

The tool will break. Every tool breaks. The interesting question is how it breaks and what that tells you about its shape. Some breakage you expected and have a recovery story for. Some breakage you did not expect and now have to think about.

The cookbook’s Gotchas sections are entirely written from this step. Every one of them is a breakage that surprised me. There are no theoretical gotchas.

Fix

Two rules:

  1. Fix the proximate cause now, so the system works again.
  2. Fix the deeper cause next, so the failure does not recur.

The two are not the same fix. The proximate fix might be “restart the gateway.” The deeper fix is “wire the env file back into the systemd unit so the next upgrade does not strip it.” Without the second fix, you will live the same incident again.

Skipping rule 2 is the most common antipattern. People restart the service, the alarm clears, and they move on. Three weeks later, the same alarm fires. That is the universe asking you to apply rule 2.
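One way to keep a deeper fix from silently regressing is a check that fails when the unit loses its env file. The following is a minimal sketch under stated assumptions, not the check this cookbook actually runs: `EnvironmentFile=` is the real systemd directive, but the function names, the unit contents, and the path `/etc/gateway/env` are hypothetical. It also ignores section headers and drop-in files for brevity.

```python
def environment_files(unit_text: str) -> list[str]:
    """Collect the paths named by EnvironmentFile= lines in systemd unit text.

    Simplification: section headers are ignored, so this assumes the
    directive only appears where systemd would honor it.
    """
    paths: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (systemd accepts "#" and ";").
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "EnvironmentFile":
            continue
        value = value.strip()
        if not value:
            # An empty assignment resets everything assigned before it.
            paths.clear()
            continue
        # A leading "-" tells systemd to tolerate a missing file; drop it.
        paths.append(value[1:] if value.startswith("-") else value)
    return paths


def env_file_is_wired(unit_text: str, expected_path: str) -> bool:
    """True if the unit text still references the expected env file."""
    return expected_path in environment_files(unit_text)


# Hypothetical unit contents, before and after an upgrade strips the directive.
HEALTHY_UNIT = """[Service]
EnvironmentFile=-/etc/gateway/env
ExecStart=/usr/local/bin/gateway
"""

STRIPPED_UNIT = """[Service]
ExecStart=/usr/local/bin/gateway
"""

if __name__ == "__main__":
    print(env_file_is_wired(HEALTHY_UNIT, "/etc/gateway/env"))
    print(env_file_is_wired(STRIPPED_UNIT, "/etc/gateway/env"))
```

Run after every upgrade, a check like this turns "the same alarm fires three weeks later" into a failure at upgrade time, which is when the cause is still obvious.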

Write

Write it down. Three places, depending on what it is:

Where         | What goes there
Code comments | The kind that warn the next reader away from a subtle thing
Memory cards  | Durable knowledge that future-you needs to recall
Cookbook      | A pattern other engineers can lift

A breakage that has happened twice but is not written down anywhere is guaranteed to happen a third time. The cookbook itself is the long-form version of this rule.

The writing is the cheap part. Most engineers I know are bad at this not because writing is hard, but because they treat it as a chore that happens after the work. Writing is the work. You do not understand a fix until you can explain it to someone else.

What dogfooding is not

It is not “use my own product in beta and call that production.” Real dogfooding has stakes:

If the system you are dogfooding has no real stakes, you are not dogfooding. You are running a parallel sandbox. The bugs you find in a sandbox are sandbox bugs. The bugs you find in production are production bugs. Only the second kind matters.

This is also not “ship to users who pay so we can call it dogfooded.” Other people running your code is great; it is not a substitute. Your incentive structures are different from your users’, and the bugs they will report are filtered by their willingness to file an issue. You will not file an issue against yourself. You will fix the bug.

When dogfooding stops working

A short list. If you are in one of these, the rule needs a different shape:

How this rule shows up in the cookbook

A few markers that this rule is being applied:

When you read a guide here, the implicit promise is: this ran. It broke. I fixed it. The writeup is what I wish someone had handed me before the breakage.

Templates

This piece is about the writing discipline, not a runnable artifact. Pair with: