AI Engineering Handbook
By Florence J. and Rachel L.
Something is changing. Quietly.
Software used to be deterministic. Given an input, you got an output. A program was a tightly controlled path: logic, branches, state machines, interface contracts. The engineer’s job was to compress the world into forms code could describe exactly.
That model is starting to fail.
More and more systems no longer compute answers directly. They generate answers. They no longer execute rules so much as make decisions under uncertainty, decisions that are plausible rather than provably correct.
So the nature of software is changing.
A New Paradigm
Today’s systems often look like this:
- a model that generates and understands
- a retrieval system that provides external memory
- a set of tools that connects the system to the real world
- an evaluation layer that decides whether the result is acceptable
Code no longer solves the problem directly. Code now orchestrates a system that can think. That sounds abstract, but engineers quickly realize something:
What you are writing is no longer a program. It is the set of conditions under which a system behaves.
This system may make mistakes. It may hesitate. It may drift off target. You cannot control it completely, but you can constrain it, steer it, and correct it. That is a new engineering paradigm.
We call it AI systems engineering.
Why Read This Book
There is already an enormous amount of material about AI: papers, blogs, open-source projects, tutorials. Nearly every topic is covered somewhere. But most of it answers local questions:
- How Transformers work
- How RLHF is trained
- How to build RAG
- How to speed up inference
All of that matters. But one question is rarely answered systematically:
How do these pieces fit together into a complete system?
In the real world, an AI system is never a single technique. It is always some combination of:
- training (pretraining, mid-training, alignment)
- models (architecture choices, capability distribution)
- inference (latency, KV cache, system optimization)
- applications (RAG, tools, agents)
- evaluation and feedback loops
More importantly, these parts are tightly coupled.
- The way you train changes behavior at inference time
- Model architecture affects system latency
- Retrieval design changes output quality
- Tool interfaces determine whether the system is reliable
If you understand only one part, you will have a hard time understanding the system itself.
What This Book Tries to Answer
This book revolves around one question:
How are modern AI systems built?
To answer it, we break the system into five layers: Foundations → Training → Models → Inference → Applications
This is not a taxonomy. It is a way of seeing. You can start with the Transformer and follow the path all the way to agent systems. Or you can start with an application problem and reason backward toward the training and model choices underneath it. Once the links between these layers become clear, AI engineering stops looking like a pile of jargon and starts looking like a system you can actually understand.
Two Writing Principles
This book follows two simple rules.
1. Questions Matter More Than Answers
When models can generate answers easily, the scarce thing is the question. A good question is often worth more than a complicated answer. We care more about:
- Why does KV cache reduce complexity?
- Why does LoRA work?
- Why can RAG sometimes make a system worse?
- Why does an agent sometimes get dumber the more you use it?
These questions are the entry points to understanding the system.
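As a taste of the first question, here is a toy operation count, a sketch of why a KV cache helps during autoregressive decoding. The function names and numbers are illustrative, not from this book; real implementations differ in many details.

```python
# Toy model: count attention dot products while decoding n_tokens one at a time.

def naive_decode_ops(n_tokens: int) -> int:
    # Without a cache, step t re-encodes the whole prefix: each of the t tokens
    # attends to all t positions, so step t costs t * t dot products.
    return sum(t * t for t in range(1, n_tokens + 1))  # O(n^3) total work

def cached_decode_ops(n_tokens: int) -> int:
    # With a KV cache, step t only compares the new token's query against
    # the t cached key/value pairs: t dot products per step.
    return sum(t for t in range(1, n_tokens + 1))  # O(n^2) total work

print(naive_decode_ops(1024))   # roughly 3.6e8 dot products
print(cached_decode_ops(1024))  # 524800 dot products
```

Same output, hundreds of times less work at a 1024-token context: that asymmetry is the whole argument for caching keys and values instead of re-running the prefix.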
2. Diagrams Matter More Than Text
When text can be generated without limit, text itself is no longer evidence. Structure is.
A good diagram is often closer to the truth than a page of prose, because it exposes the constraints, paths, and tradeoffs of a system rather than just describing its surface.
That is why this book contains so many structural diagrams. They are not decoration. They are part of the reasoning.
Why There Are Interview Questions
There is a very common illusion:
“I already understand this.”
Then someone asks:
“Then explain why.”
And the understanding collapses.
So this book includes a large number of questions drawn from real AI engineering interviews. Not for the sake of testing, but so you can check one thing:
Do you actually understand it, or do you just recognize the vocabulary?
When you can answer these questions clearly, then you truly understand the system.
How to Read This Book
This is not a book you have to read from front to back.
You can:
- start from the foundations and build a complete mental model
- start from applications and reason backward to the system underneath
- or jump straight to the layer you are building now
If you are building an agent system, Chapter 7 will probably help you more than Chapter 1. But when the system breaks, you will eventually come back to the earlier chapters. The purpose of this book is not to give you answers. It is to make sure that when you face a complex system, you know what questions to ask.
Contents
Part I: Foundations of Large Language Model Systems
Part II: Large Language Model Training
Part III: Model Architecture
Part IV: Inference Systems
Part V: AI Applications
Appendix
Closing
Thank you for reading.
The point of understanding AI is not merely to keep up. It is to catch the wave before it knocks you off your feet.
I hope that when the next wave comes, you are not standing outside watching it. I hope you are already in it, building something real.