AI Engineering Handbook
By Florence J. and Rachel L.
Something is changing. Quietly.
Software used to be deterministic. Given an input, you got an output. A program was a tightly controlled path: logic, branches, state machines, interface contracts. The engineer’s job was to compress the world into forms code could describe exactly.
That model is starting to fail.
More and more systems no longer compute answers directly. They generate answers. They no longer execute rules so much as make decisions under uncertainty, decisions that are plausible rather than provably correct.
So the nature of software is changing.
A New Paradigm
Today’s systems often look like this:
- a model that generates and understands
- a retrieval system that provides external memory
- a set of tools that connects the system to the real world
- an evaluation layer that decides whether the result is acceptable
Code no longer solves the problem directly. Code now orchestrates a system that can think. That sounds abstract, but engineers quickly realize something:
What you are writing is no longer a program. It is the set of conditions under which a system behaves.
This system may make mistakes. It may hesitate. It may drift off target. You cannot control it completely, but you can constrain it, steer it, and correct it. That is a new engineering paradigm.
We call it AI systems engineering.
Why Read This Book
There is already an enormous amount of material about AI: papers, blogs, open-source projects, tutorials. Nearly every topic is covered somewhere. But most of it answers local questions:
- How Transformers work
- How RLHF is trained
- How to build RAG
- How to speed up inference
All of that matters. But one question is rarely answered systematically:
How do these pieces fit together into a complete system?
In the real world, an AI system is never a single technique. It is always some combination of:
- training (pretraining, mid-training, alignment)
- models (architecture choices, capability distribution)
- inference (latency, KV cache, system optimization)
- applications (RAG, tools, agents)
- evaluation and feedback loops
More importantly, these parts are tightly coupled.
- The way you train changes behavior at inference time
- Model architecture affects system latency
- Retrieval design changes output quality
- Tool interfaces determine whether the system is reliable
If you understand only one part, you will have a hard time understanding the system itself.
What This Book Tries to Answer
This book revolves around one question:
How are modern AI systems built?
To answer it, we break the system into five layers: Foundations → Training → Models → Inference → Applications
This is not a taxonomy. It is a way of seeing. You can start with the Transformer and follow the path all the way to agent systems. Or you can start with an application problem and reason backward toward the training and model choices underneath it. Once the links between these layers become clear, AI engineering stops looking like a pile of jargon and starts looking like a system you can actually understand.
Two Writing Principles
This book follows two simple rules.
1. Questions Matter More Than Answers
When models can generate answers easily, the scarce thing is the question. A good question is often worth more than a complicated answer. We care more about:
- Why does KV cache reduce complexity?
- Why does LoRA work?
- Why can RAG sometimes make a system worse?
- Why does an agent sometimes get dumber the more you use it?
These questions are the entry points to understanding the system.
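As a taste of the first question, here is a toy operation count, a sketch of why a KV cache helps during autoregressive decoding. The function names and numbers are illustrative, not from this book; real implementations differ in many details.

```python
# Toy model: count attention dot products while decoding n_tokens one at a time.

def naive_decode_ops(n_tokens: int) -> int:
    # Without a cache, step t re-encodes the whole prefix: each of the t tokens
    # attends to all t positions, so step t costs t * t dot products.
    return sum(t * t for t in range(1, n_tokens + 1))  # O(n^3) total work

def cached_decode_ops(n_tokens: int) -> int:
    # With a KV cache, step t only compares the new token's query against
    # the t cached key/value pairs: t dot products per step.
    return sum(t for t in range(1, n_tokens + 1))  # O(n^2) total work

print(naive_decode_ops(1024))   # roughly 3.6e8 dot products
print(cached_decode_ops(1024))  # 524800 dot products
```

Same output, hundreds of times less work at a 1024-token context: that asymmetry is the whole argument for caching keys and values instead of re-running the prefix.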
2. Diagrams Matter More Than Text
When text can be generated without limit, text itself is no longer evidence. Structure is.
A good diagram is often closer to the truth than a page of prose, because it exposes the constraints, paths, and tradeoffs of a system rather than just describing its surface.
That is why this book contains so many structural diagrams. They are not decoration. They are part of the reasoning.
Why There Are Interview Questions
There is a very common illusion:
“I already understand this.”
Then someone asks:
“Then explain why.”
And the understanding collapses.
So this book includes a large number of questions drawn from real AI engineering interviews. Not for the sake of testing, but so you can check one thing:
Do you actually understand it, or do you just recognize the vocabulary?
When you can answer these questions clearly, then you truly understand the system.
How to Read This Book
This is not a book you have to read from front to back.
You can:
- start from the foundations and build a complete mental model
- start from applications and reason backward to the system underneath
- or jump straight to the layer you are building now
If you are building an agent system, Chapter 7 will probably help you more than Chapter 1. But when the system breaks, you will eventually come back to the earlier chapters. The purpose of this book is not to give you answers. It is to make sure that when you face a complex system, you know what questions to ask.
Contents
Part I: Foundations of Large Language Model Systems
Part II: Large Language Model Training
Part III: Model Architecture
Part IV: Inference Systems
Part V: AI Applications
Appendix
Closing
Thank you for reading.
The point of understanding AI is not merely to keep up. It is to catch the wave before it knocks you off your feet.
I hope that when the next wave comes, you are not standing outside watching it. I hope you are already in it, building something real.