6. Training & Inference

Overview

Notes on efficient training and inference: batching, KV cache, quantization, distillation, serving patterns.
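
As a small illustration of the KV cache item above, here is a minimal pure-Python sketch of single-head attention during autoregressive decoding. It is not taken from any particular framework. The names `KVCache`, `attend`, `decode_step`, and `full_causal_attention` are hypothetical, and the query, key, and value vectors are assumed to be already projected. The point is only that keeping earlier keys and values lets each decoding step process one new token instead of recomputing the whole prefix.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention for one query over a list of keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim_v)
    ]


class KVCache:
    """Hypothetical cache: stores the key/value vectors of all past tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(cache, q, k, v):
    """Process one new token: store its k/v, then attend over the cached prefix."""
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


def full_causal_attention(qs, ks, vs):
    """Reference without a cache: recompute attention over the prefix each step."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]
```

Calling `decode_step` once per token should give the same outputs as `full_causal_attention` over the whole sequence. The trade-off is that the cache grows linearly with sequence length, which is why cache memory becomes a concern when serving.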

flash-attention

paged-attention

training framework

inference framework