Fit Your LLM on a single GPU with Gradient Checkpointing, LoRA, and Quantization: a deep dive
14 min read · Aug 3, 2023
Anyone who has ever tried to fine-tune a Large Language Model knows how hard it is to manage GPU memory.
“RuntimeError: CUDA error: out of memory”.
This error message has been haunting my nights.
Models with 3B, 7B, or even 13B parameters are large, and fine-tuning them is long and tedious. Running out of memory mid-training can be both frustrating and costly.