LLMs | Learn Code Camp

TTFT in LLMs Explained: What Time to First Token Really Measures

When I evaluate an LLM system, one of the first latency metrics I look at is TTFT, or time to first token. This metric answers a simple question: After a user sends a request, how long does it take before the first output token appears? That sounds narrow, but it matters a lot. Users usually forgive a response that streams steadily after it starts. What feels bad is the dead time before anything appears on screen. ...
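As a rough illustration of the metric described above, here is a minimal sketch of timing TTFT over any iterator of streamed tokens. The names `measure_ttft` and `fake_stream` are hypothetical and not from any particular SDK; `fake_stream` simply simulates a delay before the first token. With a real client, the timer should start immediately before the request is sent.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for a token stream."""
    start = time.perf_counter()  # start the clock before the first token is requested
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token arrived: this gap is the "dead time" the user sees.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(first_delay: float, per_token_delay: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    time.sleep(first_delay)  # simulated wait before the first token
    for i in range(n):
        if i > 0:
            time.sleep(per_token_delay)  # simulated steady streaming
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_ttft(fake_stream(0.05, 0.01, 5))
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, tokens: {count}")
```

Reporting TTFT separately from total time keeps the wait before the first token distinct from the streaming speed that follows it.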

Claude Code Tools Explained: What Each Tool Does and When to Use It

When I use Claude Code, I am not just using a model that generates text. I am using a tool-driven coding environment that can inspect files, search code, edit content, run shell commands, and delegate work to subagents. That tool layer is the real reason Claude Code feels different to me from a normal chat UI. Instead of asking "Can the model explain my code?", I can ask "Can the model inspect the repo, find the bug, patch the file, and run the command needed to verify the fix?" ...

LoRA Fine-Tuning Explained: What It Is, Why It Works, and the Math Behind It

LoRA stands for Low-Rank Adaptation. It is one of the most useful ideas in modern LLM fine-tuning because it changes the question from "How do we update all of the model's weights?" to "How do we learn a small update that is still expressive enough for the new task?" That is the whole trick. Instead of fine-tuning every entry of a large weight matrix, LoRA keeps the original pretrained weight frozen and learns a low-rank correction on top of it. This makes training much cheaper in trainable parameters, optimizer state, checkpoint size, and often VRAM. ...
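To make the low-rank correction concrete, here is a minimal pure-Python sketch under stated assumptions. The frozen weight `W` has shape `d_out x d_in`, and the update is the product of `B` (`d_out x r`) and `A` (`r x d_in`), scaled by `alpha / r` as in the original LoRA paper. The dimensions, `alpha` value, and helper names `matmul` and `lora_weight` are illustrative choices, not part of any library.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A) without modifying the frozen W."""
    r = len(A)              # rank of the update
    delta = matmul(B, A)    # d_out x d_in matrix of rank at most r
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


random.seed(0)
d_out, d_in, r = 6, 8, 2  # toy sizes, chosen only for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init

full_params = d_out * d_in        # entries updated by full fine-tuning
lora_params = r * (d_in + d_out)  # entries updated by LoRA

if __name__ == "__main__":
    print(f"full fine-tuning: {full_params} params, LoRA: {lora_params} params")
```

Because `B` starts at zero, the adapted weight equals `W` before any training step, so the model begins from its pretrained behavior. The savings grow with matrix size: the trainable count scales with `r * (d_in + d_out)` rather than `d_in * d_out`.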