TTFT in LLMs Explained: What Time to First Token Really Measures

When I evaluate an LLM system, one of the first latency metrics I look at is TTFT, or time to first token. This metric answers a simple question: After a user sends a request, how long does it take before the first output token appears? That sounds narrow, but it matters a lot. Users usually forgive a response that streams steadily after it starts. What feels bad is the dead time before anything appears on screen. ...

April 17, 2026 · 7 min · Nitin

Claude Code Tools Explained: What Each Tool Does and When to Use It

When I use Claude Code, I am not just using a model that generates text. I am using a tool-driven coding environment that can inspect files, search code, edit content, run shell commands, and delegate work to subagents. That tool layer is the real reason Claude Code feels different to me from a normal chat UI. Instead of asking: Can the model explain my code? I can ask: Can the model inspect the repo, find the bug, patch the file, and run the command needed to verify the fix? ...

April 16, 2026 · 10 min · Nitin
LoRA fine-tuning cover illustrationA cover graphic showing a frozen pretrained matrix plus a small low-rank adapter update made of two trainable matrices.LoRA Fine-TuningLow-Rank Adaptation for LLMsFreeze the large pretrained weight. Learn a small structured updatethat changes the model just enough for the new task.W = W0 + (alpha / r)BAOne frozen path, one tiny trainable pathThe base model stays intact while adapters learn the task update.W0d_out x d_infrozenAr x d_inBd_out x rtrain only r(d_in + d_out) paramsWhy it matters1. Far fewer trainable parameters2. Much smaller optimizer state3. Easy task-specific adapter checkpoints

LoRA Fine-Tuning Explained: What It Is, Why It Works, and the Math Behind It

LoRA stands for Low-Rank Adaptation. It is one of the most useful ideas in modern LLM fine-tuning because it changes the question from: How do we update all of the model's weights? to: How do we learn a small update that is still expressive enough for the new task? That is the whole trick. Instead of fine-tuning every entry of a large weight matrix, LoRA keeps the original pretrained weight frozen and learns a low-rank correction on top of it. This makes training much cheaper in parameters, optimizer state, checkpoint size, and often VRAM. ...

April 5, 2026 · 11 min · Nitin