Training Loop Explained: Batches, Epochs, Iterations, and Convergence

Once you understand neurons, activations, loss functions, and backpropagation, the next thing to understand is the training loop. This is the repetitive engine of deep learning. At a high level, training is boring in the best possible way. It is the same four steps repeated over and over: make a prediction, measure the error, compute gradients, and update the weights. The interesting part is not the loop itself. The interesting part is how concepts like batch size, epoch, iteration, and convergence affect the behavior of that loop in practice. ...
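
The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the post: the function name `train`, the toy dataset `points`, and the batch size, learning rate, and epoch count are all assumptions chosen so the example fits a line y = 2x + 1.

```python
import random

def train(data, batch_size=2, epochs=500, lr=0.05, seed=0):
    """Fit y ~ w * x + b with mini-batch gradient descent (illustrative only)."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    iterations = 0
    for _ in range(epochs):                  # one epoch = one full pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # 1. make a prediction
            preds = [w * x + b for x, _ in batch]
            # 2. measure the error (signed errors; the loss is their mean square)
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            # 3. compute gradients of the batch MSE with respect to w and b
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # 4. update the weights
            w -= lr * grad_w
            b -= lr * grad_b
            iterations += 1                  # one iteration = one weight update
    return w, b, iterations

# Hypothetical toy data lying exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, iterations = train(points)
```

With 4 samples and a batch size of 2, each epoch contains 2 iterations, so 500 epochs means 1000 weight updates. Convergence here simply means that `w` and `b` stop changing meaningfully and settle near 2 and 1.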

April 3, 2026 · 7 min · Nitin
Decision guide diagram for MSE vs cross-entropy showing when to use mean squared error for continuous targets and cross-entropy for class prediction

MSE vs Cross-Entropy: Which Loss Function Should You Use?

Loss functions answer one basic question: How wrong is the model right now? Without a loss function, a neural network has no way to measure its own mistakes, and without that measurement, gradient-based training has nothing to optimize. Two of the most important losses in machine learning are Mean Squared Error (MSE) and Cross-Entropy. They are both common. They are both differentiable. But they solve different kinds of problems, and using the wrong one makes training harder than it needs to be. ...
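
To make the contrast concrete, here is a small sketch of both losses in plain Python. The function names, the `eps` clamp, and the sample numbers are assumptions for illustration only. MSE scores numeric distance from the target, while cross-entropy scores how much probability the model placed on the correct class.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared distance between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(probs, correct_index, eps=1e-12):
    """Cross-entropy for one example: negative log of the probability on the true class."""
    # eps guards against log(0) when the model assigns zero probability.
    return -math.log(max(probs[correct_index], eps))

# Continuous target (for example, a price): use MSE.
price_loss = mse([200.0, 310.0], [210.0, 300.0])

# Class target: use cross-entropy. Class 1 is the correct class in both cases.
confident = cross_entropy([0.05, 0.90, 0.05], 1)
unsure = cross_entropy([0.40, 0.30, 0.30], 1)
```

Both price predictions are off by 10, so `price_loss` is 100.0. The confident classifier gets a much smaller loss than the unsure one, because cross-entropy only looks at the probability assigned to the correct class.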

April 3, 2026 · 7 min · Nitin

Multi-Layer Perceptron Explained: Dense Networks from First Principles

A multi-layer perceptron (MLP) is one of the simplest and most important neural network architectures. It is not flashy. It is not state of the art for language or vision by itself. But if you do not understand MLPs, a lot of modern deep learning stays blurry. MLPs teach the core structure of neural networks: inputs become vectors, layers apply learned linear transforms, activations add nonlinearity, and deeper layers build more useful internal representations. They also still matter in practice. Even transformers contain MLP blocks. Recommendation systems, tabular models, and many small classifiers still use dense networks directly. ...

April 3, 2026 · 7 min · Nitin
Forward pass cover illustration summarizing the path from inputs through a weighted sum and activation to the output, then scaling to matrix form (z = Wx + b, a = f(z))

Forward Pass Explained: From a Single Neuron to Matrix Form

The forward pass is the part of a neural network that actually produces a prediction. You feed inputs into the model, the model applies a sequence of mathematical operations, and an output comes out the other side. That sounds trivial, but it is one of the most important ideas in deep learning because everything else depends on it: the loss compares the forward-pass output to the target, backpropagation differentiates through the forward pass, and training is just repeating the forward pass and improving it. The easiest way to understand the forward pass is to start with a single neuron and then scale it up into a full layer written in matrix form. ...
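
A rough sketch of that progression, from one neuron to a full layer, might look like the following. The function names, the weights, and the choice of ReLU as the activation are assumptions made for the example; a dense layer is treated here as one neuron per row of the weight matrix, which is what z = Wx + b computes.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def neuron(x, w, b):
    """Single neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)

def dense_layer(x, W, b):
    """Dense layer: each row of W (with its bias) is one neuron applied to the same x."""
    return [neuron(x, row, bi) for row, bi in zip(W, b)]

x = [1.0, 2.0, 3.0]

# One neuron: z = 0.5*1 - 1.0*2 + 0.5*3 + 0.5 = 0.5, and ReLU leaves it unchanged.
single = neuron(x, [0.5, -1.0, 0.5], 0.5)

# Two neurons stacked as rows of W; the second has z = -1.0, so ReLU outputs 0.0.
layer = dense_layer(x, [[0.5, -1.0, 0.5], [-1.0, 0.0, 0.0]], [0.5, 0.0])
```

In practice the layer would be a single matrix-vector product in a numerical library, but the list comprehension makes it visible that the matrix form is the same computation repeated for many neurons.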

April 3, 2026 · 7 min · Nitin
Backpropagation cover illustration showing the forward pass flowing left to right and gradients flowing backward from the loss to the parameters

Backpropagation Explained Visually: How Neural Networks Actually Learn

Backpropagation is the core algorithm that makes neural networks trainable. The forward pass tells the model what prediction it currently makes. Backpropagation tells the model how each weight contributed to the error so the optimizer can update those weights in the right direction. People often hear that backpropagation is “just the chain rule,” which is true but not especially helpful. The useful mental model is this: the forward pass computes values, the backward pass computes sensitivities, each node only needs its own local derivative, and the full gradient is built by multiplying those local derivatives along each path and summing where paths merge. If that sounds abstract, it becomes much clearer once you look at one neuron first and then scale up. ...
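
For a single neuron, that mental model can be written out directly. The sketch below is illustrative only: the function name, the squared-error loss, the ReLU activation, and the input values are assumptions, chosen so the numbers are easy to check by hand.

```python
def forward_backward(x, w, b, target):
    """One neuron with ReLU and squared error: forward values, then backward gradients."""
    # Forward pass: compute values.
    z = w * x + b
    a = max(0.0, z)                       # ReLU activation
    loss = (a - target) ** 2

    # Backward pass: each node contributes only its own local derivative.
    dloss_da = 2 * (a - target)           # derivative of the squared error
    da_dz = 1.0 if z > 0 else 0.0         # derivative of ReLU
    dz_dw = x                             # derivative of w * x + b with respect to w
    dz_db = 1.0                           # derivative of w * x + b with respect to b

    # Chain rule: multiply the local derivatives along the path from loss to parameter.
    grad_w = dloss_da * da_dz * dz_dw
    grad_b = dloss_da * da_dz * dz_db
    return loss, grad_w, grad_b

loss, grad_w, grad_b = forward_backward(x=2.0, w=1.5, b=0.5, target=2.0)
```

Here z = 3.5 and a = 3.5, so the loss is 2.25, `grad_w` is 3.0 * 1.0 * 2.0 = 6.0, and `grad_b` is 3.0. Both gradients are positive, so an optimizer would decrease `w` and `b` to pull the output down toward the target.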

April 3, 2026 · 8 min · Nitin

Source Maps Explained: How They Work and Why They Sometimes Leak Source Code

Most developers only think about source maps when DevTools magically shows the original TypeScript instead of unreadable bundled JavaScript. That convenience hides an important fact: A source map is not just “debug metadata.” It is a translation table between generated code and original source code. And depending on how it is emitted, it can contain the original source itself. That is why source maps sit at the intersection of debugging, build tooling, browser DevTools, error reporting systems like Sentry, as well as security and accidental code exposure. If you have ever wondered how a minified file can still produce readable stack traces, or how a published .map file can expose a package’s real TypeScript source, this is the mental model you want. ...

April 2, 2026 · 10 min · Nitin

How Adblock Extensions Work and How to Customize Their Behavior

When people think about adblock extensions, they usually imagine something simple: “The extension sees an ad and hides it.” That is only part of the story. Tools like uBlock Origin are better understood as content blockers, not just ad blockers. They do block ads, but they also block trackers, popups, malware domains, anti-blocker scripts, and other unwanted page behavior. Modern blockers such as uBlock Origin mostly work by applying rules to: ...

March 29, 2026 · 9 min · Nitin

RoPE Explained: The Positional Encoding Trick Behind Modern Language Models

When people talk about transformers, they usually focus on attention, scale, or training data. But one smaller design choice has an outsized effect on model quality: How does the model know where each token appears in the sequence? That question matters because transformers do not understand order by default. Without positional information, a sequence starts to look more like an unordered set of tokens than a structured sentence, paragraph, or program. ...

March 19, 2026 · 10 min · Nitin
GPT-2 XL architecture diagram showing token embeddings, positional embeddings, 48 transformer blocks, 25 attention heads, and the output layer

Understanding LLM Architecture: Layers, Transformer Blocks, and Attention Heads

Large Language Models (LLMs) such as GPT-2, GPT-3, LLaMA, and BERT are built on top of the Transformer architecture. That architecture changed natural language processing by replacing recurrence with attention, which lets models process sequences more efficiently and capture long-range relationships more directly. If you are trying to understand what terms like layer, transformer block, and attention head actually mean, the easiest way is to follow the path a sentence takes through a GPT-style model. ...

March 16, 2026 · 8 min · Nitin

How Much Do LLMs Hallucinate in Document Q&A? Key Lessons from a 172B-Token Study

If you are building a RAG system, internal knowledge assistant, or document search chatbot, one question matters more than almost anything else: When the answer is supposed to come from the provided documents, how often does the model still make things up? That is exactly what the March 9, 2026 paper “How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms” tries to measure. ...

March 13, 2026 · 9 min · Nitin