[Cover illustration: a frozen pretrained matrix W0 (d_out x d_in) plus a small low-rank adapter update made of two trainable matrices, A (r x d_in) and B (d_out x r), so that W = W0 + (alpha / r)BA and only r(d_in + d_out) parameters are trained.]

LoRA Fine-Tuning Explained: What It Is, Why It Works, and the Math Behind It

LoRA stands for Low-Rank Adaptation. It is one of the most useful ideas in modern LLM fine-tuning because it changes the question from "How do we update all of the model's weights?" to "How do we learn a small update that is still expressive enough for the new task?" That is the whole trick. Instead of fine-tuning every entry of a large weight matrix, LoRA keeps the original pretrained weight frozen and learns a low-rank correction on top of it. This makes training much cheaper in parameters, optimizer state, checkpoint size, and often VRAM. ...
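
To make the savings concrete, here is a minimal sketch in plain Python. It assumes the usual LoRA form W = W0 + (alpha / r)BA, with A of shape r x d_in and B of shape d_out x r. The function names and the 4096 x 4096, r = 8 sizes are illustrative assumptions, not taken from any particular model or library.

```python
# Minimal LoRA sketch in plain Python (lists instead of tensors).
# All function names and example sizes here are illustrative assumptions.

def lora_param_counts(d_in, d_out, r):
    """Return (full fine-tuning params, LoRA trainable params) for one matrix."""
    full = d_out * d_in          # every entry of W0 would be trained
    lora = r * (d_in + d_out)    # A is r x d_in, B is d_out x r
    return full, lora


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x, alpha):
    """Compute (W0 + (alpha / r) * B A) x without ever forming B A."""
    r = len(A)                            # rank = number of rows of A
    base = matvec(W0, x)                  # frozen path
    update = matvec(B, matvec(A, x))      # small trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


full, lora = lora_param_counts(d_in=4096, d_out=4096, r=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

If B starts as all zeros, which is a common initialization choice, the adapted layer initially gives exactly the same output as the frozen base layer.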

April 5, 2026 · 11 min · Nitin
[Cover illustration: a shallow neural network on the left and a target curve with a close approximation on the right.]

Universal Approximation Theorem Explained: Why Neural Networks Can Approximate Any Continuous Function

The Universal Approximation Theorem (UAT) gets quoted constantly, but it is usually described in a fuzzier way than it deserves. It does not say neural networks are magically good at every task. It does not say a shallow network is the most practical architecture. It does not say gradient descent will easily find the right weights. What it does say is still important: with a suitable nonlinear activation and enough hidden units, a feedforward network can approximate any continuous function on a closed, bounded domain as closely as we want. ...
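
The "enough hidden units" part can be illustrated with a small sketch. It builds a one-hidden-layer ReLU network by hand (no training) that interpolates a target function at evenly spaced points, then measures the worst-case error on a grid. The choice of sin on [0, pi], the unit counts, and every function name are assumptions made for this illustration. It shows one constructive case only and says nothing about whether gradient descent would find these weights.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(f, a, b, n_units):
    """Hand-construct a one-hidden-layer ReLU net that interpolates f on [a, b]."""
    step = (b - a) / n_units
    knots = [a + i * step for i in range(n_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(n_units)]
    # The output weight of unit i is the change in slope at knot i.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(a)

    def network(x):
        # Hidden unit i computes relu(x - knots[i]); the output is a weighted sum.
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

    return network


def max_error(f, net, a, b, samples=1000):
    """Largest absolute gap between f and net on an evenly spaced grid."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)


for n_units in (4, 32):
    net = build_relu_network(math.sin, 0.0, math.pi, n_units)
    print(n_units, max_error(math.sin, net, 0.0, math.pi))
```

With more hidden units the printed error shrinks, which is the behavior the theorem guarantees is always achievable for a continuous target.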

April 3, 2026 · 8 min · Nitin

Training Loop Explained: Batches, Epochs, Iterations, and Convergence

Once you understand neurons, activations, loss functions, and backpropagation, the next thing to understand is the training loop. This is the repetitive engine of deep learning. At a high level, training is boring in the best possible way. It is the same four steps repeated over and over: make a prediction, measure the error, compute gradients, and update the weights. The interesting part is not the loop itself. The interesting part is how concepts like batch size, epoch, iteration, and convergence affect the behavior of that loop in practice. ...
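
The four steps, and the way batch size, epochs, and iterations relate, can be sketched with a one-weight model fit to y = 2x. This is a toy under stated assumptions: the model, the squared-error loss, the data, the learning rate, and the function name are all made up for illustration, and batches are taken in order without shuffling to keep the run deterministic.

```python
def train(xs, ys, batch_size, epochs, lr):
    """Mini-batch gradient descent for the model pred = w * x."""
    w = 0.0
    iterations = 0
    for _ in range(epochs):                          # one epoch = one pass over the data
        for start in range(0, len(xs), batch_size):  # one iteration = one batch
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            preds = [w * x for x in bx]                    # 1. make a prediction
            errors = [p - y for p, y in zip(preds, by)]    # 2. measure the error
            # 3. compute the gradient of the mean squared error w.r.t. w
            grad = 2.0 * sum(e * x for e, x in zip(errors, bx)) / len(bx)
            w -= lr * grad                                 # 4. update the weight
            iterations += 1
    return w, iterations


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, iterations = train(xs, ys, batch_size=2, epochs=50, lr=0.05)
print(w, iterations)
```

With 4 examples and a batch size of 2, each epoch contains 2 iterations, so 50 epochs give 100 weight updates, and w settles very close to 2.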

April 3, 2026 · 7 min · Nitin
[Diagram: decision guide for MSE vs cross-entropy. Start with what the model is trying to predict. Continuous number (price, temperature, demand, sensor value): use MSE, because you care about distance from the true value. Class or probability distribution (spam, cat vs dog, next token, image label): use cross-entropy, because you care about probability on the correct class. Rule of thumb: numbers -> MSE, classes -> cross-entropy.]

MSE vs Cross-Entropy: Which Loss Function Should You Use?

Loss functions answer one basic question: How wrong is the model right now? Without a loss function, a neural network has no way to measure its own mistakes, and without that measurement, gradient-based training has nothing to optimize. Two of the most important losses in machine learning are Mean Squared Error (MSE) and Cross-Entropy. They are both common. They are both differentiable. But they solve different kinds of problems, and using the wrong one makes training harder than it needs to be. ...
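
As a rough illustration of the difference, the sketch below computes both losses by hand. MSE averages squared numeric distance, while cross-entropy for a single example is the negative log of the probability assigned to the correct class. The function names and sample values are assumptions for this example.

```python
import math


def mse(preds, targets):
    """Mean squared error between predicted and true numbers."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def cross_entropy(probs, true_index):
    """Cross-entropy for one example: -log(probability of the correct class)."""
    return -math.log(probs[true_index])


# Regression-style target: how far are the numbers from the truth?
print(mse([2.0, 4.0], [1.0, 6.0]))        # 2.5

# Classification-style target: how much probability sits on the right class?
print(cross_entropy([0.7, 0.2, 0.1], 0))  # small loss, confident and correct
print(cross_entropy([0.1, 0.2, 0.7], 0))  # large loss, confident and wrong
```

Note that cross-entropy only looks at the probability on the true class, so a confidently wrong prediction is penalized much more than a hesitant one.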

April 3, 2026 · 7 min · Nitin

Multi-Layer Perceptron Explained: Dense Networks from First Principles

A multi-layer perceptron (MLP) is one of the simplest and most important neural network architectures. It is not flashy. It is not state of the art for language or vision by itself. But if you do not understand MLPs, a lot of modern deep learning stays blurry. MLPs teach the core structure of neural networks: inputs become vectors, layers apply learned linear transforms, activations add nonlinearity, and deeper layers build more useful internal representations. They also still matter in practice. Even transformers contain MLP blocks. Recommendation systems, tabular models, and many small classifiers still use dense networks directly. ...
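
A tiny two-layer MLP makes that structure visible. The sketch below uses hand-picked weights (not trained ones) and hypothetical helper names, and it includes a switch to drop the activation so the effect of the nonlinearity can be seen directly.

```python
def dense(W, b, v):
    """One dense layer without activation: W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def relu_vec(v):
    return [max(0.0, z) for z in v]


def mlp(x, W1, b1, W2, b2, use_activation=True):
    """Two-layer MLP: input vector -> hidden layer -> output layer."""
    hidden = dense(W1, b1, x)          # learned linear transform
    if use_activation:
        hidden = relu_vec(hidden)      # nonlinearity between the layers
    return dense(W2, b2, hidden)       # second layer reads the hidden representation


# Hand-picked example weights, purely for illustration.
x = [1.0, 2.0]
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[2.0, 1.0]], [0.5]

print(mlp(x, W1, b1, W2, b2))                        # [2.0]
print(mlp(x, W1, b1, W2, b2, use_activation=False))  # [0.0]
```

Without the activation, the two layers collapse into a single linear map, which is why the nonlinearity is what lets depth add expressive power.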

April 3, 2026 · 7 min · Nitin
[Cover illustration: the forward pass from inputs through weighted sum, bias, and activation to output, then scaled to matrix form: z = Wx + b, a = f(z).]

Forward Pass Explained: From a Single Neuron to Matrix Form

The forward pass is the part of a neural network that actually produces a prediction. You feed inputs into the model, the model applies a sequence of mathematical operations, and an output comes out the other side. That sounds trivial, but it is one of the most important ideas in deep learning because everything else depends on it: the loss compares the forward-pass output to the target; backpropagation differentiates through the forward pass; and training is just repeating the forward pass and improving it. The easiest way to understand the forward pass is to start with a single neuron and then scale it up into a full layer written in matrix form. ...
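
A short sketch of that progression, assuming a ReLU activation and made-up weights: first a single neuron computing a = f(w . x + b), then a layer where each row of W is one neuron, which is the matrix form z = Wx + b, a = f(z). The function names are illustrative.

```python
def relu(z):
    return max(0.0, z)


def neuron(x, w, b):
    """Single neuron: weighted sum of inputs, plus bias, then activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)


def dense_layer(x, W, b):
    """Full layer: each row of W with its bias is one neuron (z = Wx + b, a = f(z))."""
    return [neuron(x, row, bias) for row, bias in zip(W, b)]


x = [1.0, 2.0, 3.0]

# One neuron: 0.5*1 + (-1)*2 + 2*3 + 0.5 = 5.0
print(neuron(x, [0.5, -1.0, 2.0], 0.5))  # 5.0

# Two neurons stacked as rows of W; the second has z = -1.0, so ReLU outputs 0.0
W = [[0.5, -1.0, 2.0], [-1.0, 0.0, 0.0]]
b = [0.5, 0.0]
print(dense_layer(x, W, b))  # [5.0, 0.0]
```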

April 3, 2026 · 7 min · Nitin
[Cover illustration: the forward pass flowing left to right (x, w, b -> weighted sum z -> ReLU a = f(z) -> loss L) and gradients flowing backward from the loss to the parameters. Backpropagation is the chain rule applied efficiently over the computation graph.]

Backpropagation Explained Visually: How Neural Networks Actually Learn

Backpropagation is the core algorithm that makes neural networks trainable. The forward pass tells the model what prediction it currently makes. Backpropagation tells the model how each weight contributed to the error so the optimizer can update those weights in the right direction. People often hear that backpropagation is “just the chain rule,” which is true but not especially helpful. The useful mental model is this: the forward pass computes values; the backward pass computes sensitivities; each node only needs its own local derivative; and the full gradient is built by multiplying those local derivatives along the path. If that sounds abstract, it becomes much clearer once you look at one neuron first and then scale up. ...
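
For one neuron, that mental model fits in a few lines. The sketch below assumes z = w*x + b, a ReLU activation, and a squared-error loss L = (a - y)^2. These choices, the input values, and the function name are assumptions made for illustration.

```python
def forward_backward(x, w, b, y):
    """One neuron: forward pass, then gradients via local derivatives."""
    # Forward pass computes values.
    z = w * x + b
    a = max(0.0, z)                   # ReLU
    loss = (a - y) ** 2

    # Backward pass: each node contributes only its own local derivative.
    dL_da = 2.0 * (a - y)             # loss node
    da_dz = 1.0 if z > 0 else 0.0     # ReLU node
    dz_dw = x                         # weighted-sum node, with respect to w
    dz_db = 1.0                       # weighted-sum node, with respect to b

    # Multiply the local derivatives along the path from the loss to each parameter.
    dL_dw = dL_da * da_dz * dz_dw
    dL_db = dL_da * da_dz * dz_db
    return loss, dL_dw, dL_db


print(forward_backward(x=2.0, w=1.5, b=0.5, y=1.0))  # (6.25, 10.0, 5.0)
```

A quick sanity check is to nudge w by a tiny amount and confirm that the loss changes at roughly the rate dL_dw predicts.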

April 3, 2026 · 8 min · Nitin