Figure: Universal Approximation Theorem cover illustration. A shallow neural network on the left and a target curve with a close approximation on the right.

Universal Approximation Theorem Explained: Why Neural Networks Can Approximate Any Continuous Function

The Universal Approximation Theorem (UAT) gets quoted constantly, but it is usually described in a fuzzier way than it deserves. It does not say neural networks are magically good at every task. It does not say a shallow network is the most practical architecture. It does not say gradient descent will easily find the right weights. What it does say is still important: With a suitable nonlinear activation and enough hidden units, a feedforward network can approximate any continuous function on a bounded domain as closely as we want. ...

April 3, 2026 · 8 min · Nitin

Training Loop Explained: Batches, Epochs, Iterations, and Convergence

Once you understand neurons, activations, loss functions, and backpropagation, the next thing to understand is the training loop. This is the repetitive engine of deep learning. At a high level, training is boring in the best possible way. It is the same four steps repeated over and over: make a prediction, measure the error, compute gradients, and update the weights. The interesting part is not the loop itself. The interesting part is how concepts like batch size, epoch, iteration, and convergence affect the behavior of that loop in practice. ...
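The four steps named above can be sketched in plain Python. This is a minimal illustration, not code from the post: the linear model, the squared-error loss, the `train` function, and all hyperparameter values are assumptions chosen to keep the example small.

```python
import random

def train(data, batch_size=2, epochs=200, lr=0.05, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent (illustrative only)."""
    rng = random.Random(seed)
    data = list(data)                      # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    iterations = 0
    for _ in range(epochs):                # one epoch = one full pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # 1. make a prediction
            preds = [w * x + b for x, _ in batch]
            # 2. measure the error (residuals; the loss is their mean square)
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            # 3. compute gradients of the mean squared error w.r.t. w and b
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # 4. update the weights
            w -= lr * grad_w
            b -= lr * grad_b
            iterations += 1                # one iteration = one batch update
    return w, b, iterations

# Hypothetical noiseless data drawn from y = 2x + 1
points = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
w, b, iterations = train(points)
```

With 4 examples and a batch size of 2, each epoch contains 2 iterations, so 200 epochs perform 400 weight updates.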

April 3, 2026 · 7 min · Nitin
Figure: Decision guide for MSE vs cross-entropy. A diagram showing when to use mean squared error for continuous targets (price, temperature, demand, sensor value) and cross-entropy for class prediction (spam, cat vs dog, next token, image label). Rule of thumb: numbers -> MSE, classes -> cross-entropy.

MSE vs Cross-Entropy: Which Loss Function Should You Use?

Loss functions answer one basic question: How wrong is the model right now? Without a loss function, a neural network has no way to measure its own mistakes, and without that measurement, gradient-based training has nothing to optimize. Two of the most important losses in machine learning are Mean Squared Error (MSE) and Cross-Entropy. They are both common. They are both differentiable. But they solve different kinds of problems, and using the wrong one makes training harder than it needs to be. ...
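As a rough sketch of the difference, the two losses can be written in a few lines of plain Python. The function names and the sample numbers below are illustrative assumptions, not taken from the post.

```python
import math

def mse(predictions, targets):
    """Mean squared error: average squared distance from the true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probabilities, true_class):
    """Cross-entropy for one example: negative log probability of the correct class."""
    # Clamp to a tiny positive value so log(0) cannot occur.
    return -math.log(max(probabilities[true_class], 1e-12))

# Continuous targets: the loss depends on numeric distance.
regression_loss = mse([2.5, 0.0], [3.0, -1.0])

# Class targets: the loss depends on the probability placed on the correct class.
confident_right = cross_entropy([0.9, 0.1], true_class=0)
confident_wrong = cross_entropy([0.9, 0.1], true_class=1)
```

The same predicted distribution is penalized lightly when the high-probability class is correct and heavily when it is not, which is the behavior a classifier needs.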

April 3, 2026 · 7 min · Nitin

Multi-Layer Perceptron Explained: Dense Networks from First Principles

A multi-layer perceptron (MLP) is one of the simplest and most important neural network architectures. It is not flashy. It is not state of the art for language or vision by itself. But if you do not understand MLPs, a lot of modern deep learning stays blurry. MLPs teach the core structure of neural networks: inputs become vectors, layers apply learned linear transforms, activations add nonlinearity, and deeper layers build more useful internal representations. They also still matter in practice. Even transformers contain MLP blocks. Recommendation systems, tabular models, and many small classifiers still use dense networks directly. ...

April 3, 2026 · 7 min · Nitin
Figure: Forward pass cover illustration. A visual summary of the forward pass from inputs through weighted sum, bias, and activation to output, then scaling to matrix form (z = Wx + b, a = f(z)).

Forward Pass Explained: From a Single Neuron to Matrix Form

The forward pass is the part of a neural network that actually produces a prediction. You feed inputs into the model, the model applies a sequence of mathematical operations, and an output comes out the other side. That sounds trivial, but it is one of the most important ideas in deep learning because everything else depends on it: the loss compares the forward-pass output to the target, backpropagation differentiates through the forward pass, and training is just repeating the forward pass and improving it. The easiest way to understand the forward pass is to start with a single neuron and then scale it up into a full layer written in matrix form. ...
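A minimal sketch of that progression, from one neuron to a full layer, might look like the following. The helper names, the ReLU activation, and the weights are assumptions made for illustration.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def neuron(x, w, b):
    """One neuron: weighted sum of the inputs plus a bias, then the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)

def dense_layer(x, W, b):
    """A dense layer is the same computation repeated once per row of W (z = Wx + b)."""
    return [neuron(x, row, bias) for row, bias in zip(W, b)]

x = [1.0, 2.0, 3.0]                 # three inputs
W = [[0.5, -1.0, 1.0],              # weights of neuron 0
     [-1.0, 0.0, 0.0]]              # weights of neuron 1
b = [0.5, 0.5]                      # one bias per neuron

single = neuron(x, W[0], b[0])      # forward pass through one neuron
layer = dense_layer(x, W, b)        # forward pass through the whole layer
```

The layer output has one activation per neuron; the second neuron's pre-activation is negative here, so ReLU sets it to zero.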

April 3, 2026 · 7 min · Nitin
Figure: Backpropagation cover illustration. The forward pass flows left to right (x, w, b -> weighted sum z -> ReLU, a = f(z) -> loss L) and gradients flow backward from the loss to the parameters.

Backpropagation Explained Visually: How Neural Networks Actually Learn

Backpropagation is the core algorithm that makes neural networks trainable. The forward pass tells the model what prediction it currently makes. Backpropagation tells the model how each weight contributed to the error so the optimizer can update those weights in the right direction. People often hear that backpropagation is “just the chain rule,” which is true but not especially helpful. The useful mental model is this: the forward pass computes values, the backward pass computes sensitivities, each node only needs its own local derivative, and the full gradient is built by multiplying those local derivatives along the path. If that sounds abstract, it becomes much clearer once you look at one neuron first and then scale up. ...
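For a single neuron, that mental model can be sketched as below. The squared-error loss, the ReLU activation, and the function name `forward_backward` are assumptions chosen for the illustration.

```python
def forward_backward(x, w, b, y):
    """Forward and backward pass for one ReLU neuron with a squared-error loss."""
    # Forward pass: compute values.
    z = w * x + b                       # weighted sum
    a = max(0.0, z)                     # ReLU activation
    loss = (a - y) ** 2

    # Backward pass: each node contributes only its local derivative.
    dL_da = 2.0 * (a - y)               # loss with respect to the activation
    da_dz = 1.0 if z > 0 else 0.0       # ReLU slope
    dz_dw = x                           # weighted sum with respect to w
    dz_db = 1.0                         # weighted sum with respect to b

    # Chain rule: multiply the local derivatives along each path.
    grad_w = dL_da * da_dz * dz_dw
    grad_b = dL_da * da_dz * dz_db
    return loss, grad_w, grad_b

loss, grad_w, grad_b = forward_backward(x=2.0, w=1.5, b=-1.0, y=1.0)
```

A quick sanity check for gradients like these is to nudge `w` or `b` by a small amount and confirm the loss changes at roughly the predicted rate.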

April 3, 2026 · 8 min · Nitin
GPT-2 XL architecture diagram showing token embeddings, positional embeddings, 48 transformer blocks, 25 attention heads, and the output layer

Understanding LLM Architecture: Layers, Transformer Blocks, and Attention Heads

Large Language Models (LLMs) such as GPT-2, GPT-3, LLaMA, and BERT are built on top of the Transformer architecture. That architecture changed natural language processing by replacing recurrence with attention, which lets models process sequences more efficiently and capture long-range relationships more directly. If you are trying to understand what terms like layer, transformer block, and attention head actually mean, the easiest way is to follow the path a sentence takes through a GPT-style model. ...

March 16, 2026 · 8 min · Nitin

Loss Functions for LLMs: A Practical, Hands-On Guide

Introduction

When training large language models (LLMs), the most important question is simple: how do we measure whether the model is doing well? For regression you use mean squared error; for classification you might use cross-entropy or hinge loss. But for LLMs, which predict sequences of discrete tokens, the right way to turn “this output feels wrong” into a number you can optimize is a specific kind of probability loss: categorical cross-entropy (negative log-likelihood), and the closely related, more interpretable metric, perplexity. ...
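The relationship between the two quantities is short enough to sketch directly: perplexity is the exponential of the average per-token negative log-likelihood. The function names and probabilities below are illustrative assumptions, not code from the post.

```python
import math

def mean_nll(token_probs):
    """Average negative log-likelihood of the probabilities assigned to the correct tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is exp of the mean negative log-likelihood."""
    return math.exp(mean_nll(token_probs))

# Hypothetical probabilities a model assigned to each correct next token.
uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])
better_model = perplexity([0.9, 0.8, 0.95, 0.7])
```

A model that puts probability 0.25 on every correct token has a perplexity of about 4, as if it were choosing uniformly among four options; a better model scores closer to 1.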

October 18, 2025 · 5 min · Nitin

ComfyUI API Endpoints Guide: Complete Reference for Image Generation Workflows

Introduction

ComfyUI is a powerful, open-source, node-based interface for generative AI workflows, mainly image and video generation. While it is primarily known for its visual interface, ComfyUI also offers robust API capabilities, enabling developers to integrate and automate workflows programmatically. This guide will walk you through using ComfyUI in API mode. ComfyUI offers a suite of RESTful and WebSocket API endpoints that let developers interact with its workflow engine programmatically. These endpoints handle tasks such as queuing prompts, retrieving results, uploading images, and monitoring system status. ...

June 1, 2025 · 3 min · Nitin

Kokoro: High-Quality Text-to-Speech (TTS) on Your CPU with ONNX

This sound is generated with Kokoro TTS.

The world of text-to-speech (TTS) has seen incredible advancements, but these powerful models often require hefty hardware like GPUs. What if you could run a top-tier TTS model locally on your CPU? Enter Kokoro, a TTS model that delivers impressive results even on resource-constrained devices.

Kokoro: Small but Mighty

Kokoro stands out for its efficiency. With just 82 million parameters, it outperforms models several times its size, including XTTS (467M parameters) and MetaVoice (1.2B parameters). This shows that cutting-edge TTS is achievable without relying on massive models and powerful GPUs. ...

January 12, 2025 · 3 min · Nitin