Token Embeddings — what they are, why they matter, and how to build them (with working code)

Introduction Token embeddings (aka vector embeddings) turn tokens — words, subwords, or characters — into numeric vectors that encode meaning. They’re the essential bridge between raw text and a neural network. In this post, below we will run a small demos (Word2Vec-style analogies, similarity checks), and provide concrete PyTorch code that demonstrates how an embedding layer works, I also include a tiny toy training loop so you see embeddings updated by backprop. ...

September 28, 2025 · 7 min · Nitin

Unveiling the Secrets Behind ChatGPT – Part 2

For part 1 refer to this: Unveiling the Secrets Behind ChatGPT – Part 1 (learncodecamp.net) Implementing a Bigram Language Model When diving into the world of natural language processing (NLP) and language modeling, starting with a simple baseline model is essential. It helps establish a foundation to build upon. One of the simplest and most intuitive models for language generation is the bigram language model. This blog post will walk you through the implementation of a bigram language model using PyTorch, explaining the key concepts, steps, and code snippets along the way. ...

June 17, 2024 · 6 min · Nitin

Exploring the Power of Vector Databases: Leveraging KNN and HNSW for Efficient Data Retrieval

What are vector databases? A Vector Database is a type of database that stores information in a structured way using vectors. Now, what are vectors? Think of them as mathematical representations of data that capture its meaning and context. Let’s say you have a photo of a cat. Instead of just storing the image file, a Vector Database will convert this photo into a vector, which is essentially a set of numbers that represent various features of the cat, like its color, shape, and size. This vector will contain information about the cat in a way that a computer can understand. ...

March 6, 2024 · 6 min · Nitin

Understanding Embeddings

Introduction Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Whether it’s natural language processing, computer vision, recommender systems, or other applications, embeddings play a crucial role in enhancing model performance and scalability. Text embeddings measure the relatedness of text strings. Embeddings are commonly used for: Search (where results are ranked by relevance to a query string) Clustering (where text strings are grouped by similarity) Recommendations (where items with related text strings are recommended) Anomaly detection (where outliers with little relatedness are identified) Diversity measurement (where similarity distributions are analyzed) Classification (where text strings are classified by their most similar label) Embedding vector from a string ...

February 20, 2024 · 4 min · Nitin