Understanding LLMs

1
Intro to Large Language Models
The Busy Person’s Guide to Large Language Models: From Inner Workings to Future Possibilities (and Security Concerns) …
4 min read · Apr 2024
2
Understanding Tokenization in Large Language Models: A Deep Dive – Part 1
Tokenization is a fundamental yet often misunderstood process in the realm of large language models (LLMs). Despite its …
6 min read · Aug 2024
3
Tokenization
Natural Language Processing (NLP) has revolutionized the way machines understand human language. But before models can …
3 min read · Apr 2025
4
Byte Pair Encoding (BPE): the tokenizer that made GPTs practical
Byte Pair Encoding (BPE) is a subword tokenization scheme that gives us the best of both worlds: compact …
4 min read · Sep 2025
5
Token Embeddings — what they are, why they matter, and how to build them (with working code)
Token embeddings (aka vector embeddings) turn tokens (words, subwords, or characters) into numeric …
7 min read · Sep 2025
6
Q K V : Query (Q), Key (K), and Value (V) Vectors in the Attention Mechanism
In the attention mechanism used by transformer-based Large Language Models (LLMs) such as GPT, the core …
3 min read · Oct 2025
7
Attention Mechanisms Explained: Self-Attention, Cross-Attention, Sparse Attention, MQA, GQA, and DeepSeek MLA
A detailed guide to attention mechanisms in modern AI, from Bahdanau attention and Transformers to local attention, sparse attention, linear attention, multi-query attention, grouped-query attention, and DeepSeek's multi-head latent attention.
14 min read · Mar 2026
8
Understanding LLM Architecture: Layers, Transformer Blocks, and Attention Heads
A practical guide to the internal architecture of large language models, including embeddings, transformer blocks, self-attention, attention heads, MLP layers, residual connections, and execution parallelism.
8 min read · Mar 2026
9
RoPE Explained: The Positional Encoding Trick Behind Modern Language Models
A practical guide to Rotary Positional Embedding (RoPE), including why transformers need positional information, how RoPE rotates queries and keys, and why it became a standard choice in modern LLMs.
10 min read · Mar 2026