The Busy Person’s Guide to Large Language Models: From Inner Workings to Future Possibilities (and Security Concerns) …
Understanding LLMs
- 1. Intro to Large Language Models
- 2. Understanding Tokenization in Large Language Models: A Deep Dive – Part 1
Tokenization is a fundamental yet often misunderstood process in the realm of large language models (LLMs). Despite its …
- 3. Tokenization
Natural Language Processing (NLP) has revolutionized the way machines understand human language. But before models can …
- 4. Byte Pair Encoding (BPE): the tokenizer that made GPTs practical
Byte Pair Encoding (BPE) is a subword tokenization scheme that gives us the best of both worlds: compact …
- 5. Token Embeddings: what they are, why they matter, and how to build them (with working code)
Token embeddings (aka vector embeddings) turn tokens (words, subwords, or characters) into numeric …
- 6. Q K V: Query (Q), Key (K), and Value (V) Vectors in the Attention Mechanism
In the attention mechanism used by transformer-based Large Language Models (LLMs) such as GPT, the core …
- 7. Attention Mechanisms Explained: Self-Attention, Cross-Attention, Sparse Attention, MQA, GQA, and DeepSeek MLA
A detailed guide to attention mechanisms in modern AI, from Bahdanau attention and Transformers to local attention, sparse attention, linear attention, multi-query attention, grouped-query attention, and DeepSeek's multi-head latent attention.
- 8. Understanding LLM Architecture: Layers, Transformer Blocks, and Attention Heads
A practical guide to the internal architecture of large language models, including embeddings, transformer blocks, self-attention, attention heads, MLP layers, residual connections, and execution parallelism.
- 9. RoPE Explained: The Positional Encoding Trick Behind Modern Language Models
A practical guide to Rotary Positional Embedding (RoPE), including why transformers need positional information, how RoPE rotates queries and keys, and why it became a standard choice in modern LLMs.