TF-IDF

Introduction: TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a word is to a document relative to a collection of documents (a corpus). It combines two metrics: Term Frequency (TF) and Inverse Document Frequency (IDF). The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the frequency of the word in the corpus. Components of TF-IDF: Term Frequency (TF) measures how frequently a term appears in a document. It’s calculated as: ...

November 10, 2024 · 5 min · Nitin
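The TF-IDF excerpt above is cut off before its formula, so the following is only a minimal sketch of one common textbook variant, not necessarily the one used in the post. It assumes TF is a term's count divided by the document length and IDF is the natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document in a tokenized corpus.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already split into tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = tf_idf(docs)
```

With this corpus, "cat" scores higher than "the" in the first document even though "the" occurs twice, because "the" also appears in another document and its IDF is lower. Real libraries often apply smoothing or normalization, so their numbers will differ from this sketch.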

Unveiling the Secrets Behind ChatGPT – Part 2

For Part 1, see: Unveiling the Secrets Behind ChatGPT – Part 1 (learncodecamp.net). Implementing a Bigram Language Model: When diving into the world of natural language processing (NLP) and language modeling, starting with a simple baseline model is essential. It helps establish a foundation to build upon. One of the simplest and most intuitive models for language generation is the bigram language model. This blog post will walk you through the implementation of a bigram language model using PyTorch, explaining the key concepts, steps, and code snippets along the way. ...

June 17, 2024 · 6 min · Nitin
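To make the bigram idea from the excerpt above concrete, here is a minimal sketch that predicts the next character from the previous one. The post itself uses PyTorch; this stand-in uses plain counting instead of a trained model, and the names `train_bigram` and `next_char_probs` and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {ch: c / total for ch, c in counts[prev].items()}


# Hypothetical training text.
counts = train_bigram("hello hello help")
probs = next_char_probs(counts, "l")
```

Here `probs` holds the estimated distribution over characters that follow "l" in the sample text. A learned bigram model of the kind the post describes plays the same role, except that the distribution comes from trained parameters rather than direct counts.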

Learning from Introduction to Deep Learning

Introduction: Intro to deep learning. Intelligence: The ability to process information and use it for future decision-making. Artificial Intelligence (AI): Empowering computers with the ability to process information and make decisions. Machine Learning (ML): A subset of AI focused on teaching computers to learn from data. Deep Learning (DL): A subset of ML that uses neural networks to process raw data and inform decisions. Why Deep Learning Now? The recent surge in deep learning’s capabilities can be attributed to three key factors: ...

May 4, 2024 · 7 min · Nitin

Understanding Embeddings

Introduction: Embeddings are numerical representations of concepts, converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Whether it’s natural language processing, computer vision, recommender systems, or other applications, embeddings play a crucial role in enhancing model performance and scalability. Text embeddings measure the relatedness of text strings. Embeddings are commonly used for: Search (where results are ranked by relevance to a query string); Clustering (where text strings are grouped by similarity); Recommendations (where items with related text strings are recommended); Anomaly detection (where outliers with little relatedness are identified); Diversity measurement (where similarity distributions are analyzed); Classification (where text strings are classified by their most similar label). Embedding vector from a string ...

February 20, 2024 · 4 min · Nitin
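The search use case in the embeddings excerpt above (ranking results by relevance to a query) can be sketched as follows. The excerpt does not say which relatedness measure is used; cosine similarity is assumed here because it is a common choice. The three-dimensional vectors are made up for illustration, since real embeddings come from a model and have far more dimensions, and the names `cosine_similarity` and `rank_by_relevance` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )


# Made-up toy vectors standing in for real embeddings.
doc_vecs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "finance": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.1, 0.0]
ranking = rank_by_relevance(query_vec, doc_vecs)
```

The same similarity function could serve the other listed uses, for example grouping strings whose vectors are close (clustering) or flagging strings whose best similarity is low (anomaly detection).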