[{"content":"Paste JSON and use pretty/minify actions. Invalid input highlights the problem line.\n","permalink":"https://learncodecamp.net/tools/json-formatter-validator/","summary":"\u003cp\u003ePaste JSON and use pretty/minify actions. Invalid input highlights the problem line.\u003c/p\u003e","title":"JSON Formatter + Validator"},{"content":"Works with UTF-8 text and multi-language input.\n","permalink":"https://learncodecamp.net/tools/base64-encode-decode/","summary":"\u003cp\u003eWorks with UTF-8 text and multi-language input.\u003c/p\u003e","title":"Base64 Encode/Decode"},{"content":"Upload an image to generate Base64, or paste Base64 to preview and download the decoded image.\n","permalink":"https://learncodecamp.net/tools/image-base64-encode-decode/","summary":"\u003cp\u003eUpload an image to generate Base64, or paste Base64 to preview and download the decoded image.\u003c/p\u003e","title":"Image Base64 Encode/Decode"},{"content":"Useful for query params, path segments, and form payload debugging.\n","permalink":"https://learncodecamp.net/tools/url-encode-decode/","summary":"\u003cp\u003eUseful for query params, path segments, and form payload debugging.\u003c/p\u003e","title":"URL Encode/Decode"},{"content":"Generates RFC 4122-compliant UUID v4 values in-browser.\n","permalink":"https://learncodecamp.net/tools/uuid-generator-v4/","summary":"\u003cp\u003eGenerates RFC 4122-compliant UUID v4 values in-browser.\u003c/p\u003e","title":"UUID Generator (v4)"},{"content":"Convert epoch values to local/UTC time and convert local datetime back to epoch.\n","permalink":"https://learncodecamp.net/tools/unix-timestamp-converter/","summary":"\u003cp\u003eConvert epoch values to local/UTC time and convert local datetime back to epoch.\u003c/p\u003e","title":"Unix Timestamp Converter"},{"content":"Decode and inspect JWTs client-side, with optional HS256 signature verification.\n","permalink":"https://learncodecamp.net/tools/jwt-decoder/","summary":"\u003cp\u003eDecode 
and inspect JWTs client-side, with optional HS256 signature verification.\u003c/p\u003e","title":"JWT Decoder"},{"content":"Test pattern + flags quickly and preview replacement output.\n","permalink":"https://learncodecamp.net/tools/regex-tester/","summary":"\u003cp\u003eTest pattern + flags quickly and preview replacement output.\u003c/p\u003e","title":"Regex Tester"},{"content":"Best for code snippets, config files, and quick content comparisons.\n","permalink":"https://learncodecamp.net/tools/text-diff-checker/","summary":"\u003cp\u003eBest for code snippets, config files, and quick content comparisons.\u003c/p\u003e","title":"Text Diff Checker"},{"content":"Hashing is done in-browser. Use for checksums and quick verification workflows.\n","permalink":"https://learncodecamp.net/tools/hash-generator-sha-256-md5/","summary":"\u003cp\u003eHashing is done in-browser. Use for checksums and quick verification workflows.\u003c/p\u003e","title":"Hash Generator (SHA-256, MD5)"},{"content":"Preview Markdown client-side with split panes, synced scrolling, dark mode, and quick PDF export.\n","permalink":"https://learncodecamp.net/tools/markdown-live-preview/","summary":"\u003cp\u003ePreview Markdown client-side with split panes, synced scrolling, dark mode, and quick PDF export.\u003c/p\u003e","title":"Markdown Live Preview"},{"content":"Enter values for each field or pick a preset. Supports *, ranges (1-5), lists (1,3,5), and steps (*/15).\n","permalink":"https://learncodecamp.net/tools/cron-expression-builder/","summary":"\u003cp\u003eEnter values for each field or pick a preset. 
Supports \u003ccode\u003e*\u003c/code\u003e, ranges (\u003ccode\u003e1-5\u003c/code\u003e), lists (\u003ccode\u003e1,3,5\u003c/code\u003e), and steps (\u003ccode\u003e*/15\u003c/code\u003e).\u003c/p\u003e","title":"Cron Expression Builder"},{"content":"Supports nested objects, arrays, and optional Lombok @Data and Jackson @JsonProperty annotations for Java output.\n","permalink":"https://learncodecamp.net/tools/json-to-pojo-typescript/","summary":"\u003cp\u003eSupports nested objects, arrays, and optional Lombok \u003ccode\u003e@Data\u003c/code\u003e and Jackson \u003ccode\u003e@JsonProperty\u003c/code\u003e annotations for Java output.\u003c/p\u003e","title":"JSON → Java POJO / TypeScript"},{"content":"All conversions happen instantly in your browser. Click the swatch to open the native color picker.\n","permalink":"https://learncodecamp.net/tools/color-converter/","summary":"\u003cp\u003eAll conversions happen instantly in your browser. Click the swatch to open the native color picker.\u003c/p\u003e","title":"Color Picker \u0026 CSS Converter"},{"content":"Search by code number, name, or keyword. All 60+ standard status codes including WebDAV extensions.\n","permalink":"https://learncodecamp.net/tools/http-status-codes/","summary":"\u003cp\u003eSearch by code number, name, or keyword. All 60+ standard status codes including WebDAV extensions.\u003c/p\u003e","title":"HTTP Status Code Reference"},{"content":"Formats major SQL clauses onto their own lines, indents AND/OR conditions, and handles column lists. No data leaves your browser.\n","permalink":"https://learncodecamp.net/tools/sql-formatter/","summary":"\u003cp\u003eFormats major SQL clauses onto their own lines, indents \u003ccode\u003eAND\u003c/code\u003e/\u003ccode\u003eOR\u003c/code\u003e conditions, and handles column lists. 
No data leaves your browser.\u003c/p\u003e","title":"SQL Formatter"},{"content":"Every web developer uses Chrome DevTools.\nWe inspect elements, read console logs, watch network requests, throttle CPU, emulate mobile screens, record performance traces, check storage, debug JavaScript, and capture screenshots.\nMost of that feels like a browser UI.\nUnder the hood, there is a protocol.\nThat protocol is Chrome DevTools Protocol, usually shortened to CDP.\nCDP is the browser debugging API that lets tools instrument, inspect, debug, and profile Chrome, Chromium, and other Blink-based browsers. Chrome DevTools itself uses this protocol. Many automation and debugging tools also build on it directly or indirectly.\nIn this post, I explain CDP from a practical developer point of view: what it is, how it works, what domains are, how remote debugging exposes endpoints, how commands and events flow, and when I would use CDP directly instead of a higher-level tool.\nThe Short Version CDP is a JSON message protocol over WebSocket.\nA client connects to a debuggable browser or page target, sends commands, and receives responses and events.\nFor example:\n{ \u0026#34;id\u0026#34;: 1, \u0026#34;method\u0026#34;: \u0026#34;Page.enable\u0026#34; } Then:\n{ \u0026#34;id\u0026#34;: 2, \u0026#34;method\u0026#34;: \u0026#34;Page.navigate\u0026#34;, \u0026#34;params\u0026#34;: { \u0026#34;url\u0026#34;: \u0026#34;https://example.com\u0026#34; } } The browser responds to commands by matching the same id, and it also emits events:\n{ \u0026#34;method\u0026#34;: \u0026#34;Page.loadEventFired\u0026#34;, \u0026#34;params\u0026#34;: { \u0026#34;timestamp\u0026#34;: 12345.67 } } That is the core model:\nclient -\u0026gt; command -\u0026gt; browser client \u0026lt;- response \u0026lt;- browser client \u0026lt;- event \u0026lt;- browser Everything else is built around that idea.\nWhy CDP Exists Chrome DevTools needs a way to talk to the browser engine.\nWhen you open the Network panel, DevTools 
needs network events. When you set a breakpoint, DevTools needs debugger commands. When you inspect the DOM, DevTools needs DOM data. When you take a performance recording, DevTools needs tracing and performance data.\nCDP is the structured API for that communication.\nIt exposes browser capabilities through domains such as:\nPage Runtime Network DOM CSS Debugger Target Emulation Performance Tracing Storage Input Accessibility Each domain defines commands and events.\nFor example:\nDomain What it is commonly used for Page Navigation, lifecycle events, screenshots, frame information Runtime Evaluate JavaScript, inspect objects, observe execution contexts Network Observe requests, responses, headers, bodies, failures, WebSocket frames DOM Inspect and manipulate the document tree CSS Inspect stylesheets, rules, computed styles, and style changes Debugger Set breakpoints, pause, resume, step through JavaScript Target Discover and attach to tabs, workers, iframes, and browser targets Emulation Override viewport, device metrics, geolocation, CPU, media, timezone, and more Performance Collect metrics Tracing Capture detailed timeline traces Input Dispatch mouse, keyboard, and touch input Accessibility Inspect accessibility tree information The names are plain because CDP was built for tools, not for end-user ergonomics.\nCDP Is Lower Level Than Puppeteer Or Playwright If you have used Puppeteer, Playwright, Selenium, ChromeDriver, or Chrome DevTools MCP, you have probably benefited from CDP without writing CDP messages yourself.\nThe difference is level of abstraction.\nLayer What you work with CDP Raw browser domains, commands, events, sessions, targets Puppeteer Pages, locators, screenshots, browser contexts, high-level automation APIs Playwright Cross-browser automation, reliable locators, assertions, test runner Chrome DevTools MCP Agent-facing browser tools exposed through MCP Chrome DevTools UI Human-facing panels and workflows CDP is powerful because it is close to 
the browser. It is also verbose because it is close to the browser.\nFor routine testing, I would usually use Playwright.\nFor browser scripting in Node, I would usually use Puppeteer.\nFor agent-assisted debugging, I would use Chrome DevTools MCP.\nFor custom browser tooling, protocol experiments, deep debugging, or automation that needs exact browser events, I would consider CDP directly.\nHow Remote Debugging Works To talk CDP to Chrome, Chrome must expose a debugging endpoint.\nThe common local development flow is:\n/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome \\ --remote-debugging-port=9222 \\ --user-data-dir=/tmp/chrome-cdp-profile On Linux, it often looks like:\ngoogle-chrome \\ --remote-debugging-port=9222 \\ --user-data-dir=/tmp/chrome-cdp-profile The --remote-debugging-port=9222 flag starts an HTTP server on that port. The --user-data-dir flag keeps the debug session separate from your normal Chrome profile.\nOnce Chrome is running, these endpoints become useful:\nhttp://127.0.0.1:9222/json/version http://127.0.0.1:9222/json http://127.0.0.1:9222/json/list http://127.0.0.1:9222/json/protocol The official protocol documentation describes these endpoints:\nEndpoint Purpose /json/version Browser version metadata and the browser-level WebSocket URL /json or /json/list List available targets such as pages /json/protocol The protocol schema supported by the running browser /json/new?{url} Open a new tab /json/activate/{targetId} Bring a target tab to the foreground /json/close/{targetId} Close a target tab The important value is webSocketDebuggerUrl.\nIt looks like this:\n{ \u0026#34;webSocketDebuggerUrl\u0026#34;: \u0026#34;ws://127.0.0.1:9222/devtools/page/ABC123\u0026#34; } That WebSocket is where CDP messages flow.\nBrowser Targets And Page Targets CDP has a target model.\nA target can be a page, iframe, worker, service worker, shared worker, browser context, or the browser itself.\nThis matters because some commands belong at the page 
level, and some belong at the browser level.\nFor example:\nPage.navigate is page-oriented. Runtime.evaluate runs in an execution context. Network.enable subscribes to network events for a target. Browser.getVersion is browser-level. Target.getTargets discovers debuggable targets. When you call /json/list, you usually see page targets. Each page target has its own WebSocket URL.\nWhen you call /json/version, you can get the browser-level WebSocket URL. The official docs call out that the browser target URL contains browser rather than page.\nThat distinction is important when building tools that manage multiple tabs or browser contexts.\nCommands, Responses, And Events CDP messages are JSON objects.\nA command has:\nid method optional params Example:\n{ \u0026#34;id\u0026#34;: 10, \u0026#34;method\u0026#34;: \u0026#34;Runtime.evaluate\u0026#34;, \u0026#34;params\u0026#34;: { \u0026#34;expression\u0026#34;: \u0026#34;document.title\u0026#34; } } A response has the same id:\n{ \u0026#34;id\u0026#34;: 10, \u0026#34;result\u0026#34;: { \u0026#34;result\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;string\u0026#34;, \u0026#34;value\u0026#34;: \u0026#34;Example Domain\u0026#34; } } } An event has a method, but no command id:\n{ \u0026#34;method\u0026#34;: \u0026#34;Network.requestWillBeSent\u0026#34;, \u0026#34;params\u0026#34;: { \u0026#34;requestId\u0026#34;: \u0026#34;1234.1\u0026#34;, \u0026#34;request\u0026#34;: { \u0026#34;url\u0026#34;: \u0026#34;https://example.com/\u0026#34; } } } This distinction is simple but important:\nresponses answer commands events describe things happening in the browser Many workflows require both.\nFor example, a network logger first sends:\n{ \u0026#34;id\u0026#34;: 1, \u0026#34;method\u0026#34;: \u0026#34;Network.enable\u0026#34; } After that, the browser starts emitting Network.requestWillBeSent, Network.responseReceived, Network.loadingFinished, and Network.loadingFailed events.\nYou do not poll for each request. 
You subscribe to a domain and listen.\nA Minimal CDP Session Here is the shape of a minimal manual CDP session.\nStart Chrome:\n/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome \\ --remote-debugging-port=9222 \\ --user-data-dir=/tmp/chrome-cdp-profile List targets:\ncurl http://127.0.0.1:9222/json/list Pick the webSocketDebuggerUrl for the page.\nThen connect with a WebSocket client and send:\n{\u0026#34;id\u0026#34;:1,\u0026#34;method\u0026#34;:\u0026#34;Page.enable\u0026#34;} Navigate:\n{\u0026#34;id\u0026#34;:2,\u0026#34;method\u0026#34;:\u0026#34;Page.navigate\u0026#34;,\u0026#34;params\u0026#34;:{\u0026#34;url\u0026#34;:\u0026#34;https://example.com\u0026#34;}} Evaluate JavaScript:\n{\u0026#34;id\u0026#34;:3,\u0026#34;method\u0026#34;:\u0026#34;Runtime.evaluate\u0026#34;,\u0026#34;params\u0026#34;:{\u0026#34;expression\u0026#34;:\u0026#34;document.title\u0026#34;}} Capture a screenshot:\n{\u0026#34;id\u0026#34;:4,\u0026#34;method\u0026#34;:\u0026#34;Page.captureScreenshot\u0026#34;,\u0026#34;params\u0026#34;:{\u0026#34;format\u0026#34;:\u0026#34;png\u0026#34;}} The screenshot response contains base64 image data. Higher-level tools usually hide that detail and write the image to a file for you.\nThat is one reason I rarely want raw CDP for everyday work. 
CDP is exact, but you have to handle the plumbing.\nProtocol Domains I Use Most Often Page The Page domain covers navigation, screenshots, lifecycle events, frames, dialogs, and page-level state.\nCommon commands include:\nPage.enable Page.navigate Page.reload Page.captureScreenshot Page.printToPDF Page.addScriptToEvaluateOnNewDocument Useful events include:\nPage.domContentEventFired Page.loadEventFired Page.frameNavigated Page.javascriptDialogOpening When I think \u0026ldquo;tab-level browser behavior\u0026rdquo;, I usually look at Page.\nRuntime The Runtime domain lets a tool evaluate JavaScript and inspect values.\nCommon commands include:\nRuntime.enable Runtime.evaluate Runtime.callFunctionOn Runtime.getProperties Runtime.releaseObject Useful events include:\nRuntime.consoleAPICalled Runtime.exceptionThrown Runtime.executionContextCreated Runtime.executionContextDestroyed This is how tools can run code inside the page and get structured results back.\nNetwork The Network domain exposes requests, responses, headers, timings, failures, caching, WebSocket frames, and more.\nCommon commands include:\nNetwork.enable Network.disable Network.getResponseBody Network.setExtraHTTPHeaders Network.emulateNetworkConditions Useful events include:\nNetwork.requestWillBeSent Network.responseReceived Network.loadingFinished Network.loadingFailed Network.webSocketFrameSent Network.webSocketFrameReceived If I am debugging API calls, auth headers, CORS, failed assets, or request timing, this is the domain I care about.\nDOM And CSS The DOM and CSS domains power much of what developers see in the Elements panel.\nThey can inspect nodes, attributes, boxes, stylesheets, computed styles, and rule matches.\nFor most automation, I prefer higher-level locators. 
But if I am building a DevTools-like tool, these domains matter.\nTarget The Target domain is how tools discover and attach to targets.\nThis matters for multi-tab flows, workers, service workers, browser contexts, iframes, and tools that need the browser-level connection.\nCommon commands include:\nTarget.getTargets Target.attachToTarget Target.detachFromTarget Target.createTarget Target.closeTarget Target.setAutoAttach If your CDP script works for one tab but fails in a multi-target browser, the Target domain is often where the missing concept lives.\nEmulation The Emulation domain lets tools override runtime conditions.\nExamples include:\nviewport size device metrics geolocation timezone media type CPU throttling touch support display features This is how browser tools simulate mobile devices, slow machines, location-specific behavior, or other environment differences.\nCDP Versions CDP has multiple protocol views.\nThe official protocol viewer exposes:\nlatest tip-of-tree: the newest protocol surface, but it can change and break v8-inspector: protocol used for debugging and profiling Node.js apps stable 1.3: an older stable subset tagged at Chrome 64 For real tooling, the most reliable source is often the browser itself:\nhttp://127.0.0.1:9222/json/protocol That returns the protocol schema supported by the running browser.\nThis matters because CDP changes with Chrome. 
If you build against tip-of-tree docs and run against an older browser, a command may not exist or a parameter may behave differently.\nMy rule is simple:\nuse the docs to learn concepts use the running browser\u0026rsquo;s /json/protocol for exact compatibility pin browser versions for repeatable automation Protocol Monitor: Learning CDP From DevTools Itself Chrome DevTools includes a Protocol Monitor panel.\nThe Protocol Monitor documentation says it can record CDP requests and responses made by DevTools, inspect messages, save messages, and send CDP commands.\nThis is one of the best ways to learn CDP.\nWhen you click around DevTools, Protocol Monitor shows the CDP traffic behind those actions.\nFor example, if you open the Network panel, reload a page, inspect a request, and look at the Protocol Monitor, you can see which protocol domains and events are involved.\nYou can also send commands directly. A parameter-free command can be typed as:\nPage.captureScreenshot For commands with parameters, the docs show JSON like:\n{ \u0026#34;cmd\u0026#34;: \u0026#34;Page.captureScreenshot\u0026#34;, \u0026#34;args\u0026#34;: { \u0026#34;format\u0026#34;: \u0026#34;jpeg\u0026#34; } } The CDP editor can generate a structured parameter form based on protocol definitions.\nThat makes Protocol Monitor useful for two jobs:\nlearning how DevTools itself talks to Chrome prototyping raw CDP commands before writing code CDP And Security CDP is powerful because it can control the browser.\nThat is also why it must be treated carefully.\nIf a tool can connect to a debug port, it can inspect pages, read storage, evaluate scripts, watch network requests, interact with UI, and access sensitive session state in that browser profile.\nMy practical rules:\nnever expose the debug port to an untrusted network bind to localhost for local development use a separate --user-data-dir avoid using your normal browser profile close the debug browser when finished do not browse sensitive sites in a 
debuggable profile use test accounts for automation This is not theoretical. CDP is designed for debugging and instrumentation. Treat it with the same care as a shell attached to your browser session.\nCDP vs WebDriver CDP and WebDriver overlap, but they are not the same.\nWebDriver is a browser automation standard designed around user-like automation across browsers.\nCDP is a Chrome/Blink debugging and instrumentation protocol.\nThe distinction matters:\nQuestion CDP WebDriver Primary focus Debugging, instrumentation, DevTools capabilities Browser automation standard Browser scope Chrome, Chromium, Blink-based browsers, Node via v8-inspector Cross-browser API shape Domains, commands, events over WebSocket WebDriver commands through driver endpoints Best for DevTools-like tools, tracing, network inspection, low-level browser control Cross-browser UI testing Abstraction level Low Higher For tests that must run across multiple browsers, WebDriver or Playwright is usually a better starting point.\nFor Chrome-specific observability, performance traces, network internals, or DevTools-style tooling, CDP is often the right layer.\nCDP vs Chrome DevTools MCP Chrome DevTools MCP is built for agents.\nCDP is built for tools.\nThat difference changes the API shape.\nWith CDP, you might send:\n{\u0026#34;id\u0026#34;:1,\u0026#34;method\u0026#34;:\u0026#34;Network.enable\u0026#34;} Then subscribe to low-level events, collect request IDs, fetch response bodies, correlate timings, and summarize results yourself.\nWith Chrome DevTools MCP, an agent can use a higher-level tool such as:\nlist_network_requests get_network_request The MCP layer returns something more usable for an AI workflow.\nThat is the value of abstraction. CDP gives raw capability. 
MCP packages useful browser workflows as agent tools.\nIf I am building the MCP server, I care about CDP.\nIf I am using an agent to debug an app, I usually want the MCP tools.\nWhere CDP Is Still The Right Tool I would reach for direct CDP when:\nbuilding a browser automation library building a DevTools-like interface collecting custom browser telemetry inspecting network events at a low level automating Chrome features not exposed by a higher-level library experimenting with new protocol domains debugging service workers, workers, targets, frames, or browser contexts integrating with a non-standard runtime that exposes a CDP-compatible endpoint learning how DevTools actually works I would not start with direct CDP for a normal end-to-end test.\nFor that, I want Playwright or Puppeteer. They hide a lot of hard details: waiting, selectors, frames, file outputs, browser lifecycle, contexts, retries, and cross-platform behavior.\nA Good Mental Model Here is how I think about CDP:\nChrome DevTools UI is the dashboard. CDP is the wire protocol behind the dashboard. Puppeteer and Playwright are developer-friendly automation layers. Chrome DevTools MCP is an agent-friendly tool layer. The lower you go, the more control you get.\nThe lower you go, the more plumbing you own.\nThat is the tradeoff.\nMy Take CDP is one of those technologies that most web developers use indirectly for years before noticing it.\nEvery time DevTools shows a network request, pauses on a breakpoint, prints a console error, captures a screenshot, or records a trace, there is a protocol-shaped world underneath.\nI do not think every developer needs to write raw CDP clients. But understanding CDP helps explain why browser tooling works the way it does.\nIt also makes modern agent tooling easier to understand. Tools like Chrome DevTools MCP are not magic. 
They are packaging browser inspection and debugging capabilities into a form that agents can use.\nFor day-to-day work, I still prefer higher-level tools.\nBut when I need to know what the browser can really expose, CDP is the layer I look at.\nResources Chrome DevTools Protocol documentation Chrome DevTools Protocol viewer: latest tip-of-tree ChromeDevTools/devtools-protocol on GitHub Protocol Monitor: View and send CDP requests Chrome DevTools documentation ChromeDevTools/chrome-devtools-mcp ","permalink":"https://learncodecamp.net/chrome-devtools-protocol-cdp-explained/","summary":"\u003cp\u003eEvery web developer uses Chrome DevTools.\u003c/p\u003e\n\u003cp\u003eWe inspect elements, read console logs, watch network requests, throttle CPU, emulate mobile screens, record performance traces, check storage, debug JavaScript, and capture screenshots.\u003c/p\u003e\n\u003cp\u003eMost of that feels like a browser UI.\u003c/p\u003e\n\u003cp\u003eUnder the hood, there is a protocol.\u003c/p\u003e\n\u003cp\u003eThat protocol is \u003ca href=\"https://chromedevtools.github.io/devtools-protocol/\"\u003eChrome DevTools Protocol\u003c/a\u003e, usually shortened to \u003cstrong\u003eCDP\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eCDP is the browser debugging API that lets tools instrument, inspect, debug, and profile Chrome, Chromium, and other Blink-based browsers. Chrome DevTools itself uses this protocol. Many automation and debugging tools also build on it directly or indirectly.\u003c/p\u003e","title":"Chrome DevTools Protocol (CDP) Explained: The Browser Debugging API Behind DevTools"},{"content":"Coding agents are useful when they can read code, edit files, run tests, and explain errors.\nBut web development has a problem that does not fit neatly inside the file system:\nthe real bug often lives in the browser.\nA React component may look fine in code but overflow on mobile. An API call may fail only after a specific login state. 
A button may be present in the DOM but not clickable. A performance issue may come from layout shifts, long tasks, font loading, image decoding, or network waterfalls. A console error may point to bundled JavaScript that needs source maps to be useful.\nThis is where ChromeDevTools/chrome-devtools-mcp becomes interesting.\nIt gives coding agents access to a live Chrome browser through the Model Context Protocol (MCP). In practice, that means an agent can inspect pages, take screenshots, read console logs, analyze network requests, run Lighthouse checks, record performance traces, and interact with the page using browser-level tooling instead of guessing from source code alone.\nIn this post, I explain what Chrome DevTools MCP is, how it works, where it fits, how to configure it, and the tradeoffs I would keep in mind before giving an agent access to a browser session.\nWhat Chrome DevTools MCP Is Chrome DevTools MCP is an MCP server maintained under the ChromeDevTools GitHub organization.\nThe repository describes it as \u0026ldquo;Chrome DevTools for agents.\u0026rdquo; That is a good mental model.\nAn MCP server is a bridge between an AI client and an external capability. In this case, the external capability is Chrome plus Chrome DevTools. Your coding agent does not need to implement browser automation from scratch. It asks the MCP server to perform browser operations through a structured set of tools.\nAt a high level, the stack looks like this:\nCoding agent | | MCP tool calls v chrome-devtools-mcp | | Chrome DevTools / Puppeteer / browser automation v Live Chrome browser | | DOM, accessibility tree, console, network, traces, screenshots v Real page behavior That last line is the important part.\nThe agent is no longer limited to reading static source files and imagining how the UI behaves. 
It can look at the actual running application.\nWhy This Matters Most frontend bugs are not purely code-reading problems.\nThey are runtime problems.\nFor example:\n\u0026ldquo;The login form works locally but not in production.\u0026rdquo; \u0026ldquo;The button is visible but clicking it does nothing.\u0026rdquo; \u0026ldquo;The mobile header overlaps the page title.\u0026rdquo; \u0026ldquo;The checkout page shows a 500 after selecting a shipping method.\u0026rdquo; \u0026ldquo;The page feels slow, but the API response looks fast.\u0026rdquo; \u0026ldquo;The component is accessible in Storybook but not in the real page.\u0026rdquo; A human developer usually switches between editor, browser, DevTools, terminal, network panel, console, and screenshots. A coding agent without browser access can only do part of that workflow.\nChrome DevTools MCP closes that gap. It gives the agent a way to collect browser evidence before editing code.\nThat changes the quality of the loop:\nWithout browser access: read code -\u0026gt; guess bug -\u0026gt; edit -\u0026gt; ask user to verify With Chrome DevTools MCP: open page -\u0026gt; inspect runtime -\u0026gt; reproduce bug -\u0026gt; edit -\u0026gt; verify in browser The second loop is much closer to how I debug web apps manually.\nWhat It Can Do The current tool reference groups Chrome DevTools MCP tools into these areas:\nCategory What the agent can do Input automation Click, drag, hover, fill fields, fill forms, type text, upload files, handle dialogs Navigation automation Open pages, list tabs, select tabs, navigate, reload, wait for text, close pages Emulation Resize the page, emulate viewport, user agent, geolocation, color scheme, CPU, and network conditions Performance Start and stop traces, save trace files, analyze performance insights Network List network requests and inspect a specific request or response Debugging Evaluate JavaScript, inspect console messages, run Lighthouse audits, take screenshots, take 
accessibility snapshots Memory Capture and inspect heap snapshots for memory leak analysis Extensions Install, list, reload, trigger, or uninstall Chrome extensions when enabled WebMCP List and execute WebMCP tools exposed by a page when experimental support is enabled Third-party developer tools Execute page-exposed developer tools when the experimental category is enabled That is a broad surface area, but the important thing is not the number of tools. The important thing is that these tools map to the evidence developers already use.\nIf I ask an agent to fix a broken signup page, I do not want it to immediately edit a random component. I want it to:\nOpen the page. Take an accessibility snapshot. Fill the form like a user would. Watch console output. Inspect network requests. Capture a screenshot if the UI looks wrong. Only then decide what code to change. Chrome DevTools MCP makes that workflow possible.\nAccessibility Snapshots Are More Useful Than Screenshots First One detail I like in the tool design is the take_snapshot tool.\nIt returns a text snapshot based on the accessibility tree. That gives the agent a structured view of the page: headings, buttons, links, inputs, labels, and element identifiers that can be used in later actions.\nFor an agent, this is often better than starting with a screenshot.\nA screenshot is visually rich, but it is expensive to interpret and can miss semantics. The accessibility tree is closer to what the page says it is.\nFor example, a snapshot can tell the agent:\nbutton \u0026#34;Submit order\u0026#34; [uid=12] textbox \u0026#34;Email address\u0026#34; [uid=7] link \u0026#34;Terms of service\u0026#34; [uid=19] Now the agent can click the button by UID instead of by screen coordinate.\nThat matters because coordinate-based automation is brittle. If a banner appears, a viewport changes, or a font loads differently, the coordinate can point to the wrong thing. 
Accessibility-based interaction is usually more stable and also encourages developers to build better semantics.\nScreenshots still matter. I want screenshots when layout, visual regression, canvas, charts, clipping, or responsive design is the problem. But for basic navigation and form flows, a semantic snapshot is a better first move.\nSetup The basic MCP configuration uses npx:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;chrome-devtools-mcp@latest\u0026#34;] } } } The repository recommends chrome-devtools-mcp@latest so the client uses the latest server version.\nFor simple browser tasks, there is also a slim mode:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;chrome-devtools-mcp@latest\u0026#34;, \u0026#34;--slim\u0026#34;, \u0026#34;--headless\u0026#34;] } } } The package also includes a CLI named chrome-devtools, which is useful even without an MCP client:\nnpm i chrome-devtools-mcp@latest -g chrome-devtools status chrome-devtools new_page \u0026#34;https://example.com\u0026#34; chrome-devtools take_screenshot --filePath screenshot.png chrome-devtools stop The CLI talks to a background daemon. That means the browser state can persist across commands until you stop it.\nBrowser Connection Modes There are several ways to connect Chrome DevTools MCP to Chrome.\nThe default mode starts Chrome using a Chrome DevTools MCP profile. 
That is usually the easiest way to begin because it keeps the session separate from your normal Chrome profile.\nYou can also run headless:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [ \u0026#34;chrome-devtools-mcp@latest\u0026#34;, \u0026#34;--headless=true\u0026#34; ] } } } If you want independent temporary profiles for multiple MCP sessions, use --isolated:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [ \u0026#34;chrome-devtools-mcp@latest\u0026#34;, \u0026#34;--isolated=true\u0026#34; ] } } } If Chrome is already running with a remote debugging port, connect with --browser-url:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [ \u0026#34;chrome-devtools-mcp@latest\u0026#34;, \u0026#34;--browser-url=http://127.0.0.1:9222\u0026#34; ] } } } Then start Chrome with a dedicated profile:\n/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome \\ --remote-debugging-port=9222 \\ --user-data-dir=/tmp/chrome-profile-stable Chrome requires a non-default user data directory for this remote debugging flow. That is good. A debug port can control the browser, so I do not want it attached to my everyday profile by accident.\nThere is also --autoConnect, which lets the MCP server request a remote debugging connection to a running local Chrome instance. The current README says this requires Chrome 144+ and remote debugging enabled through chrome://inspect/#remote-debugging.\nThat flow is useful because it allows a handoff between manual debugging and agent-assisted debugging. 
You can inspect something in DevTools, then let the coding agent continue from the current browser context.\nA Practical Debugging Workflow Here is how I would use Chrome DevTools MCP when debugging a real web app.\nFirst, start the app locally:\nhugo server -D or:\nnpm run dev Then ask the agent to open the page:\nOpen http://localhost:1313 and inspect the home page. The agent should not immediately edit code. A good browser-backed workflow looks like this:\nnew_page or navigate_page to open the target route. take_snapshot to understand the page semantically. list_console_messages to check runtime errors. list_network_requests to check failed or slow requests. take_screenshot when visual layout matters. emulate or resize_page to reproduce mobile and desktop states. evaluate_script only when the agent needs exact runtime details. Edit code. Rebuild or let the dev server reload. Re-check the page in Chrome. That is a complete loop.\nFor performance work, the workflow is different:\nNavigate to the target page. Start a performance trace with reload. Stop the trace or let auto-stop finish. Analyze the reported insights. Save the trace file if the raw data is needed. Fix the biggest bottleneck first. Repeat under the same viewport and network settings. The important part is repeatability. 
If the agent changes viewport, cache, throttling, or route between runs, the measurements become hard to compare.\nWhen I Would Use It I would use Chrome DevTools MCP for:\nreproducing UI bugs in a real browser checking console errors after a code change validating responsive layouts inspecting network failures debugging authenticated browser flows collecting screenshots before and after a fix running Lighthouse checks for accessibility, SEO, best practices, or agentic browsing recording performance traces investigating memory leaks with heap snapshots testing browser extension behavior when extension tools are enabled experimenting with WebMCP pages when the WebMCP category is enabled I would especially use it when a bug report contains browser-visible behavior:\nOn mobile, the filters panel opens but I cannot close it. That is not a \u0026ldquo;read the code and guess\u0026rdquo; task. It is a browser task.\nWhen I Would Not Use It Chrome DevTools MCP is not the right tool for everything.\nI would not use it as a replacement for unit tests, component tests, type checks, or static analysis. Those are faster and more deterministic.\nI would also avoid giving it my normal browser profile unless I had a clear reason. 
Browser sessions contain sensitive data: cookies, local storage, account pages, private tabs, admin dashboards, payment pages, and internal systems.\nFor most work, I prefer one of these:\na temporary profile with --isolated a dedicated debug profile with --user-data-dir a local test account a local app with seeded data headless mode for simple repeatable checks If the task requires an authenticated state, I still want to think about what the agent can see and modify.\nSecurity And Privacy The repository is explicit about the risk: the MCP server exposes browser content to MCP clients, allowing them to inspect, debug, and modify data in the browser or DevTools.\nThat is powerful, but it is also sensitive.\nMy practical rules would be:\nDo not browse personal email, banking, healthcare, or private account pages in a debug session. Use a separate Chrome profile for agent work. Close the session when the task is done. Avoid opening remote debugging ports on shared or untrusted machines. Do not expose 127.0.0.1:9222 outside the local machine. Use test accounts for production-like flows. Review agent actions before allowing destructive operations. There are also two telemetry-related details worth knowing.\nFirst, usage statistics are enabled by default. The README says they can be disabled with:\n\u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;chrome-devtools-mcp@latest\u0026#34;, \u0026#34;--no-usage-statistics\u0026#34;] or by setting CHROME_DEVTOOLS_MCP_NO_USAGE_STATISTICS or CI.\nSecond, performance tools may send trace URLs to the CrUX API to fetch field performance data. 
That can be disabled with:\n--no-performance-crux Those defaults may be fine for many developers, but I would make them explicit in team tooling.\nHow It Relates To CDP, Puppeteer, Playwright, And WebMCP It is easy to mix up the browser tooling stack, so here is how I separate the layers.\nTool What it is Best use Chrome DevTools Protocol (CDP) Low-level browser debugging protocol Building browser tools, automation libraries, custom debugging workflows Puppeteer High-level Node.js browser automation library Scripts, tests, screenshots, PDF generation, controlled browser flows Playwright Cross-browser automation and testing library End-to-end tests and reliable multi-browser automation Chrome DevTools MCP MCP server that exposes Chrome/DevTools capabilities to agents Letting coding agents inspect, debug, automate, and verify web pages WebMCP Proposed site-side capability contract for web apps Letting pages expose structured product actions to agents Chrome DevTools MCP is not the same thing as CDP. CDP is the low-level protocol. Chrome DevTools MCP is an agent-facing interface built on top of browser automation and DevTools capabilities.\nChrome DevTools MCP is also not the same thing as WebMCP. WebMCP is about the website exposing app-level tools. Chrome DevTools MCP is about the agent controlling and inspecting the browser.\nOne looks from outside the page. The other lets the page describe what can be done.\nBoth matter.\nThe Agent Should Still Think Like A Developer Browser access does not remove the need for engineering judgment.\nAn agent with Chrome DevTools MCP can click around, collect logs, and run traces. But it still needs to answer the same questions a developer would ask:\nWhat behavior is actually broken? Can I reproduce it? Is it a frontend bug, backend bug, data bug, cache bug, auth bug, or environment issue? What evidence supports that? What is the smallest code change that fixes it? How do I verify the fix in the browser? 
The best use of Chrome DevTools MCP is not \u0026ldquo;let the agent randomly operate Chrome.\u0026rdquo;\nThe best use is a disciplined debugging loop:\nobserve -\u0026gt; reproduce -\u0026gt; inspect -\u0026gt; change -\u0026gt; verify That is where the tool becomes genuinely useful.\nCommon Problems If the MCP server does not start, the troubleshooting guide suggests checking:\nwhether npx chrome-devtools-mcp@latest --help works in the terminal whether the MCP client uses the same Node and npm version as the terminal whether the client logs show a specific error whether the npx cache is corrupted whether Chrome can start on the machine For verbose logging:\nDEBUG=* npx chrome-devtools-mcp@latest --log-file=/tmp/chrome-devtools-mcp.log On Windows, some clients need cmd /c before npx:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;chrome-devtools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;cmd\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;/c\u0026#34;, \u0026#34;npx\u0026#34;, \u0026#34;-y\u0026#34;, \u0026#34;chrome-devtools-mcp@latest\u0026#34;] } } } If the MCP server runs inside a sandbox and cannot start Chrome, connect to a manually started browser with --browser-url.\nIf you use --autoConnect, make sure Chrome is already running, remote debugging is enabled, the permission prompt was allowed, and no other tool is fighting for the same debugging connection.\nMy Take Chrome DevTools MCP is important because it gives coding agents a real browser feedback loop.\nThat sounds simple, but it changes how useful an agent can be on frontend work. Instead of only reading files, it can inspect the page. Instead of guessing why a button fails, it can watch the console and network panel. Instead of saying \u0026ldquo;try this\u0026rdquo;, it can make a change and verify the result.\nI do not see it as a replacement for DevTools. I see it as a way to bring DevTools into the agent workflow.\nThe best version of this is not a fully autonomous browser operator. 
The best version is a coding partner that can gather browser evidence, make scoped fixes, and prove that the page behaves better after the change.\nFor web developers, that is the useful milestone.\nResources ChromeDevTools/chrome-devtools-mcp on GitHub Chrome DevTools MCP tool reference Chrome DevTools MCP CLI documentation Chrome DevTools MCP troubleshooting guide Let your Coding Agent debug your browser session with Chrome DevTools MCP Chrome DevTools Protocol ","permalink":"https://learncodecamp.net/chrome-devtools-mcp-for-coding-agents/","summary":"\u003cp\u003eCoding agents are useful when they can read code, edit files, run tests, and explain errors.\u003c/p\u003e\n\u003cp\u003eBut web development has a problem that does not fit neatly inside the file system:\u003c/p\u003e\n\u003cp\u003ethe real bug often lives in the browser.\u003c/p\u003e\n\u003cp\u003eA React component may look fine in code but overflow on mobile. An API call may fail only after a specific login state. A button may be present in the DOM but not clickable. A performance issue may come from layout shifts, long tasks, font loading, image decoding, or network waterfalls. 
A console error may point to bundled JavaScript that needs source maps to be useful.\u003c/p\u003e","title":"Chrome DevTools MCP: How Coding Agents Debug Real Browser Sessions"},{"content":"Originally published on DEV.to as a submission for the Google I/O Writing Challenge.\nAt Google I/O 2026, the loud announcements were easy to spot: Gemini 3.5, Antigravity 2.0, Android agents, AI Studio upgrades, and a lot of new ways to build software with AI.\nThe announcement I kept coming back to was much quieter:\nWebMCP.\nThe Chrome docs describe it as a proposed open web standard that can be tested locally behind a Chrome flag and explored with demo apps.\nBut the idea underneath it is important:\nWhat if websites stopped forcing agents to guess what buttons and forms mean, and started exposing structured, typed actions directly?\nThat sounds small until you compare it with the tool that exists today: Chrome DevTools MCP, Google\u0026rsquo;s official MCP server that lets coding agents control and inspect Chrome through DevTools.\nAfter looking at both, my take is simple:\nChrome DevTools MCP helps agents understand the web we already built. WebMCP asks us to build a web that agents can use without guessing.\nThat difference matters for every web developer.\nThe Current Web Is Still Built For Eyes And Fingers Most web apps assume the user is a human looking at pixels and moving through a UI one click at a time.\nThat model works for people. It is much less reliable for agents.\nAn agent can try to inspect the DOM. It can use the accessibility tree. It can take a screenshot. It can click buttons. It can fill fields. But unless the app exposes clearer intent, the agent still has to infer a lot:\nIs this button destructive or reversible? Does this date field expect MM/DD/YYYY, YYYY-MM-DD, or a custom picker flow? Is the visible price final, or does tax appear later? Does this form submit immediately, or save a draft? 
Is this disabled button waiting on validation, auth, inventory, or JavaScript state? Humans handle ambiguity with context. Agents handle ambiguity with retries, brittle heuristics, and occasional nonsense.\nWebMCP is interesting because it tries to reduce that ambiguity at the source.\nWhat WebMCP Adds The Chrome WebMCP documentation describes WebMCP as a way for web pages to expose structured tools for AI agents. A page can register JavaScript functions or annotate HTML forms so an agent can discover available actions, understand input schemas, and call those actions inside the current browser context.\nIn other words, the website can say:\n// Conceptual example, not exact production code registerTool(\u0026#34;searchFlights\u0026#34;, { description: \u0026#34;Search available flights\u0026#34;, input: { origin: \u0026#34;string\u0026#34;, destination: \u0026#34;string\u0026#34;, date: \u0026#34;string\u0026#34;, passengers: \u0026#34;number\u0026#34; } }); That is a different contract from \u0026ldquo;look for a textbox that probably means origin, type into it, tab somewhere, hope the custom date picker behaves, and click the blue button.\u0026rdquo;\nThe official docs call out support for discovery, JSON Schema, and page state. They also give examples like support flows, travel booking, structured forms, date pickers, and hidden diagnostic actions.\nThe important word is structured.\nThe web already has APIs. But WebMCP is not a backend API. It lives in the browser context. The tool call can update the same UI the user sees. 
That keeps the user in the loop and preserves the visible product experience, while giving the agent a more reliable path than raw actuation.\nWhy I Compared It With Chrome DevTools MCP The Google I/O developer keynote put WebMCP and Chrome DevTools for agents in the same broader section: \u0026ldquo;Redefining web development in the agentic era.\u0026rdquo; That pairing is useful.\nChrome DevTools for agents gives coding agents the ability to interact with Chrome, inspect pages, debug runtime behavior, emulate real-world user experiences, run audits, inspect console messages, analyze network requests, take accessibility-tree snapshots, and run performance workflows.\nThe GitHub README for chrome-devtools-mcp describes it as an MCP server that lets agents such as Antigravity, Claude, Cursor, Copilot, and Codex control and inspect a live Chrome browser. The tool reference includes navigation, input automation, emulation, network inspection, console inspection, screenshots, accessibility snapshots, Lighthouse audits, performance traces, memory tools, extension tools, and experimental WebMCP tools.\nThat is a lot of power.\nBut it is a different layer.\nChrome DevTools MCP is mostly a developer-side debugging and automation tool.\nWebMCP is a site-side capability contract.\nOne lets an agent inspect what is there. The other lets a site declare what can be done.\nMy Small Test I wanted a hands-on check instead of writing another \u0026ldquo;AI will change everything\u0026rdquo; post.\nThe WebMCP docs point to demos covering both imperative and declarative implementations:\nWebMCP zaMaker, which uses the WebMCP Imperative API. A travel demo, also using the WebMCP Imperative API. Le Petit Bistro, which uses the WebMCP Declarative API. I started with WebMCP zaMaker because the imperative version makes the core idea very visible. 
Instead of asking an agent to infer pizza controls from pixels, the page registers explicit tools that the inspector can discover.\nI enabled WebMCP testing in Chrome, opened the zaMaker demo, and used the WebMCP - Model Context Tool Inspector extension.\nThe extension surfaced several page-defined tools, including:\nadd_topping manage_pizza remove_topping set_pizza_size set_pizza_style That is the part that clicked for me. These are not generic browser actions like \u0026ldquo;click at coordinate X\u0026rdquo; or \u0026ldquo;type into input Y.\u0026rdquo; They are product-level capabilities exposed by the page.\nFor example, the inspector showed add_topping with a schema that included a topping enum and a size enum. It also showed set_pizza_size with a structured size input, plus a number_of_persons field that could help infer the right size.\nThen I used natural language prompts in the inspector:\nadd pizza with large toppings The inspector translated that into a tool call:\n{ \u0026#34;size\u0026#34;: \u0026#34;Large\u0026#34;, \u0026#34;topping\u0026#34;: \u0026#34;🍕\u0026#34; } Then I tried:\nmake the pizza extra large The extension called:\n{ \u0026#34;size\u0026#34;: \u0026#34;Extra Large\u0026#34; } The page responded by changing the pizza state.\nThat small demo made the difference clearer than the docs alone. A browser automation agent can click around a pizza builder. A WebMCP-aware page can instead say, \u0026ldquo;Here are the actions this product supports, here are the allowed parameters, and here is what happened when you called one.\u0026rdquo;\nFor contrast, Chrome DevTools MCP felt like a developer-side lens. It can inspect a page, read the accessibility tree, look at console output, automate interactions, and help an agent debug what is already rendered in Chrome.\nThat is powerful, but it is still looking at the page from the outside. 
The zaMaker demo showed the other side of the idea: the page itself can publish a small set of intentional actions for agents to use.\nSo my hands-on result was:\nChrome DevTools MCP is practical today for inspecting and testing pages. The WebMCP inspector shows what changes when the page itself exposes product-level tools.\nWebMCP vs Chrome DevTools MCP Here is the cleanest way I now think about the difference:\nQuestion WebMCP Chrome DevTools MCP Who exposes the capability? The website or web app The browser / DevTools layer Who is it mainly for? Browser-based user agents acting inside a site Coding agents, QA agents, and developer workflows What does it make explicit? App-defined tools, inputs, outputs, and page state Browser state, DOM/a11y snapshots, console, network, performance, screenshots What problem does it reduce? Agents guessing how to use a product Developers manually inspecting and debugging browser behavior Best current use Experimental agent-ready product flows Real debugging, QA, performance, accessibility checks Biggest limitation Requires browser support and app implementation Still often acts through page structure, snapshots, and inferred intent If an agent is trying to debug why a checkout page is broken, Chrome DevTools MCP is the right tool.\nIf an agent is trying to book a trip, submit a support request, configure a dashboard, or complete a multi-step workflow inside an app, WebMCP is the more interesting long-term answer.\nWhy This Is Bigger Than \u0026ldquo;AI Can Click Buttons\u0026rdquo; Before WebMCP, the default browser-agent path looked like this:\nSee the page. Guess the user\u0026rsquo;s next action. Click or type. Observe the result. Retry if wrong. That can work, but it is fragile. It is also slow and expensive because every step adds model reasoning, visual parsing, DOM interpretation, or both.\nWebMCP suggests a different path:\nDiscover the site\u0026rsquo;s available tools. 
Pick the tool that matches the user\u0026rsquo;s goal. Send typed parameters. Let the site execute the action in the visible browser context. Return structured output or a clear error. That is closer to an API, but with the user still looking at the product.\nThis is why I think WebMCP matters. It is not only about making agents more powerful. It is about moving responsibility back to application developers. If we want agents to act safely and reliably, we cannot make them reverse-engineer every workflow from pixels.\nWe need to expose intent.\nWhat Developers Can Do Before WebMCP Is Everywhere Most of us cannot ship production WebMCP flows tomorrow. Browser support is early, and the proposal is still changing.\nBut we can start building sites that are easier for both humans and agents to understand.\nThe practical checklist I took from this:\nUse semantic HTML before custom widgets. Make important buttons and forms clear in the accessibility tree. Give inputs stable names and labels. Avoid hiding critical state only in visual styling. Keep destructive actions behind explicit confirmation. Separate \u0026ldquo;preview\u0026rdquo;, \u0026ldquo;save draft\u0026rdquo;, \u0026ldquo;submit\u0026rdquo;, and \u0026ldquo;purchase\u0026rdquo; flows clearly. Make validation errors machine-readable and human-readable. Test important flows with browser automation, accessibility snapshots, and Lighthouse. Think about which app actions would deserve structured tools later. If I were preparing a product for WebMCP, I would not start by exposing every button as a tool. 
I would start with the few workflows where ambiguity hurts most:\nsearch checkout booking support ticket creation return/refund initiation dashboard filtering diagnostics account settings changes Those are the places where agents guessing through the UI can create real user pain.\nThe Security Question There is an obvious risk here: if websites expose actions to agents, bad tool design can make bad actions easier.\nThat is why I like that the WebMCP model keeps actions in the browser context instead of turning every site into a blind backend API. Sensitive actions can still require visible UI, user confirmation, and page-level state.\nBut developers will need discipline.\nA good WebMCP tool should have:\na narrow purpose a clear name a strict schema useful error messages visible execution confirmation for irreversible actions no surprise side effects The goal should not be \u0026ldquo;let agents do anything.\u0026rdquo;\nThe goal should be \u0026ldquo;let agents do the right thing with less guessing.\u0026rdquo;\nMy Take Chrome DevTools MCP feels like the tool web developers can use now.\nWebMCP feels like the contract web developers may need to design for next.\nThat is why I think it was one of the more important web announcements at Google I/O 2026. It points to a shift from:\nagents as better screen scrapers\nto:\nagents as first-class users of structured web capabilities\nThat shift will not happen overnight. It needs browser support, standards work, developer tooling, security patterns, and a lot of real-world testing.\nBut the direction is clear. 
If agents are going to use the web on our behalf, web apps need to become more than visually usable.\nThey need to become understandable.\nThey need to become inspectable.\nAnd eventually, they need to become agent-ready.\nResources WebMCP documentation WebMCP Imperative API Chrome DevTools for agents ChromeDevTools/chrome-devtools-mcp on GitHub Google I/O 2026 Developer Keynote recap GoogleChromeLabs WebMCP tools and demos ","permalink":"https://learncodecamp.net/webmcp-is-the-quiet-google-io-announcement-that-could-make-web-apps-agent-ready/","summary":"\u003cp\u003e\u003cem\u003eOriginally published on \u003ca href=\"https://dev.to/nkalra0123/webmcp-is-the-quiet-google-io-announcement-that-could-make-web-apps-agent-ready-1p9l\"\u003eDEV.to\u003c/a\u003e as a submission for the \u003ca href=\"https://dev.to/challenges/google-io-writing-2026-05-19\"\u003eGoogle I/O Writing Challenge\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cimg alt=\"WebMCP cover image showing agent-ready web apps and structured tool panels\" loading=\"lazy\" src=\"https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8p6zcj79nu3bkengcgsy.png\"\u003e\u003c/p\u003e\n\u003cp\u003eAt Google I/O 2026, the loud announcements were easy to spot: Gemini 3.5, Antigravity 2.0, Android agents, AI Studio upgrades, and a lot of new ways to build software with AI.\u003c/p\u003e\n\u003cp\u003eThe announcement I kept coming back to was much quieter:\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://developer.chrome.com/docs/ai/webmcp\"\u003eWebMCP\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThe Chrome docs describe it as a proposed open web standard that can be tested locally behind a Chrome flag and explored with demo apps.\u003c/p\u003e","title":"WebMCP Is the Quiet Google I/O Announcement That Could Make Web Apps Agent-Ready"},{"content":"Firestore has one of the most convenient real-time APIs in web and mobile development.\nWe write a listener:\nimport { collection, onSnapshot, query, 
where } from \u0026#34;firebase/firestore\u0026#34;; const q = query( collection(db, \u0026#34;messages\u0026#34;), where(\u0026#34;roomId\u0026#34;, \u0026#34;==\u0026#34;, \u0026#34;general\u0026#34;) ); const unsubscribe = onSnapshot(q, (snapshot) =\u0026gt; { snapshot.docChanges().forEach((change) =\u0026gt; { console.log(change.type, change.doc.id, change.doc.data()); }); }); After that, the UI updates whenever matching documents are added, modified, or removed.\nIt feels similar to a WebSocket because the browser receives real-time updates without manually polling. But Firestore listeners and WebSockets are not the same abstraction.\nA WebSocket gives us a low-level bidirectional pipe.\nA Firestore listener gives us a database-backed synchronization API with query semantics, local cache behavior, metadata events, retries, security rules, and billing rules.\nIn this post, we will look at how Firestore listeners are implemented conceptually, how they compare to WebSockets, and what we should check before choosing one approach over the other.\nThe Short Version Firestore listeners are not \u0026ldquo;just WebSockets exposed as a Firebase API.\u0026rdquo;\nThe useful mental model is:\nApplication code | | onSnapshot(query) v Firestore SDK | | local cache + pending writes + watch stream v Firestore backend | | document/query changes v Application callback The SDK hides most of the hard parts:\nopening and maintaining the network stream sending listen targets for documents or queries applying document changes to a local view firing an initial snapshot marking local writes as pending retrying transient failures resuming streams when possible surfacing metadata such as cache state and pending writes With WebSockets, we build most of that application protocol ourselves.\nWhat Firestore onSnapshot Does The public API is simple.\nFor one document:\nimport { doc, onSnapshot } from \u0026#34;firebase/firestore\u0026#34;; const unsubscribe = onSnapshot(doc(db, 
\u0026#34;users\u0026#34;, userId), (snapshot) =\u0026gt; { console.log(snapshot.data()); }); For a query:\nimport { collection, onSnapshot, orderBy, query, where } from \u0026#34;firebase/firestore\u0026#34;; const q = query( collection(db, \u0026#34;messages\u0026#34;), where(\u0026#34;roomId\u0026#34;, \u0026#34;==\u0026#34;, \u0026#34;general\u0026#34;), orderBy(\u0026#34;createdAt\u0026#34;, \u0026#34;desc\u0026#34;) ); const unsubscribe = onSnapshot(q, (snapshot) =\u0026gt; { const messages = snapshot.docs.map((doc) =\u0026gt; ({ id: doc.id, ...doc.data() })); render(messages); }); The first callback gives an initial snapshot for the current document or query result. After that, the listener receives updates when the matching data changes.\nFor query listeners, docChanges() gives the difference between snapshots:\nsnapshot.docChanges().forEach((change) =\u0026gt; { if (change.type === \u0026#34;added\u0026#34;) { // A document entered the result set. } if (change.type === \u0026#34;modified\u0026#34;) { // A matching document changed. } if (change.type === \u0026#34;removed\u0026#34;) { // A document was deleted or no longer matches the query. } }); That diffing behavior is already higher-level than a plain WebSocket message. 
Firestore knows the query, tracks the result set, and tells the client how the result changed.\nThe Backend Protocol: Listen Targets and Watch Responses Under the public SDK API, Firestore has a watch/listen protocol.\nThe official Firestore RPC reference describes a Listen method that accepts ListenRequest messages and returns ListenResponse messages.\nA listen request can add or remove a target:\nListenRequest database add_target | remove_target A target can represent:\na document listen a query listen A listen response can contain:\nTargetChange DocumentChange DocumentDelete DocumentRemove ExistenceFilter Conceptually, the flow looks like this:\nClient opens listen stream Client sends AddTarget(query: messages where roomId == \u0026#34;general\u0026#34;) Server sends TargetChange(ADD) Server sends DocumentChange(message-1) Server sends DocumentChange(message-2) Server sends TargetChange(CURRENT) Server later sends DocumentChange(message-3) Client sends RemoveTarget(targetId) when unsubscribing The important detail is that the client is not subscribing to arbitrary event names like message.created.\nIt is subscribing to a Firestore target: a document or a query.\nThat is why Firestore listeners are useful for UI state that maps directly to database state.\nHow the Web SDK Transports the Stream In browser applications, Firestore uses the Firebase JavaScript SDK\u0026rsquo;s network transport. The current public API reference describes the underlying browser transport as WebChannel, with options for long polling in environments where streaming responses are blocked or buffered by proxies, antivirus software, or similar network layers.\nThat means we should not assume a browser Firestore listener is literally using the WebSocket API.\nThe SDK owns the transport choice. 
Our application owns the data model and listener lifecycle.\nFor example, Firestore has settings such as:\nexperimentalAutoDetectLongPolling experimentalForceLongPolling experimentalLongPollingOptions Those settings exist because browser streaming behavior can be affected by real network infrastructure. The SDK can use WebChannel behavior and long-polling fallbacks where needed.\nOn server-side SDKs, snapshot listeners may rely on gRPC streaming. The Firebase Admin Node.js settings mention that onSnapshot() is the operation that requires gRPC when preferRest is otherwise used.\nThe transport differs by SDK and runtime, but the high-level contract remains the same: listen to documents and queries, receive snapshots and changes.\nLocal Cache and Latency Compensation One of the biggest differences between Firestore listeners and raw WebSockets is local state.\nFirestore listeners interact with the SDK\u0026rsquo;s local cache.\nWhen the app writes a document, snapshot listeners can fire immediately before the write reaches the backend. Firebase calls this latency compensation.\nExample:\nconst unsubscribe = onSnapshot(doc(db, \u0026#34;tasks\u0026#34;, taskId), (snapshot) =\u0026gt; { const source = snapshot.metadata.hasPendingWrites ? \u0026#34;local\u0026#34; : \u0026#34;server\u0026#34;; console.log(source, snapshot.data()); }); If the user updates a task title, the UI can show the new title immediately. 
The document metadata tells us whether the snapshot includes local changes that have not been acknowledged by the server yet.\nThat behavior is not automatic with a WebSocket.\nWith a custom WebSocket system, we need to design:\noptimistic UI updates pending write state acknowledgement messages rollback behavior retry behavior conflict handling Firestore gives us a standard version of that behavior for document data.\nOffline Behavior Firestore also supports offline data access for Android, Apple, and web apps.\nWhen offline persistence is enabled, the SDK caches actively used data. Listeners can receive events from cached data while the device is offline. When the device comes back online, Firestore synchronizes local changes with the backend.\nSnapshot metadata helps distinguish cache state:\nconst unsubscribe = onSnapshot( q, { includeMetadataChanges: true }, (snapshot) =\u0026gt; { const source = snapshot.metadata.fromCache ? \u0026#34;cache\u0026#34; : \u0026#34;server\u0026#34;; console.log(`Snapshot source: ${source}`); } ); Two metadata fields matter:\nMetadata Meaning hasPendingWrites The snapshot includes local writes not yet committed to the backend fromCache The snapshot was served from local cache rather than confirmed current from the server With a WebSocket, offline behavior is application-owned. We need to decide how much data to cache, how writes are queued, how reconnects work, and how conflicts are resolved.\nFirestore makes these decisions easier when the application\u0026rsquo;s real-time state is stored in Firestore documents.\nWhat Firestore Handles For Us Firestore listeners provide a lot more than message delivery.\nThey handle:\nquery matching document diffing initial snapshots local cache reads optimistic local writes pending write metadata cache-vs-server metadata stream retries resume tokens security rules evaluation listener unsubscribe behavior That is the main reason teams choose Firestore listeners. 
The SDK turns real-time database synchronization into a small amount of application code.\nWith WebSockets, the platform gives us the connection. The application still needs a protocol.\nWhat WebSockets Give Us Instead WebSockets are lower-level and more flexible.\nA WebSocket connection can carry any application message:\n{ \u0026#34;type\u0026#34;: \u0026#34;cursor.moved\u0026#34;, \u0026#34;documentId\u0026#34;: \u0026#34;doc_123\u0026#34;, \u0026#34;userId\u0026#34;: \u0026#34;user_456\u0026#34;, \u0026#34;x\u0026#34;: 210, \u0026#34;y\u0026#34;: 480 } That message does not need to map to a database document. It can represent temporary state, control messages, progress events, multiplayer game updates, terminal output, audio chunks, or anything else the application defines.\nThis flexibility matters when the real-time data is not naturally a Firestore query result.\nWebSockets are often a better fit for:\nhigh-frequency cursor movement multiplayer game loops browser terminals custom presence systems collaborative editing protocols server job progress streams real-time data from multiple backend systems cases where Firestore billing or query shape does not fit the workload Firestore listeners are excellent when the live state is in Firestore.\nWebSockets are better when the live state is an application protocol.\nFirestore Listeners vs WebSockets Here is the practical comparison:\nArea Firestore Listeners WebSockets Abstraction Database synchronization Bidirectional message transport Subscription unit Document or query Application-defined channel/event Initial state Built in Must be designed Diffs Built in for query snapshots Must be designed Local cache Built into SDK Must be designed Offline writes SDK-managed for Firestore writes Must be designed Security Firestore Security Rules Custom auth and authorization Scaling Managed by Firestore Application/infrastructure-owned Message shape Firestore documents and metadata Any format Best fit Live database-backed UI 
Custom real-time protocol The decision is not \u0026ldquo;which one is more real-time?\u0026rdquo;\nBoth can support real-time user experiences.\nThe better question is:\nIs the real-time state primarily Firestore data, or is it an application-specific stream of events?\nCost and Billing Differences Firestore listeners are convenient, but they are not free.\nFirestore billing is tied to document reads, writes, deletes, storage, and related operations. A listener can generate reads when documents are added, updated, removed, or initially loaded into the listener.\nThat means we should check:\nhow many users will keep listeners open how many documents match each query how often matching documents change whether listeners are attached too broadly whether the UI repeatedly creates and destroys listeners whether the same data is listened to in many components A chat room with 20 recent messages may be a good listener.\nA dashboard that listens to thousands of fast-changing documents may become expensive or noisy.\nWith WebSockets, costs move somewhere else:\nserver instances load balancers pub/sub infrastructure observability engineering maintenance connection scaling Firestore reduces operational work, but we still need to model read volume. WebSockets reduce database listener costs for some workloads, but they add infrastructure and protocol ownership.\nSecurity Model Firestore listeners use Firestore Security Rules.\nThat is useful because the same authorization model can protect:\ndirect reads direct writes real-time listeners query results If a user is not allowed to read a document, the listener should not receive it.\nBut rules still need careful design. Query listeners must be compatible with the rules. The client cannot subscribe to a broad query and rely on rules to filter individual results in an arbitrary way. 
The query itself has to be allowed.\nWith WebSockets, security is custom.\nWe need to design:\nconnection authentication channel authorization per-message authorization token refresh behavior origin checks for browser clients rate limits abuse handling Firestore gives us a managed security layer for Firestore data. WebSockets give us full control, which also means full responsibility.\nOrdering and Consistency Firestore listeners provide snapshots of document/query state. The application receives document changes as the backend and local cache converge on a consistent view.\nThat is different from a raw event stream.\nFor example, a Firestore query listener is good for:\nShow the latest 50 messages in this room ordered by createdAt. It is less ideal for:\nDeliver every transient typing event in order. Typing indicators do not need durable database writes for every keystroke. A WebSocket or another ephemeral real-time channel is often cleaner.\nFirestore stores state. WebSockets move messages.\nThat distinction matters when designing the system.\nReconnects and Resume Behavior Both approaches need to survive network failure.\nFirestore listeners handle much of this through the SDK. The watch protocol supports resume tokens, and the SDK can reconnect and resume listening where possible. The application mostly sees snapshots and metadata changes.\nWith WebSockets, reconnect behavior is application code:\ndetect disconnect reconnect with backoff authenticate again resubscribe to channels fetch missed state deduplicate messages repair local UI state That is not a reason to avoid WebSockets. It is a reason to budget for the protocol work.\nIf the product needs custom real-time behavior, that work may be worth it. 
If the product only needs \u0026ldquo;keep this Firestore query live,\u0026rdquo; the Firestore listener already solves most of it.\nPerformance Considerations Before choosing Firestore listeners, we should check the query shape.\nGood listener patterns:\nlisten to one document listen to a small bounded query listen to recent items with limit unsubscribe when the UI no longer needs data keep listener ownership centralized use docChanges() to update local UI incrementally Risky listener patterns:\nlistening to a whole large collection creating listeners inside repeated child components attaching listeners without cleanup listening to high-churn documents that change many times per second using listeners for ephemeral events that do not need persistence Before choosing WebSockets, we should check the operational side:\nconnection count fan-out requirements message rate backpressure behavior auth refresh replay or missed-message recovery server deploy and drain behavior observability for connection health Neither option removes system design. 
They move the complexity to different places.\nWhen Firestore Listeners Are the Better Fit Firestore listeners are usually the better fit when:\nthe source of truth is already Firestore the UI needs live document or query state offline behavior is useful optimistic local writes improve UX Security Rules are the right authorization layer the listener result set is bounded and predictable the team wants less real-time infrastructure to operate Examples:\nuser profile updates task lists chat room messages with reasonable limits collaborative app metadata notification lists order status pages admin screens over Firestore collections The key is that the UI wants current database state, not an arbitrary stream of events.\nWhen WebSockets Are the Better Fit WebSockets are usually the better fit when:\nmessages are not naturally Firestore documents events are high-frequency or ephemeral the server must stream output from non-Firestore systems the protocol needs custom acknowledgement or ordering multiple data sources feed the same real-time channel Firestore read costs would be too high for the update rate the app needs custom presence, cursor, game, or terminal behavior Examples:\nlive cursor positions typing indicators multiplayer game state terminal sessions collaborative text operations live audio or binary streams infrastructure logs long-running job progress from worker systems The key is that the product needs an application protocol, not only a database listener.\nA Hybrid Architecture Is Common Many real-time applications use both.\nFor example, a chat product might use:\nFirestore for durable room messages Firestore listeners for recent message history WebSockets for typing indicators WebSockets or another presence system for online status Cloud Functions or backend workers for moderation and notifications That split works because durable state and ephemeral state have different requirements.\nDurable messages should be stored, queryable, secured, and 
recoverable.\nTyping indicators should be fast and temporary. They do not need to become permanent database writes.\nThe mistake is forcing every real-time signal into one technology.\nWhat We Should Check Before Choosing Before choosing Firestore listeners, check:\nDoes the UI need live Firestore document/query state? Can the query be bounded with filters, ordering, and limits? How many documents are loaded initially? How often do matching documents change? Will Security Rules allow the query cleanly? Is offline cache behavior useful or risky for this data? Can the application handle fromCache and hasPendingWrites correctly? What is the expected read cost at real usage? Before choosing WebSockets, check:\nWhat exact message protocol is needed? How will clients authenticate and refresh auth? How will channels or rooms be authorized? What happens after reconnect? How will missed messages be recovered? How will slow clients and backpressure be handled? How will the system scale across server instances? What metrics and logs are needed to operate it? If the answers mostly describe documents, queries, cache, and rules, Firestore listeners are probably the better starting point.\nIf the answers mostly describe events, channels, ordering, backpressure, and custom protocol rules, WebSockets are probably the better fit.\nKey Takeaways Firestore listeners and WebSockets both support real-time experiences, but they solve different problems.\nFirestore listeners are a managed synchronization layer for Firestore documents and queries. They provide initial snapshots, document diffs, local cache behavior, latency compensation, metadata, retries, and Security Rules integration.\nWebSockets are a lower-level bidirectional transport. 
They are more flexible, but they require us to design the application protocol, reconnect behavior, authorization, scaling, backpressure, and observability.\nThe practical rule is:\nuse Firestore listeners when the UI needs live Firestore state use WebSockets when the product needs a custom real-time event protocol use both when durable state and ephemeral events have different requirements That distinction keeps the architecture simpler and avoids using Firestore as a message bus or WebSockets as a database synchronization layer.\nSources and Further Reading Get realtime updates with Cloud Firestore Access data offline with Firestore Firestore RPC reference: ListenRequest and ListenResponse Firestore JavaScript settings: WebChannel and long polling Firebase Admin Node.js Firestore settings: preferRest and onSnapshot() ","permalink":"https://learncodecamp.net/firestore-listeners-vs-websockets/","summary":"\u003cp\u003eFirestore has one of the most convenient real-time APIs in web and mobile development.\u003c/p\u003e\n\u003cp\u003eWe write a listener:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-js\" data-lang=\"js\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eimport\u003c/span\u003e { \u003cspan style=\"color:#a6e22e\"\u003ecollection\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003eonSnapshot\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003equery\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003ewhere\u003c/span\u003e } \u003cspan style=\"color:#a6e22e\"\u003efrom\u003c/span\u003e \u003cspan style=\"color:#e6db74\"\u003e\u0026#34;firebase/firestore\u0026#34;\u003c/span\u003e;\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003econst\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eq\u003c/span\u003e \u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003equery\u003c/span\u003e(\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  \u003cspan style=\"color:#a6e22e\"\u003ecollection\u003c/span\u003e(\u003cspan style=\"color:#a6e22e\"\u003edb\u003c/span\u003e, \u003cspan style=\"color:#e6db74\"\u003e\u0026#34;messages\u0026#34;\u003c/span\u003e),\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  \u003cspan style=\"color:#a6e22e\"\u003ewhere\u003c/span\u003e(\u003cspan style=\"color:#e6db74\"\u003e\u0026#34;roomId\u0026#34;\u003c/span\u003e, \u003cspan style=\"color:#e6db74\"\u003e\u0026#34;==\u0026#34;\u003c/span\u003e, \u003cspan style=\"color:#e6db74\"\u003e\u0026#34;general\u0026#34;\u003c/span\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e);\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003econst\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eunsubscribe\u003c/span\u003e \u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eonSnapshot\u003c/span\u003e(\u003cspan style=\"color:#a6e22e\"\u003eq\u003c/span\u003e, (\u003cspan style=\"color:#a6e22e\"\u003esnapshot\u003c/span\u003e) =\u0026gt; {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  \u003cspan style=\"color:#a6e22e\"\u003esnapshot\u003c/span\u003e.\u003cspan 
style=\"color:#a6e22e\"\u003edocChanges\u003c/span\u003e().\u003cspan style=\"color:#a6e22e\"\u003eforEach\u003c/span\u003e((\u003cspan style=\"color:#a6e22e\"\u003echange\u003c/span\u003e) =\u0026gt; {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#a6e22e\"\u003econsole\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003elog\u003c/span\u003e(\u003cspan style=\"color:#a6e22e\"\u003echange\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003etype\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003echange\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003edoc\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003eid\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003echange\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003edoc\u003c/span\u003e.\u003cspan style=\"color:#a6e22e\"\u003edata\u003c/span\u003e());\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  });\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e});\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eAfter that, the UI updates whenever matching documents are added, modified, or removed.\u003c/p\u003e\n\u003cp\u003eIt feels similar to a WebSocket because the browser receives real-time updates without manually polling. But Firestore listeners and WebSockets are not the same abstraction.\u003c/p\u003e","title":"Firestore Listeners vs WebSockets: How Real-Time Updates Actually Work"},{"content":"Most web applications start with a simple request-response model.\nThe browser asks for something, the server responds, and the connection is done.\nThat model works well for pages, APIs, forms, dashboards, and most CRUD applications. 
But some features need something different:\nchat messages that appear instantly live sports scores multiplayer game state collaborative editing cursors trading prices delivery tracking real-time notifications terminal sessions in the browser For these features, repeatedly asking the server \u0026ldquo;anything new?\u0026rdquo; becomes wasteful and slow.\nThis is where WebSockets are useful.\nIn this post, we will look at how WebSockets work, what actually happens during the connection upgrade, how data flows after the connection is open, and what we should check before choosing WebSockets for a system.\nThe Problem WebSockets Solve HTTP is naturally request-response.\nA client sends a request:\nGET /notifications The server sends a response:\n[ { \u0026#34;id\u0026#34;: 1, \u0026#34;message\u0026#34;: \u0026#34;Your build finished\u0026#34; } ] Then the request is complete.\nIf the client wants fresh data later, it needs another request.\nFor real-time features, the simplest approach is polling:\nEvery 5 seconds: browser -\u0026gt; GET /notifications server -\u0026gt; latest notifications Polling is easy to implement, but it has tradeoffs:\nit sends many requests even when nothing changed updates are delayed until the next poll interval short polling intervals increase server load long polling intervals make the product feel less real-time There is also long polling, where the server keeps the request open until data is available or a timeout happens. Long polling can work, but it still creates a repeated request cycle.\nWebSockets use a different model.\nThe client and server establish one long-lived connection. After that, either side can send messages whenever it has something to say.\nWhat Is a WebSocket? 
A WebSocket is a persistent, full-duplex connection between a client and a server.\nThere are two important parts in that sentence:\npersistent: the connection stays open instead of closing after one response full-duplex: both client and server can send messages independently With normal HTTP APIs, the server usually speaks only after the client asks.\nWith WebSockets, the server can push data to the client at any time:\nClient Server | | | ---- open connection --\u0026gt; | | | | \u0026lt;-- new chat message --- | | | | ---- typing event -----\u0026gt; | | | | \u0026lt;-- user joined room --- | | | This makes WebSockets a good fit for interactive systems where latency matters and data flows in both directions.\nThe WebSocket URL WebSockets use their own URL schemes:\nws://example.com/socket wss://example.com/socket The difference is similar to HTTP and HTTPS:\nScheme Meaning ws:// WebSocket over an unencrypted connection wss:// WebSocket over TLS In production, we should almost always use wss://.\nIf the page is loaded over HTTPS, browsers also expect secure WebSocket connections. 
Trying to connect from an HTTPS page to ws:// is usually blocked as mixed content.\nHow the WebSocket Handshake Works A WebSocket connection starts as an HTTP request.\nThat is a useful detail because it means WebSockets can use the same ports as normal web traffic:\nport 80 for ws:// port 443 for wss:// The browser sends an HTTP request that asks the server to upgrade the connection:\nGET /socket HTTP/1.1 Host: example.com Upgrade: websocket Connection: Upgrade Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== Sec-WebSocket-Version: 13 The key headers are:\nHeader Purpose Upgrade: websocket Tells the server the client wants to switch protocols Connection: Upgrade Marks this request as a protocol upgrade Sec-WebSocket-Key A browser-generated value used by the server to prove it understands WebSockets Sec-WebSocket-Version The WebSocket protocol version If the server accepts, it replies with:\nHTTP/1.1 101 Switching Protocols Upgrade: websocket Connection: Upgrade Sec-WebSocket-Accept: ... The 101 Switching Protocols status is the important signal.\nAfter that response, the connection is no longer used like a normal HTTP request-response exchange. It becomes a WebSocket connection.\nWhat Happens After the Upgrade? Once the WebSocket is open, data moves as WebSocket frames.\nAt the application level, we usually think in messages:\n{ \u0026#34;type\u0026#34;: \u0026#34;chat.message\u0026#34;, \u0026#34;roomId\u0026#34;: \u0026#34;general\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;Hello everyone\u0026#34; } Under the hood, the protocol frames those messages so the receiver can identify message boundaries.\nThat is different from raw TCP. TCP gives us a byte stream, not application-level messages. WebSocket adds message framing on top of TCP, which is one reason it is convenient in browser applications.\nWebSockets can carry:\ntext messages binary messages ping and pong frames close frames Most application code works with text messages, often JSON. 
Binary messages are useful when sending compact data, files, audio chunks, video chunks, or game state.\nA Minimal Browser Example The browser API is small.\nconst socket = new WebSocket(\u0026#34;wss://example.com/socket\u0026#34;); socket.addEventListener(\u0026#34;open\u0026#34;, () =\u0026gt; { socket.send(JSON.stringify({ type: \u0026#34;chat.join\u0026#34;, roomId: \u0026#34;general\u0026#34; })); }); socket.addEventListener(\u0026#34;message\u0026#34;, (event) =\u0026gt; { const message = JSON.parse(event.data); console.log(\u0026#34;Received:\u0026#34;, message); }); socket.addEventListener(\u0026#34;close\u0026#34;, () =\u0026gt; { console.log(\u0026#34;Socket closed\u0026#34;); }); socket.addEventListener(\u0026#34;error\u0026#34;, (error) =\u0026gt; { console.error(\u0026#34;Socket error:\u0026#34;, error); }); The main operations are:\ncreate a WebSocket wait for open send messages with socket.send(...) receive messages through the message event handle close and error The API looks simple, but production behavior needs more structure.\nA Minimal Node.js Server Example In Node.js, the popular ws package gives us a direct WebSocket server.\nimport { WebSocketServer } from \u0026#34;ws\u0026#34;; const server = new WebSocketServer({ port: 8080 }); server.on(\u0026#34;connection\u0026#34;, (socket) =\u0026gt; { socket.send(JSON.stringify({ type: \u0026#34;system.connected\u0026#34; })); socket.on(\u0026#34;message\u0026#34;, (data) =\u0026gt; { const message = JSON.parse(data.toString()); if (message.type === \u0026#34;chat.message\u0026#34;) { server.clients.forEach((client) =\u0026gt; { if (client.readyState === client.OPEN) { client.send(JSON.stringify(message)); } }); } }); }); This example broadcasts every chat message to every connected client.\nIt is intentionally minimal. 
In a real system, we would add authentication, room membership, schema validation, rate limiting, error handling, and a way to scale beyond one process.\nWebSockets vs HTTP Polling vs Server-Sent Events WebSockets are not the only way to build live updates.\nIt helps to compare the options:\nApproach Direction Best For Polling Client repeatedly asks server Simple updates where delay is acceptable Long polling Client waits on a held request Real-time-ish updates without WebSocket infrastructure Server-Sent Events Server pushes to client One-way live streams such as notifications or logs WebSockets Client and server both push Interactive two-way systems We should think about the choice this way:\nIf updates are rare and a delay is fine, use polling. If the server only needs to stream updates to the browser, consider Server-Sent Events. If both sides need low-latency communication, use WebSockets. WebSockets are powerful, but they are not automatically the simplest option.\nCommon WebSocket Use Cases Chat Chat is the classic WebSocket example.\nUsers need to send messages, receive messages, see typing indicators, and update presence. A persistent connection maps naturally to this interaction.\nCollaborative Editing In collaborative editors, users need to see changes from other users quickly:\ndocument edits cursor movement selections comments user presence The challenge here is not just transport. The harder part is conflict resolution, ordering, and consistency. WebSockets move messages, but the application still needs a correct collaboration model.\nLive Dashboards Operational dashboards often show changing metrics, job states, logs, queue depth, or alerts.\nWebSockets can push changes immediately instead of making the browser poll every few seconds.\nMultiplayer Games Games often require frequent bidirectional messages.\nFor browser games, WebSockets are commonly used because they work everywhere the browser works. 
For very latency-sensitive games, WebRTC data channels or custom UDP-based protocols may be considered, but WebSockets are still a practical starting point for many real-time browser games.\nBrowser Terminals Web terminals are a strong WebSocket use case.\nThe browser sends keystrokes to the server, and the server streams terminal output back to the browser. Both directions matter, and the interaction needs low latency.\nAuthentication and Authorization Authentication with WebSockets deserves careful design.\nThe opening handshake is HTTP, so we can use familiar mechanisms:\ncookies session IDs bearer tokens signed short-lived tokens For browser clients, cookies are often convenient when the WebSocket endpoint lives on the same site as the web application. Tokens are common for API-style clients.\nOne mistake we should avoid is treating a successful socket connection as permanent authorization.\nAuthorization should still be checked at the application message level.\nFor example:\nCan this user join this room? Can this user publish to this topic? Can this user subscribe to this account\u0026rsquo;s updates? Is the token still valid? Was the user removed from the workspace after connecting? A WebSocket connection can live for minutes or hours. Permissions can change during that time.\nMessage Design A WebSocket gives us transport. 
It does not design the application protocol for us.\nWe should use explicit message types:\n{ \u0026#34;type\u0026#34;: \u0026#34;chat.message.created\u0026#34;, \u0026#34;requestId\u0026#34;: \u0026#34;req_123\u0026#34;, \u0026#34;roomId\u0026#34;: \u0026#34;general\u0026#34;, \u0026#34;payload\u0026#34;: { \u0026#34;text\u0026#34;: \u0026#34;WebSockets make sense here\u0026#34; } } A few practical rules help:\ninclude a type field validate every incoming message keep payloads small include IDs for correlation and deduplication version the protocol if clients may lag behind servers define error messages clearly Without structure, WebSocket code can turn into a pile of string comparisons and implicit assumptions.\nHeartbeats and Dead Connections One practical issue with WebSockets is detecting dead connections.\nA connection can disappear without a clean close event:\nthe user closes a laptop mobile network changes a proxy drops an idle connection Wi-Fi disconnects a server process crashes The application should not assume that an open socket is always healthy.\nA common solution is heartbeat logic:\nServer sends ping Client replies with pong If no pong arrives in time, close the connection The WebSocket protocol has ping and pong frames. Some libraries expose them directly. 
In browser JavaScript, ping and pong frames are handled by the browser, so applications often implement their own heartbeat message if needed:\n{ \u0026#34;type\u0026#34;: \u0026#34;ping\u0026#34; } and:\n{ \u0026#34;type\u0026#34;: \u0026#34;pong\u0026#34; } The exact approach depends on the client and server libraries, but the goal is the same: do not keep dead connections forever.\nReconnection Strategy Clients should expect disconnections.\nReal networks are messy, especially on mobile devices.\nA good WebSocket client usually needs:\nautomatic reconnect exponential backoff jitter to avoid reconnect storms a maximum retry delay a way to resubscribe after reconnecting idempotent messages where possible For example, after reconnecting, a chat client may need to:\nauthenticate again rejoin rooms fetch missed messages from an HTTP API resume live updates The WebSocket connection should not be the only source of truth. If a user disconnects for 30 seconds, the application needs a way to recover missed state.\nScaling WebSockets Scaling WebSockets is different from scaling stateless HTTP APIs.\nWith normal HTTP, a load balancer can send each request to any healthy server because each request is independent.\nWith WebSockets, a client holds a long-lived connection to one server process.\nThat creates a few design questions:\nHow many concurrent connections can one server handle? How do we broadcast a message to users connected to different servers? Do we need sticky sessions? What happens when a server deploy restarts? How do we drain connections gracefully? 
For a small app, one WebSocket server may be enough.\nFor a larger app, we usually need a shared messaging layer:\nClient A -\u0026gt; WebSocket Server 1 Client B -\u0026gt; WebSocket Server 2 Server 1 \u0026lt;-\u0026gt; Redis / NATS / Kafka / Pub/Sub \u0026lt;-\u0026gt; Server 2 If Client A sends a room message to Server 1, but Client B is connected to Server 2, the servers need a shared way to distribute the message.\nRedis Pub/Sub, NATS, Kafka, RabbitMQ, cloud pub/sub systems, or a dedicated real-time platform can all fit depending on throughput, ordering, durability, and operational requirements.\nBackpressure Backpressure means the sender is producing data faster than the receiver can process it.\nThis matters with WebSockets because a slow client can cause memory to grow if the server keeps buffering outgoing messages.\nExamples:\na browser tab is throttled in the background a mobile client has a weak connection the server broadcasts too many messages the client cannot parse or render messages fast enough A production server should have policies for slow consumers:\nlimit message size limit queued outgoing bytes drop non-critical updates disconnect clients that fall too far behind compress only when it actually helps Ignoring backpressure is how \u0026ldquo;real-time\u0026rdquo; systems turn into memory leaks under load.\nSecurity Considerations WebSockets need the same security discipline as HTTP APIs, plus a few extra checks.\nImportant items include:\nuse wss:// in production authenticate the handshake authorize every meaningful action validate all incoming messages enforce message size limits rate-limit noisy clients check the Origin header for browser clients avoid leaking secrets in query strings close idle or abusive connections The Origin check is especially easy to miss.\nBrowsers include an Origin header in WebSocket handshakes. 
If the server accepts cookie-based authentication, checking the origin helps reduce cross-site WebSocket hijacking risks.\nObservability WebSockets can be harder to debug than normal APIs because there is not a clean request-response record for every interaction.\nWe should track:\nactive connection count connection open and close rate close codes and reasons messages sent and received by type message processing latency authentication failures reconnect rate outgoing queue size dropped messages Logs should include connection IDs and user IDs where safe. Metrics should make it obvious when a deploy, dependency outage, or network issue caused reconnect storms.\nFor important user actions, we should still store durable events or use HTTP APIs where appropriate. A WebSocket message that only exists in memory is easy to lose.\nA Practical Architecture A common production architecture looks like this:\nBrowser | | wss:// v Load Balancer | v WebSocket Gateway | +--\u0026gt; Auth / Session Service | +--\u0026gt; Redis / NATS / Kafka / Pub/Sub | +--\u0026gt; Application Services The WebSocket gateway owns connection state:\nwhich user is connected which rooms or topics the user subscribed to what messages should be sent to the client when to close unhealthy connections The rest of the system can publish events without knowing which server currently holds the user\u0026rsquo;s socket.\nThis separation keeps business services from becoming tightly coupled to WebSocket connection management.\nWhen We Should Not Use WebSockets We should avoid WebSockets when:\nupdates are infrequent one-way server-to-client streaming is enough simple polling gives an acceptable user experience the team does not need low-latency bidirectional communication infrastructure cannot support long-lived connections reliably durable delivery matters more than immediate delivery For example, a billing dashboard that refreshes every minute does not need WebSockets. 
Polling is simpler and easier to operate.\nA live log viewer may be better with Server-Sent Events if the browser only receives data and does not need to send much back.\nThe best architecture is not the one with the most real-time technology. It is the one that matches the product requirement with the least operational complexity.\nKey Takeaways WebSockets are useful because they turn the browser-server relationship from repeated request-response calls into a long-lived two-way channel.\nThe core ideas are:\nWebSockets start with an HTTP upgrade handshake. After the 101 Switching Protocols response, the connection uses WebSocket frames. Either side can send messages while the connection stays open. WebSockets are best for low-latency bidirectional features. Production systems need authentication, authorization, heartbeats, reconnects, backpressure handling, and observability. Scaling WebSockets usually requires a shared messaging layer between server instances. The practical default is simple: start with polling or Server-Sent Events if they satisfy the requirement. Use WebSockets when the product genuinely needs interactive, bidirectional, low-latency communication.\nThat is where WebSockets are worth the extra operational work.\n","permalink":"https://learncodecamp.net/websockets-explained/","summary":"\u003cp\u003eMost web applications start with a simple request-response model.\u003c/p\u003e\n\u003cp\u003eThe browser asks for something, the server responds, and the connection is done.\u003c/p\u003e\n\u003cp\u003eThat model works well for pages, APIs, forms, dashboards, and most CRUD applications. 
But some features need something different:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003echat messages that appear instantly\u003c/li\u003e\n\u003cli\u003elive sports scores\u003c/li\u003e\n\u003cli\u003emultiplayer game state\u003c/li\u003e\n\u003cli\u003ecollaborative editing cursors\u003c/li\u003e\n\u003cli\u003etrading prices\u003c/li\u003e\n\u003cli\u003edelivery tracking\u003c/li\u003e\n\u003cli\u003ereal-time notifications\u003c/li\u003e\n\u003cli\u003eterminal sessions in the browser\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eFor these features, repeatedly asking the server \u0026ldquo;anything new?\u0026rdquo; becomes wasteful and slow.\u003c/p\u003e","title":"WebSockets Explained: How Real-Time Communication Works on the Web"},{"content":"When a production system fails, the hardest part is often not the fix. The hardest part is knowing where to look.\nThat is the real value of observability. A service without observability feels like a black box. Requests go in, responses come out, and when something breaks we start guessing. With useful telemetry, that black box becomes closer to a glass box: we can see request paths, slow dependencies, errors, queueing, retries, model latency, token usage, and the exact step where a workflow fell apart.\nOpenTelemetry, usually shortened to OTel, is the standard we should use when we want that visibility without wiring our application permanently to one vendor.\nIn this post, we will cover:\nwhat OTel actually is why it matters for TypeScript and Node.js services how we can add OTel to a TypeScript application how Langfuse uses OpenTelemetry for LLM observability what to watch out for before using this in production What Is OpenTelemetry? 
OpenTelemetry is a vendor-neutral observability framework for generating, collecting, processing, and exporting telemetry data.\nThe important words are:\ngenerating: your application emits telemetry collecting: telemetry is gathered from application code, libraries, runtimes, or infrastructure processing: telemetry can be batched, filtered, sampled, enriched, or transformed exporting: telemetry is sent to a backend such as Jaeger, Prometheus, Grafana, Datadog, Honeycomb, New Relic, Langfuse, or another system OTel is not the database. It is not the dashboard. It is not the full observability product.\nIt is the standard instrumentation and telemetry pipeline between your application and the tool where you store, query, and visualize the data.\nUSB-C is a useful analogy. Before a common standard, every device vendor had its own cable and port. Switching devices meant buying new accessories and learning new behaviors. OTel tries to solve the same kind of problem in observability. Instead of rewriting instrumentation every time you switch vendors, your application emits telemetry in a common shape.\nThat matters because instrumentation is expensive. If we add traces, attributes, metrics, and log correlation across dozens of services, we do not want to throw that work away just because the company changes its observability backend later.\nThe Three Main Telemetry Signals OTel works with three familiar signals: traces, metrics, and logs.\nTraces A trace shows the path of one request or workflow through the system.\nFor example, a request to answer a user question might look like this:\nPOST /chat -\u0026gt; authenticate user -\u0026gt; retrieve documents -\u0026gt; call embedding model -\u0026gt; call vector database -\u0026gt; call LLM -\u0026gt; stream answer Each step is usually represented as a span. 
A span has a name, start time, end time, attributes, events, status, and parent-child relationship.\nFor backend services, traces answer questions like:\nWhich dependency made this request slow? Did the error happen in our service, the database, the queue, or an external API? How much time did we spend in retrieval before calling the model? Which user-facing route is creating the most expensive LLM calls? Metrics Metrics are measurements captured over time.\nCommon examples:\nrequest count request duration error rate queue depth memory usage CPU usage token usage model operation duration Metrics are good for dashboards and alerts because they aggregate well. If we need to know whether p95 latency is rising or error rate crossed a threshold, metrics are the right signal.\nLogs Logs are timestamped records of events.\nLogs are still useful, but they become much more useful when they are correlated with traces. A random log line saying timeout while calling provider is helpful. The same log line with a trace ID, span ID, service name, model name, and request route is much better.\nAs of the current OpenTelemetry JavaScript documentation, traces and metrics are stable in the JS implementation, while logs are still marked as development. That means we should start with traces first in a TypeScript service, then add metrics, and be more careful with logs depending on the maturity of the libraries we are using.\nWhy OTel Matters There are two practical reasons we should care about OTel.\n1. Vendor lock-in gets expensive Many observability vendors have their own agents, SDKs, conventions, exporters, and dashboards. Those tools can be good, but the lock-in becomes painful when the instrumentation is vendor-specific.\nIf every service uses Vendor A\u0026rsquo;s custom SDK and we later move to Vendor B, the migration is not just a configuration change. 
It can become a code migration across many services.\nOTel changes the shape of that decision.\nThe application emits OpenTelemetry data. Then we can export that data to:\na local collector during development Jaeger for trace debugging Prometheus-compatible systems for metrics a commercial observability backend Langfuse for LLM traces multiple destinations at the same time through the Collector The backend can change without rewriting the whole application instrumentation layer.\n2. Instrumentation becomes a shared language Once a team agrees on OTel conventions, services start speaking a common observability language.\nThat means we can standardize names like:\nservice.name deployment.environment.name http.request.method http.route db.system.name gen_ai.operation.name gen_ai.request.model This common vocabulary is boring in the best way. It makes dashboards, alerts, traces, and cross-service debugging less dependent on tribal knowledge.\nThe Main OTel Pieces Before writing TypeScript, it helps to understand the moving parts.\nAPI The OTel API is what application code can call.\nIn TypeScript, that usually means imports from @opentelemetry/api, such as:\nimport { trace } from \u0026#34;@opentelemetry/api\u0026#34;; The API is intentionally small. Your business code can create spans or add attributes without knowing where the telemetry will eventually go.\nSDK The SDK is the implementation that records and exports telemetry.\nIn Node.js, @opentelemetry/sdk-node is the usual starting point. 
It wires together the tracer provider, exporters, resource detection, context propagation, and instrumentation libraries.\nThe key rule is simple:\nInitialize the OTel SDK before importing the rest of your application.\nIf the app imports Express, pg, Redis, or HTTP clients before OTel starts, auto-instrumentation may miss hooks.\nInstrumentation Instrumentation is the code that creates telemetry.\nThere are two styles:\nauto-instrumentation: packages create spans for frameworks and libraries automatically manual instrumentation: you create spans around important business logic yourself We can use both.\nAuto-instrumentation gives us the basic HTTP, database, and dependency shape quickly. Manual instrumentation gives us the application-specific spans that actually explain the business workflow.\nExporter An exporter sends telemetry somewhere.\nIn development, a console exporter is enough to prove that spans are being created.\nIn production, we usually want OTLP, the OpenTelemetry Protocol. The TypeScript service can export OTLP directly to a backend, or to an OpenTelemetry Collector.\nCollector The OpenTelemetry Collector is a separate process that receives, processes, and exports telemetry.\nWe should use a Collector once a system grows beyond a toy setup because it lets the application offload telemetry quickly. 
The Collector can then handle batching, retries, filtering, memory limits, and routing to one or more backends.\nUsing OTel in a TypeScript Node.js Service Here is the minimal shape we can start with.\nFirst install the core packages:\nnpm install @opentelemetry/api \\ @opentelemetry/sdk-node \\ @opentelemetry/auto-instrumentations-node \\ @opentelemetry/sdk-trace-node npm install -D tsx Create instrumentation.ts:\nimport { NodeSDK } from \u0026#34;@opentelemetry/sdk-node\u0026#34;; import { ConsoleSpanExporter } from \u0026#34;@opentelemetry/sdk-trace-node\u0026#34;; import { getNodeAutoInstrumentations } from \u0026#34;@opentelemetry/auto-instrumentations-node\u0026#34;; const sdk = new NodeSDK({ traceExporter: new ConsoleSpanExporter(), instrumentations: [getNodeAutoInstrumentations()], }); sdk.start(); Then run the application with instrumentation loaded first:\nOTEL_SERVICE_NAME=checkout-api \\ npx tsx --import ./instrumentation.ts src/index.ts For Node.js 20 and newer, the official OTel docs show this TypeScript --import pattern with tsx. If the compiled application runs as ESM, check the current OTel ESM loader guidance because module loading order matters.\nWith auto-instrumentation enabled, a framework like Express can start producing request spans without manually wrapping every route.\nAdding Manual Spans Auto-instrumentation tells us that a request hit /chat and called a database. 
It usually does not tell us the product-level story.\nFor that, we add manual spans around important workflow steps.\nExample:\nimport { SpanStatusCode, trace } from \u0026#34;@opentelemetry/api\u0026#34;; const tracer = trace.getTracer(\u0026#34;support-assistant\u0026#34;); type Answer = { text: string; inputTokens: number; outputTokens: number; }; export async function answerQuestion( question: string, userTier: \u0026#34;free\u0026#34; | \u0026#34;pro\u0026#34;, ): Promise\u0026lt;Answer\u0026gt; { return tracer.startActiveSpan(\u0026#34;rag.answer-question\u0026#34;, async (span) =\u0026gt; { span.setAttributes({ \u0026#34;app.user_tier\u0026#34;: userTier, \u0026#34;app.workflow\u0026#34;: \u0026#34;support_assistant\u0026#34;, }); try { const documents = await retrieveDocuments(question); span.setAttribute(\u0026#34;app.retrieval.document_count\u0026#34;, documents.length); const answer = await callModel(question, documents); span.setAttributes({ \u0026#34;gen_ai.usage.input_tokens\u0026#34;: answer.inputTokens, \u0026#34;gen_ai.usage.output_tokens\u0026#34;: answer.outputTokens, }); return answer; } catch (error) { span.recordException(error as Error); span.setStatus({ code: SpanStatusCode.ERROR, message: error instanceof Error ? 
error.message : \u0026#34;Unknown error\u0026#34;, }); throw error; } finally { span.end(); } }); } This is where traces become useful.\nWe do not want a trace that only says:\nPOST /chat took 4.2s We want a trace that says:\nPOST /chat took 4.2s -\u0026gt; retrieval took 120ms and returned 6 chunks -\u0026gt; reranking took 80ms -\u0026gt; model call took 3.8s -\u0026gt; output used 920 tokens That is the difference between \u0026ldquo;the request is slow\u0026rdquo; and \u0026ldquo;the model call dominates the request, but retrieval is fine.\u0026rdquo;\nExporting with OTLP Console output is only for development.\nFor a real service, we should export over OTLP:\nnpm install @opentelemetry/exporter-trace-otlp-proto Then change instrumentation.ts:\nimport { NodeSDK } from \u0026#34;@opentelemetry/sdk-node\u0026#34;; import { getNodeAutoInstrumentations } from \u0026#34;@opentelemetry/auto-instrumentations-node\u0026#34;; import { OTLPTraceExporter } from \u0026#34;@opentelemetry/exporter-trace-otlp-proto\u0026#34;; const sdk = new NodeSDK({ traceExporter: new OTLPTraceExporter(), instrumentations: [getNodeAutoInstrumentations()], }); sdk.start(); Configure the destination with environment variables:\nexport OTEL_SERVICE_NAME=checkout-api export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 The usual local development pattern is:\nTypeScript service -\u0026gt; OTLP -\u0026gt; OpenTelemetry Collector -\u0026gt; backend The production pattern is similar, but the Collector usually runs as a sidecar, daemon, gateway, or managed service.\nWhat We Watch For in Production Adding OTel is not just \u0026ldquo;install a package and call it done.\u0026rdquo; A few details matter.\nStart instrumentation before app code This is the most common Node.js mistake.\nThe instrumentation file must run before the app imports the libraries you want to instrument.\nSet a useful service name Always set OTEL_SERVICE_NAME.\nWithout it, traces from different services become painful to 
separate.\nAvoid high-cardinality chaos Attributes like user.id, order.id, request.id, and full URLs can create too many unique values if they end up in metrics or aggregation keys.\nUse them deliberately. They can be useful on traces, but dangerous in metrics.\nDo not leak secrets or PII Telemetry often outlives the request.\nDo not casually record:\naccess tokens API keys raw authorization headers passwords full prompts with private user data full model outputs containing sensitive content For LLM systems especially, prompt and response capture needs a policy. The OpenTelemetry GenAI semantic conventions explicitly treat model inputs, outputs, and system instructions as sensitive or potentially large data.\nUse the Collector when the system grows Direct export is fine to start.\nFor production, a Collector gives you a better place to batch, retry, redact, filter, and route telemetry. It also keeps observability backend changes away from application deployment as much as possible.\nWhere Langfuse Fits Langfuse is an observability platform for LLM applications.\nTraditional observability tells us:\nwhich endpoint is slow which dependency failed how often errors happen what the request path looked like LLM observability needs those things, but it also needs more domain-specific data:\nwhich model was called what prompt or messages were used what the model returned token usage cost latency per generation retrieval steps tool calls sessions and users evaluation scores prompt versions That is where Langfuse is useful. 
It treats an LLM application workflow as a trace made of observations: normal spans, model generations, tool calls, retrieval steps, and events.\nHow Langfuse Uses OpenTelemetry The current Langfuse SDK docs are explicit: the Langfuse SDKs are built on top of OpenTelemetry.\nThat gives Langfuse a few useful properties:\nnested spans stay connected through OTel context propagation third-party OTel instrumentation can appear inside Langfuse traces trace attributes like user, session, metadata, version, and tags can be propagated OTel spans can be mapped into Langfuse observations GenAI conventions can describe model calls in a more standard way The mapping is roughly:\nOpenTelemetry concept Langfuse concept OTel trace Langfuse trace OTel span Langfuse observation OTel span for LLM call Langfuse generation OTel attributes Langfuse metadata, model data, usage, cost, tags, user/session fields This is a good design choice. Langfuse does not need to invent a completely separate tracing universe. It can build the LLM-specific experience on top of the broader telemetry ecosystem.\nLangfuse in TypeScript For a TypeScript application, install the Langfuse tracing packages:\nnpm install @langfuse/tracing @langfuse/otel @opentelemetry/sdk-node Set credentials:\nexport LANGFUSE_SECRET_KEY=sk-lf-... export LANGFUSE_PUBLIC_KEY=pk-lf-... 
export LANGFUSE_BASE_URL=https://cloud.langfuse.com Create instrumentation.ts:\nimport { NodeSDK } from \u0026#34;@opentelemetry/sdk-node\u0026#34;; import { LangfuseSpanProcessor } from \u0026#34;@langfuse/otel\u0026#34;; export const sdk = new NodeSDK({ spanProcessors: [new LangfuseSpanProcessor()], }); sdk.start(); Then create observations in application code:\nimport { sdk } from \u0026#34;./instrumentation\u0026#34;; import { propagateAttributes, startActiveObservation, startObservation, } from \u0026#34;@langfuse/tracing\u0026#34;; async function answerWithRag( userId: string, sessionId: string, question: string, ) { await startActiveObservation(\u0026#34;rag-answer\u0026#34;, async (root) =\u0026gt; { root.update({ input: { question } }); await propagateAttributes( { userId, sessionId, metadata: { feature: \u0026#34;support_chat\u0026#34; }, traceName: \u0026#34;rag-answer\u0026#34;, }, async () =\u0026gt; { const retrieval = startObservation(\u0026#34;retrieve-documents\u0026#34;, { input: { question }, }); const documents = await retrieveDocuments(question); retrieval.update({ output: { documentCount: documents.length }, }).end(); const generation = startObservation( \u0026#34;llm-call\u0026#34;, { model: \u0026#34;gpt-4o-mini\u0026#34;, input: [{ role: \u0026#34;user\u0026#34;, content: question }], }, { asType: \u0026#34;generation\u0026#34; }, ); const answer = await callModel(question, documents); generation.update({ output: { content: answer.text }, usageDetails: { input: answer.inputTokens, output: answer.outputTokens, }, }).end(); root.update({ output: { answer: answer.text } }); }, ); }); } process.on(\u0026#34;SIGTERM\u0026#34;, () =\u0026gt; { sdk.shutdown().finally(() =\u0026gt; process.exit(0)); }); For short-lived scripts, call sdk.shutdown() before exit so buffered spans are flushed.\nIf we also want ordinary backend spans in the same trace tree, we can add OTel auto-instrumentation too:\nnpm install @opentelemetry/auto-instrumentations-node 
import { NodeSDK } from \u0026#34;@opentelemetry/sdk-node\u0026#34;; import { getNodeAutoInstrumentations } from \u0026#34;@opentelemetry/auto-instrumentations-node\u0026#34;; import { LangfuseSpanProcessor } from \u0026#34;@langfuse/otel\u0026#34;; export const sdk = new NodeSDK({ spanProcessors: [new LangfuseSpanProcessor()], instrumentations: [getNodeAutoInstrumentations()], }); sdk.start(); Now a trace can contain both:\nordinary service spans such as HTTP, database, Redis, and fetch calls Langfuse-specific observations such as generations, tool calls, retrieval steps, user/session metadata, token usage, and model outputs Isolating Langfuse From the Global OTel Provider The NodeSDK setup is the right default for most TypeScript applications.\nBut there is another useful pattern: use a separate tracer provider just for Langfuse.\nThis matters when the application already has an OpenTelemetry setup, or when another observability tool owns the global OTel provider. In that case, we may not want Langfuse to become part of the app-wide tracing pipeline. 
We may only want Langfuse to receive the LLM-specific spans that we create through the Langfuse SDK.\nThe public Langfuse docs call this an isolated TracerProvider setup.\nInstall the lower-level trace SDK:\nnpm install @opentelemetry/sdk-trace-node Then configure Langfuse with its own provider:\nimport { NodeTracerProvider } from \u0026#34;@opentelemetry/sdk-trace-node\u0026#34;; import { LangfuseSpanProcessor } from \u0026#34;@langfuse/otel\u0026#34;; import { setLangfuseTracerProvider } from \u0026#34;@langfuse/tracing\u0026#34;; const langfuseSpanProcessor = new LangfuseSpanProcessor(); const langfuseTracerProvider = new NodeTracerProvider({ spanProcessors: [langfuseSpanProcessor], }); setLangfuseTracerProvider(langfuseTracerProvider); The important difference is that we do not register this provider as the global OpenTelemetry provider.\nThat gives us isolation:\nLangfuse spans go to Langfuse unrelated auto-instrumented service spans do not automatically go to Langfuse another observability backend can keep using the global OTel setup sampling, filtering, and export behavior can be different for LLM traces This pattern is useful when we want Langfuse to focus on LLM workflows instead of becoming the sink for every HTTP, database, Redis, or framework span in the process.\nThere is a tradeoff. Isolated tracer providers still share OTel context, so mixed trace trees can become incomplete if spans from different providers parent each other. If the goal is one complete end-to-end trace tree across the whole service, use NodeSDK. 
If the goal is a clean Langfuse-only LLM trace pipeline, an isolated provider can be a better fit.\nSending Existing OTel Traces to Langfuse There are two ways to think about Langfuse integration.\nThe first path is the Langfuse TypeScript SDK:\nTypeScript app -\u0026gt; @langfuse/tracing -\u0026gt; OTel SDK -\u0026gt; LangfuseSpanProcessor -\u0026gt; Langfuse The second path is raw OpenTelemetry export:\nAny OTel app -\u0026gt; OTLP HTTP -\u0026gt; Langfuse OTel endpoint The direct OTLP path is useful if:\nthe app is not written in Python or TypeScript the app already has OTel instrumentation the team uses OpenLLMetry, OpenLIT, or another GenAI instrumentation library traces are routed through the OpenTelemetry Collector Langfuse documents this endpoint:\nAUTH_STRING=$(echo -n \u0026#34;$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY\u0026#34; | base64) export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://cloud.langfuse.com/api/public/otel/v1/traces export OTEL_EXPORTER_OTLP_HEADERS=\u0026#34;Authorization=Basic ${AUTH_STRING},x-langfuse-ingestion-version=4\u0026#34; One important detail: Langfuse currently supports OTLP over HTTP with HTTP/JSON and HTTP/protobuf; gRPC export is not supported yet. Also note that GNU base64 wraps its output at 76 characters by default, so on Linux add -w 0 to the base64 call to keep AUTH_STRING on a single line.\nIf the system already uses an OpenTelemetry Collector, we can usually export from services to the Collector, then have the Collector forward selected traces to Langfuse.\nThat shape gives more control:\nservices -\u0026gt; Collector -\u0026gt; normal observability backend -\u0026gt; Langfuse for LLM traces Be careful with filtering. 
Langfuse needs enough of the trace to build the correct trace tree, including the root span.\nOTel GenAI Semantic Conventions OpenTelemetry now has semantic conventions for generative AI systems, including model spans, agent spans, events, and metrics.\nSome useful attributes include:\ngen_ai.operation.name gen_ai.provider.name gen_ai.request.model gen_ai.response.model gen_ai.usage.input_tokens gen_ai.usage.output_tokens This is exactly the direction LLM observability should move in. Model calls should not be mysterious blobs inside a generic span. They should expose model, provider, operation type, latency, token usage, error type, and enough metadata to debug behavior.\nBut we should still treat the GenAI conventions carefully because the official spec marks them as development. That does not mean \u0026ldquo;do not use them.\u0026rdquo; It means avoid building brittle assumptions around every attribute name until the conventions settle.\nThe privacy point matters even more. Inputs, outputs, system instructions, and tool definitions can contain sensitive data and can be large. In production, we should make capture explicit, filtered, and configurable.\nHow We Can Design This in a Real TypeScript LLM App For a serious TypeScript LLM application, we can separate the layers like this:\nOpenTelemetry baseline\nUse the OTel Node SDK, set OTEL_SERVICE_NAME, enable auto-instrumentation, and export to the Collector.\nBusiness workflow spans\nAdd manual spans around important steps: authentication, retrieval, reranking, tool execution, model calls, output parsing, and safety checks.\nLangfuse for LLM-specific traces\nUse @langfuse/tracing for generations, tool calls, prompt versions, sessions, users, costs, and evaluations.\nCollector routing\nSend general traces and metrics to the main observability backend. Send LLM traces to Langfuse. 
Keep redaction and filtering policies close to the Collector or SDK configuration.\nPrivacy controls\nDecide what prompt and response data can be stored. Do not let developers accidentally log private user content because it made debugging easier during development.\nThe end state should let us answer both kinds of questions:\nSystem question: Why is /chat p95 latency worse after the deploy? LLM question: Which model call, prompt version, retrieval step, or tool call caused this bad answer? That is why OTel and Langfuse fit together well. OTel gives the standard telemetry substrate. Langfuse gives the LLM-specific interpretation.\nFinal Thought OpenTelemetry is not exciting because it creates another dashboard. It is useful because it gives the application a standard way to describe what happened.\nFor TypeScript services, the practical path is:\ninitialize the OTel SDK before app code use auto-instrumentation for common libraries add manual spans for business workflows export with OTLP use a Collector as the system grows keep sensitive data out of telemetry by default For LLM apps, Langfuse builds on the same foundation. It turns OTel spans into LLM-aware traces: generations, observations, sessions, users, token usage, cost, tool calls, and evaluation context.\nThat is what makes this combination useful. OTel keeps the telemetry standard. Langfuse makes the LLM workflow understandable.\nSources Checked OpenTelemetry: What is OpenTelemetry? OpenTelemetry JavaScript docs OpenTelemetry Node.js getting started OpenTelemetry JavaScript exporters OpenTelemetry Collector docs OpenTelemetry GenAI semantic conventions Langfuse SDK overview Langfuse instrumentation docs Langfuse OpenTelemetry integration ","permalink":"https://learncodecamp.net/opentelemetry-otel-typescript-langfuse/","summary":"\u003cp\u003eWhen a production system fails, the hardest part is often not the fix. 
The hardest part is knowing \u003cstrong\u003ewhere to look\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThat is the real value of observability. A service without observability feels like a black box. Requests go in, responses come out, and when something breaks we start guessing. With useful telemetry, that black box becomes closer to a glass box: we can see request paths, slow dependencies, errors, queueing, retries, model latency, token usage, and the exact step where a workflow fell apart.\u003c/p\u003e","title":"OpenTelemetry (OTel) in TypeScript: How It Works and How Langfuse Uses It"},{"content":"When I evaluate an LLM system, one of the first latency metrics I look at is TTFT, or time to first token.\nThis metric answers a simple question:\nAfter a user sends a request, how long does it take before the first output token appears?\nThat sounds narrow, but it matters a lot. Users usually forgive a response that streams steadily after it starts. What feels bad is the dead time before anything appears on screen.\nIn this post, I explain what TTFT really measures, how it differs from throughput metrics like tokens per second, what usually makes TTFT worse, and what teams actually do to improve it.\nIf you want the broader background on prefill, decode, and KV cache behavior, I already covered that in Understanding LLM Inference Basics: Prefill and Decode, TTFT, and ITL.\nWhat Is TTFT? 
TTFT (time to first token) is the elapsed time between:\nthe moment a request reaches the inference system the moment the first generated token is returned to the user In a streamed chat response, TTFT is the delay before the model starts \u0026ldquo;speaking\u0026rdquo;.\nFor autoregressive LLMs, TTFT is dominated by two things:\nprocessing the input prompt generating the first output token That is why TTFT is closely tied to the prefill phase of inference.\nTTFT vs TPS vs ITL These terms get mixed together all the time, but they are not the same.\nMetric What it tells me Why it matters TTFT How long the user waits before seeing the first token Perceived responsiveness at the start TPS How many tokens per second are produced after generation begins Streaming speed ITL The gap between consecutive generated tokens Smoothness of output once streaming starts A useful way to think about it is:\nTTFT is startup latency TPS is output rate ITL is the inverse view of output rate per user If ITL is 20 ms, the stream is effectively producing about 50 tokens per second for that user.\nOne reason people get confused is that TPS is sometimes used as a latency metric per user and sometimes as a throughput metric for the whole service. I prefer being explicit:\nPer-user TPS for how fast one response streams Total throughput for how many tokens the service generates overall Why TTFT Matters So Much TTFT has a disproportionate effect on how fast a system feels.\nConsider these two cases:\nSystem A starts in 200 ms and then streams a little slower System B starts in 2 seconds and then streams very fast Most users will say System A feels better, especially in chat, coding assistants, search copilots, and voice or agent interfaces.\nThat is because the first visible token is a trust signal. 
It tells the user the system is alive, the request is accepted, and the response is on its way.\nIn practical terms, TTFT matters most for:\nchatbots coding assistants agent UIs with streaming text search copilots interactive internal tools If the system is returning a tool result, JSON blob, or a fully buffered response, total response time may matter more than TTFT. But for user-facing streamed text, TTFT is usually one of the most important latency metrics.\nWhy TTFT Is Mostly a Prefill Problem In most modern LLM systems, the first token cannot be produced until the model has processed the entire prompt.\nThat means the system must:\ntokenize the input run the prompt through all transformer layers build the KV cache perform the first decode step This front-loaded work is why TTFT grows with prompt length.\nA short prompt usually gives lower TTFT.\nA long prompt, large system prompt, large retrieved context window, or oversized chat history usually pushes TTFT up.\nThis is also why TTFT is often described as being compute-bound. During prefill, the GPU is doing large matrix multiplications across the full prompt. The work is highly parallel, but there is still a lot of it.\nWhat Increases TTFT? These are the most common causes I see:\n1. Long prompts This is the biggest one.\nEvery extra token in the input has to be processed before the first output token can be emitted. Long RAG context, repeated conversation history, and bloated system prompts all hurt TTFT.\n2. Larger models Bigger models do more work per layer and usually have higher TTFT on the same hardware.\n3. Queueing and contention Even if raw model execution is fast, the request may sit in a scheduler queue behind other work. In production, TTFT often includes this waiting time too.\n4. Cold starts If the model weights are not already warm in memory, or the runtime has to spin up workers, TTFT can spike badly.\n5. Inefficient prompt construction Some systems pass far more context than they need. 
I have seen teams spend weeks optimizing model serving while leaving prompt bloat untouched.\n6. Slow tokenization or preprocessing It is usually not the main bottleneck, but in some stacks preprocessing, template rendering, retrieval joins, guardrails, or request routing add noticeable time before inference even starts.\nWhat Does \u0026ldquo;Good\u0026rdquo; TTFT Look Like? There is no single universal number because TTFT depends on:\nmodel size prompt length hardware batching policy whether the request is cold or warm how much orchestration happens before inference Still, the intuition is straightforward:\nlower TTFT feels better stable TTFT is often more important than one perfect benchmark number For interactive systems, the user notices startup delay immediately. A system with great average TTFT but terrible tail latency still feels unreliable.\nThat is why I would track at least:\np50 TTFT p95 TTFT p99 TTFT Average TTFT alone hides too much.\nHow To Improve TTFT If I needed to reduce TTFT, I would look at these levers first.\n1. Cut prompt length aggressively This is usually the highest-leverage fix.\ntrim old chat history summarize earlier turns shrink retrieved chunks reduce repeated instructions avoid stuffing context \u0026ldquo;just in case\u0026rdquo; Many systems have a prompt design problem disguised as an inference problem.\n2. Use prompt caching where it actually helps If a large prefix is reused across requests, caching that prefix can reduce repeated prefill work.\nThis is especially useful when the system prompt, tool schema block, or shared context stays stable across many requests.\n3. Choose the right model size If the use case does not need the largest model, a smaller model can reduce TTFT significantly and often improve the total product experience.\n4. Reduce cold starts Keep workers warm when possible. If the runtime repeatedly unloads and reloads model state, TTFT becomes unpredictable.\n5. 
Tune batching carefully Batching can improve hardware efficiency, but aggressive batching can also increase waiting time before a request starts running. That tradeoff is good for throughput, but sometimes bad for perceived latency.\n6. Simplify pre-inference orchestration If your stack does retrieval, reranking, safety checks, routing, prompt templating, and tracing before the model sees the request, TTFT can suffer even when the model runtime is fine.\nMeasure the whole path, not just the GPU kernel time.\nA Simple Mental Model I think of LLM latency like this:\nTTFT tells me how long it takes to get started ITL tells me how smoothly the response continues Total response time tells me when the full job is done Different products care about these differently.\nFor example:\na chatbot cares a lot about TTFT and ITL an offline batch summarization job cares more about total throughput an agent waiting for a tool call may care more about full completion time than token-by-token streaming That is why one latency metric never tells the whole story.\nTTFT and User Experience The important part is not just model performance. It is perceived performance.\nA user cannot see FLOPs, KV cache efficiency, or scheduler behavior. They only see:\nDid the answer start quickly? Did it keep moving? Did it finish in a reasonable time? TTFT directly affects the first of those.\nThat is why it is such a useful metric for real systems. It maps technical behavior to an obvious human experience.\nSummary TTFT in LLMs measures how long it takes for the first generated token to appear after a request is sent.\nIt is mostly driven by prompt processing and the first decode step, which makes it strongly influenced by prompt length, model size, queueing, and cold-start behavior.\nIf the goal is to reduce user-visible latency in a streaming LLM product, TTFT is one of the first metrics worth checking. 
A fast-starting system usually feels much better than one that stays silent for too long, even if both eventually generate at similar speeds.\n","permalink":"https://learncodecamp.net/ttft-in-llms-explained/","summary":"\u003cp\u003eWhen I evaluate an LLM system, one of the first latency metrics I look at is \u003cstrong\u003eTTFT\u003c/strong\u003e, or \u003cstrong\u003etime to first token\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThis metric answers a simple question:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eAfter a user sends a request, how long does it take before the first output token appears?\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat sounds narrow, but it matters a lot. Users usually forgive a response that streams steadily after it starts. What feels bad is the dead time before anything appears on screen.\u003c/p\u003e","title":"TTFT in LLMs Explained: What Time to First Token Really Measures"},{"content":"When I use Claude Code, I am not just using a model that generates text. 
I am using a tool-driven coding environment that can inspect files, search code, edit content, run shell commands, and delegate work to subagents.\nThat tool layer is the real reason Claude Code feels different to me from a normal chat UI.\nInstead of asking:\nCan the model explain my code?\nI can ask:\nCan the model inspect the repo, find the bug, patch the file, and run the command needed to verify the fix?\nThat shift is what makes Claude Code genuinely useful for engineering work.\nIn this post, I am breaking down the main Claude Code tools and when I would actually use each one:\nAgent Bash Edit Glob Grep Read Skill ToolSearch Write The Big Picture These tools are not all doing the same job.\nSome are for reading context:\nRead Glob Grep Some are for changing files:\nEdit Write Some are for taking actions:\nBash Agent Skill And one exists to unlock more tools on demand:\nToolSearch That separation matters because a good coding agent should not treat every task like a shell command.\nFor example:\nIf you need to find files by pattern, Glob is better than raw shell search If you need to search inside code, Grep is better than calling grep from a terminal If you need to patch one part of a file, Edit is safer than rewriting the whole thing In my experience, Claude Code works best when it uses the most specific tool for the job, not just Bash for everything.\nQuick Summary Table Tool Main purpose Best used for Agent Delegate work to a subagent Multi-step research, planning, exploration, specialized tasks Bash Run shell commands Builds, tests, git commands, CLI workflows Edit Make targeted changes Replacing or updating exact text in an existing file Glob Find files by pattern Matching paths like src/**/*.ts or content/posts/*.md Grep Search file contents Finding strings, regex matches, and code patterns Read Read a file directly Inspecting source files, Markdown, images, PDFs, notebooks Skill Invoke higher-level workflows Slash-command-like capabilities and 
reusable workflows ToolSearch Load deferred tool schemas Making additional tools callable when only their names are known Write Create or replace a file New files or full rewrites 1. Agent The Agent tool is what makes Claude Code feel more like a small operating system than a chatbot.\nIt can launch a subagent to handle a specific class of task. The tool description you shared includes several examples:\ngeneral-purpose Explore Plan statusline-setup claude-code-guide This matters because not every task should be solved in the main thread.\nIf the model needs to:\nexplore a large codebase create an implementation plan research Claude Code documentation handle a longer multi-step task autonomously then an agent can take that workload and return a distilled result.\nWhen Agent is a good fit Open-ended repo exploration Architecture planning Research tasks that may require multiple passes Specialized workflows with a dedicated subagent type When Agent is not a good fit The tool description is explicit here: do not use it for tiny, direct actions like:\nreading one known file looking for a specific class in a known location searching 2 or 3 files where Read or Grep is faster That is a good design choice. Delegation is powerful, but it has overhead.\n2. Bash Bash is the tool that lets Claude Code interact with the shell.\nThis is the bridge to the normal developer workflow:\nrunning tests building the project checking git status installing dependencies using CLI tools like gh, npm, pytest, or hugo But the most interesting part of the tool description is what it warns against.\nIt explicitly says Claude Code should avoid using shell commands for tasks that already have better dedicated tools. For example:\nuse Glob, not find use Grep, not grep use Read, not cat use Edit, not sed use Write, not shell redirection That is not just a style preference. 
It makes the agent more structured, easier to review, and less likely to do sloppy file operations.\nBest use cases for Bash Run test suites Start dev servers Use git intentionally Execute project-specific CLIs Validate that changes actually work Where people misuse it A weak agent uses Bash as a hammer for everything.\nA stronger agent treats Bash as the tool for execution, not for every file lookup or text replacement.\n3. Edit Edit is for precise modifications inside an existing file.\nThis is the tool you want when the file already exists and the change is local:\nreplace a function call update a paragraph rename a setting patch a small code block Instead of rewriting the entire file, Edit performs an exact string replacement.\nThat makes it safer for surgical changes, especially in large files.\nThe most important constraint is this:\nClaude Code must read the file first That is a sensible guardrail. Editing blind is how agents corrupt files.\n4. Glob Glob is the file discovery tool.\nUse it when the question is:\nWhich files match this pattern? Where are the Markdown posts? Do we have any *.tsx files under src/components/? Typical examples look like:\ncontent/posts/*.md src/**/*.ts layouts/**/*.html This is much cleaner than throwing a broad shell search at the repository.\nIf Read answers “what is inside this file?”, then Glob answers “which files should I inspect?”\n5. Grep Grep is the content search tool, and in Claude Code it is built on top of ripgrep.\nThis is one of the most important tools in any coding agent because code work usually starts with:\nWhere is this function used? Which files mention this config key? Where are API endpoints defined? What code paths touch this component? That is exactly what Grep is for.\nOne detail worth calling out from the tool definition is that Claude Code is supposed to use Grep for search tasks and not call grep or rg through Bash. 
The reason is simple: the dedicated tool is optimized for the agent environment, gives cleaner structured output, and is easier to control than a raw shell command.\nIt supports:\nregex file globs file types line-numbered output context lines count mode multiline search So it is not just “find a word.” It is a structured way to ask questions about the codebase.\nHow this is different from normal grep At first glance, Grep looks like it is just the shell command grep with a capital G, but that is not really what is going on.\nHere is how I think about the difference:\nshell grep is a generic Unix command-line utility Claude Code Grep is an agent tool with a defined schema and structured parameters shell grep prints raw terminal output Claude Code Grep lets the agent ask for modes like matching content, file lists, counts, line numbers, context, file-type filters, glob filters, and even multiline search shell grep is just one command in a terminal session Claude Code Grep is designed to work safely inside the tool system, with predictable output the model can reason over So even though the name is familiar, the important difference is that Claude Code Grep is not just a plain shell invocation. It is a purpose-built search interface for the agent.\nSimple mental model Glob finds files by name Grep finds text inside files Read inspects the actual file contents That trio is the backbone of code exploration.\n6. Read Read is the direct file inspection tool.\nIt can read:\nsource files Markdown files images PDFs Jupyter notebooks That range matters because real software work is not limited to code. Sometimes the agent needs to inspect:\ndocumentation screenshots diagrams exported reports notebook outputs The tool description also encourages reading only the relevant slice of a large file when possible. 
That is another good pattern: gather the needed context without drowning in noise.\nIf I had to name the most fundamental Claude Code tool, I would probably pick Read.\nWithout good reading, every later action gets worse.\n7. Skill Skill is more like a reusable workflow layer than a raw file or shell tool.\nThe description says users may refer to a slash command such as:\n/commit /review-pr and those map to skills.\nSo a skill is effectively a packaged capability that the agent can invoke inside the main conversation.\nThis is useful because some workflows are common enough to deserve their own interface:\nreviewing a pull request handling PDFs running a commit workflow invoking domain-specific helper routines Instead of rebuilding the logic from scratch every time, Claude Code can call the skill directly.\nThat makes the system more modular.\n8. ToolSearch ToolSearch is a meta-tool.\nIt does not directly read files or modify code. Instead, it retrieves the schema definitions for deferred tools so they become callable.\nThat is a subtle but important idea.\nSometimes the environment only exposes a tool name at first, not the full callable definition. In that situation, the agent needs a way to say:\nLoad the real schema for this tool so I can use it properly.\nThat is what ToolSearch does.\nThis tool matters less in day-to-day coding than Read or Grep, but architecturally it is very interesting. It means the tool system itself can be partially lazy-loaded.\n9. 
Write Write is for creating a new file or replacing a file completely.\nThat makes it different from Edit.\nUse Write when:\nthe file does not exist yet the whole file needs to be regenerated a complete rewrite is simpler than patching Do not use it for small edits in an existing file unless a full rewrite is genuinely the cleanest option.\nThat distinction matters because full rewrites can accidentally wipe unrelated content, formatting, or hand-edited details.\nGood agents prefer:\nEdit for local changes Write for new or fully replaced files How These Tools Work Together The real power of Claude Code is not any single tool. It is the sequence.\nA typical workflow looks like this:\nUse Glob to find likely files Use Grep to locate the relevant code or text Use Read to inspect the exact context Use Edit or Write to make the change Use Bash to run tests, builds, or validation Use Agent if the task is large enough to delegate That is a much better loop than:\nGuess Run shell commands blindly Rewrite too much Hope it works I think the tool design is quietly teaching a workflow discipline.\nThe Most Important Design Pattern in Claude Code If you step back, there is a clear philosophy in this tool list:\nprefer structured tools over raw shell commands prefer reading before editing prefer targeted edits over full rewrites prefer delegation only when the task is large enough That is exactly the right direction for coding agents.\nThe more an agent can operate through typed, purpose-built tools, the easier it becomes to make it:\nsafer more auditable more predictable less wasteful with context This is the same reason modern software systems often replace vague text interfaces with stricter APIs.\nFinal Thoughts Claude Code is not just “Claude, but in a terminal.”\nWhat makes it useful is the combination of:\nmodel reasoning filesystem access search tools editing tools shell execution agent delegation That is what turns it from a text generator into a coding workflow 
assistant.\nIf you understand these nine tools, you understand a big part of how Claude Code actually works in practice.\nMy biggest takeaway is simple:\nThe quality of the agent depends not just on the model, but on whether it chooses the right tool at the right time.\n","permalink":"https://learncodecamp.net/claude-code-tools-explained/","summary":"\u003cp\u003eWhen I use \u003cstrong\u003eClaude Code\u003c/strong\u003e, I am not just using a model that generates text. I am using a tool-driven coding environment that can inspect files, search code, edit content, run shell commands, and delegate work to subagents.\u003c/p\u003e\n\u003cp\u003eThat tool layer is the real reason Claude Code feels different to me from a normal chat UI.\u003c/p\u003e\n\u003cp\u003eInstead of asking:\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eCan the model explain my code?\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eI can ask:\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eCan the model inspect the repo, find the bug, patch the file, and run the command needed to verify the fix?\u003c/code\u003e\u003c/p\u003e","title":"Claude Code Tools Explained: What Each Tool Does and When to Use It"},{"content":"LoRA stands for Low-Rank Adaptation. It is one of the most useful ideas in modern LLM fine-tuning because it changes the question from:\nHow do we update all of the model's weights?\nto:\nHow do we learn a small update that is still expressive enough for the new task?\nThat is the whole trick.\nInstead of fine-tuning every entry of a large weight matrix, LoRA keeps the original pretrained weight frozen and learns a low-rank correction on top of it. 
This makes training much cheaper in parameters, optimizer state, checkpoint size, and often VRAM.\nLoRA is a PEFT method, short for Parameter-Efficient Fine-Tuning.\nIf you have seen people say things like:\n\u0026ldquo;I trained only adapters\u0026rdquo; \u0026ldquo;I fine-tuned a 7B model on one GPU\u0026rdquo; \u0026ldquo;I shipped one base model with many small task-specific checkpoints\u0026rdquo; there is a good chance they were using LoRA or something very close to it.\nWhy Full Fine-Tuning Gets Expensive Fast Take one linear layer with weight matrix\n$$ W_0 \\in \\mathbb{R}^{d_{out} \\times d_{in}}. $$\nIn a normal forward pass:\n$$ y = W_0 x. $$\nIf we do full fine-tuning, we allow every entry of that matrix to change. So training really learns\n$$ W = W_0 + \\Delta W $$\nand the layer becomes\n$$ y = (W_0 + \\Delta W)x. $$\nThat sounds harmless for one layer, but LLMs contain many huge projection matrices:\nattention projections output projections MLP up and down projections Once the model is large, updating all of them becomes expensive.\nThe main cost is not just storing the pretrained weights. During training you also need memory for:\ngradients optimizer states such as Adam\u0026rsquo;s first and second moments checkpoints for all trainable tensors So full fine-tuning is often overkill if the downstream task only needs a relatively structured change.\nLoRA in One Equation LoRA keeps $W_0$ frozen and parameterizes the update as\n$$ \\Delta W = \\frac{\\alpha}{r}BA $$\nwhere\n$$ A \\in \\mathbb{R}^{r \\times d_{in}}, \\qquad B \\in \\mathbb{R}^{d_{out} \\times r}. $$\nThen the forward pass becomes\n$$ y = W_0 x + \\frac{\\alpha}{r}BAx. $$\nThe factor $\\alpha / r$ is just a scaling term. Different libraries expose it as lora_alpha, and its job is to control the effective size of the adapter update.\nSome sources swap the letters and write the factors in the opposite order. Do not get hung up on the names. 
The important point is this:\nthe big pretrained matrix stays frozen the learned update is factored through a small intermediate dimension r r is much smaller than d_in or d_out Full fine-tuning learns a dense update with the same shape as the original weight. LoRA replaces that with two much smaller trainable matrices whose product has rank at most r. What \u0026ldquo;Low Rank\u0026rdquo; Actually Means The word rank is doing the heavy lifting here.\nIf a matrix has rank r, that means it can only express r independent directions in a linear-algebra sense. A useful identity is:\n$$ \\operatorname{rank}(BA) \\le \\min(\\operatorname{rank}(B), \\operatorname{rank}(A)) \\le r. $$\nSo the LoRA update cannot be an arbitrary full matrix. It is restricted to a smaller family of updates.\nAnother way to see it is to write the product as a sum of rank-1 outer products:\n$$ BA = \\sum_{i=1}^{r} b_i a_i^T $$\nwhere $b_i$ is the $i$th column of $B$ and $a_i^T$ is the $i$th row of $A$.\nThat means LoRA is really saying:\nInstead of learning one giant free-form update, learn a small number of direction pairs and add them together.\nThis restriction is exactly why LoRA is efficient.\nWhy a Low-Rank Update Can Still Work At first glance, LoRA looks like it should be too restrictive.\nWhy would a tiny rank-8 or rank-16 update be enough for a giant transformer layer?\nThe intuition is that many downstream tasks do not need the model to move in every possible direction in weight space. The useful adaptation often lives in a much smaller subspace.\nThis is closely related to a standard fact from linear algebra: the best rank-r approximation to a matrix, in Frobenius norm, comes from truncated SVD:\n$$ M_r = \\sum_{i=1}^{r} \\sigma_i u_i v_i^T. $$\nIf the singular values $\\sigma_i$ decay quickly, then a small rank already captures most of the matrix\u0026rsquo;s energy.\nLoRA is not literally computing the SVD of the perfect update during training. 
But it is making the same bet:\nThe task-specific change is low-dimensional enough that a low-rank parameterization can capture most of what matters.\nA low-rank approximation keeps the dominant singular directions and throws away the weaker ones. When the spectrum decays fast, small ranks can still preserve most of the structure. The Parameter-Count Math This is where LoRA becomes obviously attractive.\nFor full fine-tuning of one linear layer, the number of trainable parameters is\n$$ d_{out}d_{in}. $$\nFor LoRA, the trainable parameters are only\n$$ rd_{in} + d_{out}r = r(d_{in} + d_{out}). $$\nSo the trainable fraction is\n$$ \\frac{r(d_{in} + d_{out})}{d_{out}d_{in}}. $$\nIf the layer is square, with $d_{in} = d_{out} = d$, this simplifies to\n$$ \\frac{2r}{d}. $$\nThat simple ratio explains why LoRA becomes dramatic on big models.\nFor a 4096 x 4096 projection:\nSetup Trainable parameters Fraction of full Full fine-tuning 16,777,216 100% LoRA, r = 8 65,536 0.39% LoRA, r = 16 131,072 0.78% LoRA, r = 64 524,288 3.13% So even rank 64 is still training only a small slice of the full matrix.\nFor a single 4096 x 4096 layer, LoRA stays tiny even as rank increases. The full dense update is a horizontal line because it never changes with rank. A Small but Important Nuance About Memory LoRA reduces trainable parameters, but it does not remove the need to hold the base model itself in memory.\nThat distinction matters.\nLoRA saves a lot of optimizer memory because only A and B get optimizer states. LoRA saves checkpoint size because you can store just the adapters. LoRA does not magically make the frozen base weights disappear. 
If base-model memory is still the bottleneck, that is where QLoRA comes in:\nquantize the frozen base model, often to 4-bit keep LoRA adapters trainable in higher precision So LoRA and QLoRA solve related but not identical problems.\nForward Pass Intuition It helps to break the LoRA forward pass into two smaller steps.\nFirst project the input down into rank r:\n$$ h = Ax \\in \\mathbb{R}^{r}. $$\nThen project it back up:\n$$ u = Bh \\in \\mathbb{R}^{d_{out}}. $$\nNow combine with the frozen layer:\n$$ y = W_0 x + \\frac{\\alpha}{r}u. $$\nThat gives a nice mental model:\nA compresses the useful task signal into a small latent space B expands that signal back into the output dimension the frozen pretrained path still carries the original model behavior In other words, LoRA is not replacing the pretrained model. It is adding a small learned correction on top of it.\nWhat Backpropagation Updates Suppose the loss is $L$ and the layer output is\n$$ y = W_0 x + sBAx $$\nwith $s = \\alpha / r$.\nLet\n$$ g = \\frac{\\partial L}{\\partial y}. $$\nIf we define\n$$ h = Ax, $$\nthen the gradients for the trainable LoRA factors are\n$$ \\frac{\\partial L}{\\partial B} = sgh^T $$\nand\n$$ \\frac{\\partial L}{\\partial A} = sB^Tgx^T. $$\nThe important practical point is not the notation. It is this:\ngradients flow through the LoRA branch A and B get updated W_0 stays frozen That is why optimizer state stays small.\nWhy the Initialization Matters Most LoRA implementations initialize the factors so that the adapter path starts at zero.\nA common choice is:\ninitialize A with small random values initialize B to zeros Then initially\n$$ BA = 0 $$\nand the model behaves exactly like the original pretrained model at step 0.\nThat is a good default because the adapter starts as a no-op and only learns deviations supported by the downstream data.\nA Minimal PyTorch Implementation Here is a stripped-down LoRALinear module. 
This is not a full library replacement, but it shows the mechanics clearly. In a real fine-tuning run, self.weight would be copied from the pretrained checkpoint and then frozen.\nimport math import torch import torch.nn as nn import torch.nn.functional as F class LoRALinear(nn.Module): def __init__(self, in_features, out_features, r=16, alpha=32, bias=True): super().__init__() self.in_features = in_features self.out_features = out_features self.r = r self.scaling = alpha / r if r \u0026gt; 0 else 0.0 self.weight = nn.Parameter(torch.empty(out_features, in_features)) self.weight.requires_grad = False if bias: self.bias = nn.Parameter(torch.zeros(out_features)) self.bias.requires_grad = False else: self.register_parameter(\u0026#34;bias\u0026#34;, None) if r \u0026gt; 0: self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01) self.lora_B = nn.Parameter(torch.zeros(out_features, r)) else: self.register_parameter(\u0026#34;lora_A\u0026#34;, None) self.register_parameter(\u0026#34;lora_B\u0026#34;, None) nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5)) def forward(self, x): base = F.linear(x, self.weight, self.bias) if self.r == 0: return base update = (x @ self.lora_A.t()) @ self.lora_B.t() return base + self.scaling * update And when you fine-tune, you would optimize only the adapter parameters:\nmodel = LoRALinear(4096, 4096, r=16, alpha=32) trainable = [p for p in model.parameters() if p.requires_grad] optimizer = torch.optim.AdamW(trainable, lr=2e-4) Real-world implementations add more details:\ndropout on the adapter path merge and unmerge logic for inference automatic insertion into specific transformer modules optional bias tuning rules But the mathematical core is still the same BA update.\nA Short Python + Matplotlib Example The parameter-savings chart above can be reproduced with a small script:\nimport numpy as np import matplotlib.pyplot as plt d_in = 4096 d_out = 4096 ranks = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256]) full_params = d_in * 
d_out lora_params = ranks * (d_in + d_out) fig, ax = plt.subplots(figsize=(9, 5.6), constrained_layout=True) ax.set_yscale(\u0026#34;log\u0026#34;) ax.plot(ranks, lora_params, marker=\u0026#34;o\u0026#34;, linewidth=2.5, label=\u0026#34;LoRA trainable params\u0026#34;) ax.axhline(full_params, linestyle=\u0026#34;--\u0026#34;, linewidth=2, label=\u0026#34;full fine-tuning\u0026#34;) ax.set_xscale(\u0026#34;log\u0026#34;, base=2) ax.set_xticks(ranks) ax.get_xaxis().set_major_formatter(plt.ScalarFormatter()) ax.set_xlabel(\u0026#34;rank r\u0026#34;) ax.set_ylabel(\u0026#34;trainable parameters\u0026#34;) ax.legend() plt.show() And this is the plot the script produces:\nThe same parameter-savings curve generated visually: LoRA stays far below full fine-tuning even as the adapter rank increases. This is one of those cases where a quick plot is more persuasive than a paragraph.\nWhere LoRA Is Applied in a Transformer LoRA is usually attached to selected linear layers, not every parameter in the model.\nCommon choices:\nq_proj and v_proj in self-attention sometimes k_proj and o_proj as well MLP projections in more aggressive setups occasionally embeddings or the output head, depending on the goal This is another reason LoRA is efficient: you can decide where adaptation capacity matters most.\nFor many instruction-tuning setups, adding LoRA to a small subset of projection layers is already enough to get strong results.\nWhat the Hyperparameters Mean Three knobs matter most:\n1. Rank r This controls adapter capacity.\nsmaller r means fewer trainable parameters larger r means more expressive updates If rank is too small, the adapter may underfit. If it is too large, you lose some of the efficiency advantage.\n2. alpha This scales the adapter contribution.\nYou can think of it as controlling how strongly the low-rank branch is allowed to influence the frozen base path.\n3. LoRA dropout Some implementations apply dropout to the adapter input during training. 
This can help regularize the adapter when data is limited.\nWhy LoRA Is So Convenient Operationally LoRA is not only about memory. It is also operationally neat.\nBecause the base model is frozen:\nthe base checkpoint can be reused across tasks each new task can be stored as a small adapter file multiple adapters can be swapped in and out without copying the whole model That is why LoRA has become such a standard workflow for practical LLM customization.\nCan LoRA Match Full Fine-Tuning? Sometimes yes, sometimes no.\nLoRA is often surprisingly competitive, especially when:\nthe downstream task is close to what the base model already knows the data volume is moderate the target behavior can be expressed with a structured update But LoRA is still a constraint.\nCases where full fine-tuning or a larger adapter may help:\nthe task requires a very large behavioral shift you need to update many more parts of the model the chosen rank is too small the base model itself is a poor starting point So the right question is not:\nIs LoRA always better than full fine-tuning?\nThe right question is:\nIs a low-rank update enough for this task, given the cost savings?\nThe Main Takeaway LoRA works because it separates two roles:\nthe pretrained model stores broad general capability the adapter stores a compact task-specific correction Mathematically, the idea is simple:\n$$ W = W_0 + \\frac{\\alpha}{r}BA $$\nEngineering-wise, that small factorization changes a lot:\nfar fewer trainable parameters smaller optimizer state smaller checkpoints easy adapter sharing and reuse That is why LoRA became one of the default answers to the question:\nHow do I fine-tune a large model without paying the full price of full fine-tuning?\nIf you want one sentence to remember, use this one:\nLoRA freezes the big pretrained matrix and learns a small low-rank update that captures the task-specific 
change.\n","permalink":"https://learncodecamp.net/lora-finetuning-explained/","summary":"\u003cp\u003e\u003cstrong\u003eLoRA\u003c/strong\u003e stands for \u003cstrong\u003eLow-Rank Adaptation\u003c/strong\u003e. It is one of the most useful ideas in modern LLM fine-tuning because it changes the question from:\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eHow do we update all of the model's weights?\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eto:\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eHow do we learn a small update that is still expressive enough for the new task?\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eThat is the whole trick.\u003c/p\u003e\n\u003cp\u003eInstead of fine-tuning every entry of a large weight matrix, LoRA keeps the original pretrained weight \u003cstrong\u003efrozen\u003c/strong\u003e and learns a low-rank correction on top of it. This makes training much cheaper in parameters, optimizer state, checkpoint size, and often VRAM.\u003c/p\u003e","title":"LoRA Fine-Tuning Explained: What It Is, Why It Works, and the Math Behind It"},{"content":"The Universal Approximation Theorem (UAT) gets quoted constantly, but it is usually described in a fuzzier way than it deserves.\nIt does not say neural networks are magically good at every task.\nIt does not say a shallow network is the most practical architecture.\nIt does not say gradient descent will easily find the right weights.\nWhat it does say is still important:\nWith a suitable nonlinear activation and enough hidden units, a feedforward network can approximate any continuous function on a bounded domain as closely as we want.\nThat is a statement about expressive power. In other words, the theorem answers this question:\nCan the network represent the function at all, in principle?\nThat is why the result mattered historically. 
It killed the idea that neural networks are limited to drawing only simple lines or crude thresholds.\nA Common Version of the Theorem There is not just one single UAT statement. The original result by Cybenko (1989) was written for sigmoidal activations, and later results broadened the allowed activations. A beginner-friendly version is:\nLet $K \\subset \\mathbb{R}^n$ be compact, let $f : K \\to \\mathbb{R}$ be continuous, and let $\\sigma$ be a standard nonlinear activation such as a sigmoid. Then for every $\\varepsilon \u0026gt; 0$, there exist an integer $m$ and parameters $a_i \\in \\mathbb{R}$, $w_i \\in \\mathbb{R}^n$, $b_i \\in \\mathbb{R}$, and $c \\in \\mathbb{R}$ such that\n$$ g(x) = \\sum_{i=1}^{m} a_i \\, \\sigma(w_i^T x + b_i) + c $$\nand\n$$ \\sup_{x \\in K} |f(x) - g(x)| \u0026lt; \\varepsilon. $$\nThat one line is dense, so unpack it:\nCompact means a bounded, closed region such as $[0,1]^n$. Continuous means the target function has no jumps inside that region. Arbitrarily close means we can make the maximum error as small as we like. Existence means the theorem guarantees some parameters exist. It does not tell us how easy they are to find. Activation assumptions matter because formal theorem statements put technical conditions on $\\sigma$. A common modern condition is that $\\sigma$ should not be a polynomial, but for intuition it is enough to think of standard nonlinear choices such as sigmoid or ReLU. A one-hidden-layer network takes the input, transforms it with nonlinear hidden units, and then sums those responses. The theorem says some choice of these parameters can get uniformly close to the target. Why Nonlinearity Is the Whole Game Without a nonlinear activation, the theorem fails immediately.\nSuppose every hidden unit uses the identity activation $\\sigma(z) = z$. 
In one dimension, even a wide hidden layer collapses to a straight line:\n$$ g(x) = \\sum_{i=1}^{m} a_i (w_i x + b_i) + c = \\left(\\sum_{i=1}^{m} a_i w_i\\right)x + \\left(\\sum_{i=1}^{m} a_i b_i + c\\right). $$\nSo width alone does not help. You just get another straight-line rule.\nThe same collapse happens in higher dimensions. A two-layer network becomes\n$$ g(x) = W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2). $$\nThat is still just an affine map. In plain English, it is one overall linear transformation plus a shift.\nSo even if you make the hidden layer extremely wide, a network with only identity activations cannot bend into a complicated curve. It collapses into one overall affine transformation.\nThis is the reason activations such as sigmoid, tanh, and ReLU matter so much. They let the network create bends, steps, corners, and local regions. Width then gives you more of those building blocks.\nThese activations are nonlinear in different ways: sigmoid gives a soft step, tanh gives a smooth bend through zero, and ReLU creates a sharp corner at zero. Intuition: Hidden Units Build Small Local Features The proof ideas behind UAT are mathematical, but the intuition is simpler than the formal statement.\nAll the pictures below are one-dimensional because they are easier to draw. The actual theorem still applies to functions of many variables.\nThink of each hidden unit as producing a small shape:\na soft step a hinge a local bump a region that turns on and off For example, with a sigmoid $\\sigma(z)$, one soft step is\n$$ s(x; t, k) = \\sigma(k(x - t)). $$\nIf $k$ is large, that step becomes sharp near the threshold $t$.\nNow subtract two such steps:\n$$ b(x; u, v, k) = \\sigma(k(x-u)) - \\sigma(k(x-v)). $$\nThat gives a soft bump which is near zero outside the interval $[u,v]$ and higher inside it.\nOnce you have several such bumps, the output layer can add them:\n$$ g(x) = \\sum_{j=1}^{m} c_j \\, b(x; u_j, v_j, k_j). 
$$\nThis is not the full theorem, but it is the right mental model. Hidden units create reusable local pieces, and the final layer combines them into a more complicated shape.\nThe construction is the point: first make one soft bump by subtracting two sigmoids, then add many shifted copies to build the final output curve. Here is the same idea plotted directly from the equations with Python and Matplotlib:\nTop: two soft steps, one centered at u and one at v. Bottom: their difference, which creates a single soft bump that is high inside [u, v] and near zero outside. Here $u$ and $v$ decide where the bump turns on and off, while $k$ controls how sharp the two edges are.\nimport numpy as np import matplotlib.pyplot as plt def sigma(z): return 1.0 / (1.0 + np.exp(-z)) def soft_step(x, t, k): return sigma(k * (x - t)) def bump(x, u, v, k): return soft_step(x, u, k) - soft_step(x, v, k) u = -1.0 v = 1.2 k = 4.5 x = np.linspace(-4, 4, 1200) s_u = soft_step(x, u, k) s_v = soft_step(x, v, k) b = bump(x, u, v, k) fig, axes = plt.subplots(2, 1, figsize=(10, 7), sharex=True, constrained_layout=True) axes[0].plot(x, s_u, label=r\u0026#34;$\\sigma(k(x-u))$\u0026#34;) axes[0].plot(x, s_v, label=r\u0026#34;$\\sigma(k(x-v))$\u0026#34;) axes[0].axvline(u, linestyle=\u0026#34;--\u0026#34;) axes[0].axvline(v, linestyle=\u0026#34;--\u0026#34;) axes[0].set_title(\u0026#34;Two soft steps\u0026#34;) axes[0].legend() axes[1].plot(x, b, color=\u0026#34;black\u0026#34;, label=r\u0026#34;$b(x;u,v,k)$\u0026#34;) axes[1].fill_between(x, 0, b, alpha=0.3) axes[1].axvline(u, linestyle=\u0026#34;--\u0026#34;) axes[1].axvline(v, linestyle=\u0026#34;--\u0026#34;) axes[1].set_title(\u0026#34;Their difference gives one soft bump\u0026#34;) axes[1].legend() plt.show() The ReLU Version: Piecewise Linear Approximation Modern networks often use ReLU instead of sigmoid:\n$$ \\operatorname{ReLU}(x) = \\max(0, x). 
$$\nIn one dimension, sums of shifted ReLUs generate piecewise linear functions:\n$$ g(x) = \\alpha_0 + \\alpha_1 x + \\sum_{j=1}^{m} \\beta_j \\, \\operatorname{ReLU}(x - t_j). $$\nThis is still a one-hidden-layer ReLU network. It is just written in a form that makes the piecewise-linear structure easier to see.\nEach extra term introduces another place where the slope can change. So as you add more hidden units, the network can place more breakpoints and better follow a curved target.\nThat is why ReLU networks are still universal approximators even though they do not look like the older sigmoid proofs.\nWorked Example on $[0,1]$ Take this target function:\n$$ f(x) = 0.50 + 0.22 \\sin(2\\pi x - 0.35) + 0.08 \\sin(6\\pi x + 0.30), \\qquad x \\in [0,1]. $$\nIt is smooth, continuous, and definitely not just a straight line.\nTo build intuition, imagine approximating it with piecewise linear fits, which is exactly the kind of shape a 1D ReLU network is good at producing.\nA purely linear model can only draw one segment. A small network can place a few bends and catch the rough trend. A wider network can place many bends and push the error down across the whole interval. In the illustration below, the same target curve is approximated at three different capacities. The maximum error drops from about 0.322 to 0.182 to 0.013 as we allow more bends. Because the target curve itself only moves within a fairly small vertical range, 0.322 is visibly poor while 0.013 is already quite close.\nThe theorem is about this direction of travel: more nonlinear building blocks let the approximation error shrink. It does not claim the network is efficient or easy to train, only that the representation exists. 
What the Theorem Does Not Promise This part is where people often overread the result.\nThe UAT does not guarantee:\nthat one hidden layer is the most parameter-efficient architecture that the network will learn the approximation from finite data that gradient descent will find the right parameters quickly that the required width will be small that every possible function is covered by the classic statement The standard theorem is about approximating continuous functions on compact domains. That is already a big result, but it is not the same thing as saying \u0026ldquo;neural networks can do anything with no tradeoffs.\u0026rdquo;\nWhy Deep Networks Still Matter If a Shallow One Is Universal This is the next natural question.\nIf one hidden layer is universal, why do people build deep networks at all?\nBecause universality is not efficiency.\nA shallow network may need an absurd number of hidden units to represent a function that a deeper network can express compactly. A deeper network can often reuse intermediate structure and reach the same approximation with far fewer parameters.\nThat is especially important for:\nhierarchical patterns in images compositional structure in language reusable features in speech and time series algorithm-like computations with many stages So the UAT says shallow networks are expressive enough in principle, while modern deep learning is mostly about doing the job efficiently and trainably.\nFinal Takeaway The Universal Approximation Theorem is best understood as a representational guarantee:\nA neural network with a suitable nonlinear activation and enough width can approximate any continuous function on a bounded region to arbitrary accuracy.\nThat is why neural networks are fundamentally more flexible than linear models.\nThe theorem does not solve optimization, data efficiency, or generalization. 
But it does establish something essential:\nNeural networks are not limited by a lack of expressive power.\nOnce that door is open, the real engineering questions become how to learn the approximation well, how much capacity to use, and which architecture gets there with the least pain.\n","permalink":"https://learncodecamp.net/universal-approximation-theorem/","summary":"\u003cp\u003eThe \u003cstrong\u003eUniversal Approximation Theorem (UAT)\u003c/strong\u003e gets quoted constantly, but it is usually described in a fuzzier way than it deserves.\u003c/p\u003e\n\u003cp\u003eIt does \u003cstrong\u003enot\u003c/strong\u003e say neural networks are magically good at every task.\u003c/p\u003e\n\u003cp\u003eIt does \u003cstrong\u003enot\u003c/strong\u003e say a shallow network is the most practical architecture.\u003c/p\u003e\n\u003cp\u003eIt does \u003cstrong\u003enot\u003c/strong\u003e say gradient descent will easily find the right weights.\u003c/p\u003e\n\u003cp\u003eWhat it \u003cem\u003edoes\u003c/em\u003e say is still important:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eWith a suitable nonlinear activation and enough hidden units, a feedforward network can approximate any continuous function on a bounded domain as closely as we want.\u003c/p\u003e","title":"Universal Approximation Theorem Explained: Why Neural Networks Can Approximate Any Continuous Function"},{"content":"Once you understand neurons, activations, loss functions, and backpropagation, the next thing to understand is the training loop. This is the repetitive engine of deep learning.\nAt a high level, training is boring in the best possible way. It is the same four steps repeated over and over:\nmake a prediction measure the error compute gradients update the weights The interesting part is not the loop itself. 
The interesting part is how concepts like batch size, epoch, iteration, and convergence affect the behavior of that loop in practice.\nThe Training Loop in One Picture Every iteration of training follows this pattern:\nInput batch -\u0026gt; Forward pass -\u0026gt; Loss -\u0026gt; Backward pass -\u0026gt; Optimizer update -\u0026gt; Repeat Or in code:\nfor batch in data_loader: predictions = model(batch.inputs) loss = loss_fn(predictions, batch.targets) optimizer.zero_grad() loss.backward() optimizer.step() That is the whole training loop in miniature.\nStep 1: Forward Pass The model takes the current batch of inputs and computes predictions using its current weights.\nExamples:\na regression model predicts prices a classifier predicts class logits an LLM predicts next-token logits At this moment, the model is not learning yet. It is just answering:\nGiven my current parameters, what do I predict for these examples?\nStep 2: Compute Loss The loss function measures how wrong those predictions are.\nExamples:\nMSE for regression cross-entropy for classification The output is a scalar value. That scalar becomes the thing training tries to reduce over time.\nStep 3: Backward Pass The backward pass computes gradients of the loss with respect to model parameters.\nThis is where backpropagation happens.\nAfter loss.backward() in PyTorch, each trainable parameter now has a gradient attached to it.\nThose gradients answer:\nHow should this weight change if I want to reduce the loss?\nStep 4: Optimizer Update The optimizer uses the gradients to update weights.\nIn the simplest case:\nw \u0026lt;- w - eta * grad\nWhere eta is the learning rate.\nIn practice, optimizers like Adam keep extra state, but the central idea is the same:\ngradients tell you the direction optimizer decides how to step What Is a Batch? A batch is a subset of training examples processed together before one weight update.\nIf the dataset has 10,000 examples, you usually do not feed all 10,000 at once. 
Instead, you split them into smaller groups such as:\n16 32 64 128 256 If the batch size is 64, the model processes 64 examples, computes one average loss, computes gradients, and then updates weights once.\nThis is called mini-batch training, and it is the standard way neural networks are trained.\nWhy Not Use the Entire Dataset at Once? You could, in principle, use full-batch gradient descent, but mini-batches are usually better in practice.\nReasons:\nfull-batch training can be too memory-intensive mini-batches are much faster on GPUs mini-batch noise can actually help optimization weight updates happen more frequently So mini-batches are a practical and often optimization-friendly compromise.\nWhat Is an Epoch? An epoch means one complete pass through the entire training dataset.\nExample:\ndataset size = 12,000 batch size = 100 Then:\none epoch contains 120 batches therefore one epoch contains 120 optimizer updates An epoch is about dataset coverage, not about one single update.\nWhat Is an Iteration or Step? 
An iteration or step usually means:\none batch processed -\u0026gt; one weight update\nSo with:\ndataset size = 12,000 batch size = 100 you get:\n120 iterations per epoch This distinction matters because:\nepochs tell you how many times you have seen the dataset steps tell you how many times you have updated the weights Many modern training setups, especially for large models, are tracked more in steps than in epochs.\nWorked Example: Dataset, Batch Size, Epochs Suppose:\ndataset has 1,000 samples batch size = 100 training runs for 5 epochs Then:\nbatches per epoch = 1,000 / 100 = 10 total iterations = 5 * 10 = 50 total weight updates = 50 This is the simplest arithmetic behind training schedules.\nWhy Batch Size Matters Batch size changes both compute behavior and optimization behavior.\nSmall batches Pros:\nmore frequent updates less memory usage noisier gradients can help escape poor regions Cons:\ntraining can be less stable GPU utilization may be worse Large batches Pros:\nbetter hardware efficiency smoother gradient estimates Cons:\nmore memory required fewer updates per epoch can sometimes generalize worse without proper tuning There is no universal best batch size. It depends on the model, hardware, and objective.\nWhy Training Loss Goes Down but Not Always Smoothly People often expect loss curves to drop smoothly. In reality, mini-batch training introduces noise.\nWhy?\neach batch is only a sample of the dataset different batches have different difficulty the gradient is only an estimate of the full-data gradient So it is normal for batch-level loss to bounce around even while the broader trend is improving.\nThat is not failure. 
That is just stochastic optimization doing its thing.\nWhat Convergence Means Convergence does not necessarily mean reaching the perfect global minimum.\nIn practical deep learning, convergence usually means something closer to:\ntraining loss is no longer improving much validation performance has stabilized further training gives diminishing returns That is enough to stop in many real workflows.\nToo Small, Too Large, and Just Right Learning Rates The training loop is highly sensitive to learning rate.\nToo small training progresses very slowly loss decreases, but painfully many epochs may be wasted Too large loss may oscillate or diverge updates overshoot good regions training may become numerically unstable Reasonable learning rate loss decreases efficiently training is stable enough to make progress convergence is much faster A lot of \u0026ldquo;model problems\u0026rdquo; are really optimization-setting problems, especially bad learning-rate choices.\nTraining Loss vs Validation Loss To understand convergence properly, you usually track both:\ntraining loss validation loss Good fit both losses go down both stay reasonably close Underfitting both losses stay high model is not learning enough Overfitting training loss keeps dropping validation loss stops improving or starts rising This is why \u0026ldquo;lowest training loss\u0026rdquo; is not always the best checkpoint.\nThe goal is not memorization. 
The goal is generalization.\nWhy Shuffle the Data Training data is usually shuffled each epoch.\nWhy?\navoids pathological ordering effects makes batches more representative improves stochastic optimization behavior If similar examples are clustered together and never shuffled, training can behave badly or learn biased update patterns.\nWhy We Zero Gradients In PyTorch, gradients accumulate by default.\nThat is why a typical loop includes:\noptimizer.zero_grad() loss.backward() optimizer.step() If you skip zero_grad(), gradients from previous steps will accumulate, which is usually wrong unless you are intentionally doing gradient accumulation.\nGradient Accumulation Sometimes the ideal batch size does not fit in memory.\nOne workaround is gradient accumulation:\nprocess several smaller micro-batches call loss.backward() on each delay optimizer.step() update only after accumulating enough gradients This simulates a larger effective batch size without needing all examples in memory at once.\nA Minimal Training Loop in PyTorch import torch import torch.nn as nn from torch.utils.data import DataLoader, TensorDataset X = torch.randn(1000, 10) y = torch.randint(0, 2, (1000,)) dataset = TensorDataset(X, y) loader = DataLoader(dataset, batch_size=32, shuffle=True) model = nn.Sequential( nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2) ) loss_fn = nn.CrossEntropyLoss() optimizer = torch.optim.Adam(model.parameters(), lr=1e-3) for epoch in range(5): for batch_x, batch_y in loader: logits = model(batch_x) loss = loss_fn(logits, batch_y) optimizer.zero_grad() loss.backward() optimizer.step() print(f\u0026#34;epoch={epoch+1}, loss={loss.item():.4f}\u0026#34;) This example has all the standard pieces:\nmini-batches forward pass loss backward pass optimizer step multiple epochs Common Misunderstandings \u0026ldquo;One epoch means one gradient update\u0026rdquo; No. 
One epoch may contain hundreds or thousands of updates depending on dataset size and batch size.\n\u0026ldquo;One batch means one sample\u0026rdquo; No. A batch usually contains many samples.\n\u0026ldquo;Convergence means zero loss\u0026rdquo; Not in practice. It usually means the model has stopped improving meaningfully.\n\u0026ldquo;More epochs always means better results\u0026rdquo; No. More epochs can eventually lead to overfitting.\n\u0026ldquo;Batch size is only a hardware choice\u0026rdquo; No. It affects both hardware efficiency and optimization dynamics.\nPractical Mental Model Think of training like repeated course correction:\neach batch gives the model a noisy hint about what it is doing wrong each backward pass converts that hint into gradients each optimizer step nudges the parameters many nudges, accumulated over time, produce learning That is why deep learning is iterative by nature. One step learns almost nothing. A well-tuned sequence of steps learns a lot.\nSummary The training loop repeats: forward pass, loss, backward pass, optimizer update. A batch is a subset of examples processed together before one weight update. An epoch is one full pass through the dataset. An iteration or step is usually one batch processed and one optimizer update. Batch size affects memory, throughput, stability, and generalization. Convergence means training has mostly stopped improving in a meaningful way, not necessarily that loss is zero. Good training practice depends on monitoring both optimization progress and validation behavior. Once these terms are clear, training logs stop looking like random numbers and start looking like a readable story of how the model is learning.\n","permalink":"https://learncodecamp.net/training-loop-batches-epochs-iterations-convergence/","summary":"\u003cp\u003eOnce you understand neurons, activations, loss functions, and backpropagation, the next thing to understand is the \u003cstrong\u003etraining loop\u003c/strong\u003e. 
This is the repetitive engine of deep learning.\u003c/p\u003e\n\u003cp\u003eAt a high level, training is boring in the best possible way. It is the same four steps repeated over and over:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003emake a prediction\u003c/li\u003e\n\u003cli\u003emeasure the error\u003c/li\u003e\n\u003cli\u003ecompute gradients\u003c/li\u003e\n\u003cli\u003eupdate the weights\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe interesting part is not the loop itself. The interesting part is how concepts like \u003cstrong\u003ebatch size\u003c/strong\u003e, \u003cstrong\u003eepoch\u003c/strong\u003e, \u003cstrong\u003eiteration\u003c/strong\u003e, and \u003cstrong\u003econvergence\u003c/strong\u003e affect the behavior of that loop in practice.\u003c/p\u003e","title":"Training Loop Explained: Batches, Epochs, Iterations, and Convergence"},{"content":"Loss functions answer one basic question:\nHow wrong is the model right now?\nWithout a loss function, a neural network has no way to measure its own mistakes, and without that measurement, gradient-based training has nothing to optimize.\nTwo of the most important losses in machine learning are:\nMean Squared Error (MSE) Cross-Entropy They are both common. They are both differentiable. 
But they solve different kinds of problems, and using the wrong one makes training harder than it needs to be.\nThe Core Distinction Use MSE when the target is a continuous numeric value.\nExamples:\nhouse price temperature demand forecast stock volatility estimate Use cross-entropy when the target is a class or probability distribution.\nExamples:\ncat vs dog classification spam vs not spam predicting the next token in an LLM multi-class image classification That is the big split:\nregression -\u0026gt; MSE classification -\u0026gt; cross-entropy A quick rule that works most of the time: if the target is a number, start with MSE; if the target is a class distribution, start with cross-entropy.\nMean Squared Error (MSE) The formula is:\nMSE = (1/n) * sum((yi - y_hat_i)^2)\nFor each example:\nsubtract prediction from target square the error average across examples Why square the error?\nit makes all errors positive it penalizes larger mistakes more heavily it is smooth and easy to differentiate MSE Intuition Suppose the true values are:\n[3, 5, 2]\nand the model predicts:\n[4, 2, 3]\nThen the errors are:\n[1, -3, 1]\nSquared errors:\n[1, 9, 1]\nAverage:\nMSE = (1 + 9 + 1) / 3 = 11/3\nNotice what happened:\nthe error of -3 became 9 large mistakes dominate the loss That is often desirable in regression.\nMSE does not treat every miss equally. 
Once errors are squared, larger misses dominate the average.\nWhere MSE Fits Best MSE is a natural choice when:\noutputs are real-valued numbers distance between prediction and truth matters directly large deviations should be punished strongly Typical applications:\nforecasting regression benchmarks continuous control targets reconstruction losses in some autoencoder setups Cross-Entropy Loss Cross-entropy is used when the model outputs probabilities over classes.\nFor binary classification, the loss is:\nL = -[y * log(p) + (1 - y) * log(1 - p)]\nWhere:\ny is the true label (0 or 1) p is the predicted probability of class 1 For multi-class classification, the idea generalizes:\nthe model outputs a probability distribution cross-entropy measures how far that predicted distribution is from the true one In practice, for a one-hot target, the loss mostly reduces to:\nnegative log probability assigned to the correct class\nCross-Entropy Intuition Suppose the true class is cat.\nThe model predicts:\ncat: 0.70 dog: 0.20 bird: 0.10 That is decent. The model assigned fairly high probability to the correct class.\nNow compare with:\ncat: 0.05 dog: 0.90 bird: 0.05 This is much worse, and cross-entropy punishes it heavily because the model was not just wrong. It was confidently wrong.\nThat is one of cross-entropy\u0026rsquo;s key properties.\nCross-entropy mainly cares about the probability on the true class. If that probability is tiny, the loss gets large fast.\nWhy MSE Feels Natural for Regression If your model predicts a house price of $410,000 and the truth is $400,000, then the most natural question is:\nHow far off was the number?\nMSE measures exactly that kind of numeric error.\nThere is no notion of \u0026ldquo;class probability\u0026rdquo; here. 
You do not care whether the model was \u0026ldquo;70% house-price-ish.\u0026rdquo; You care about distance in value.\nWhy Cross-Entropy Feels Natural for Classification If your model predicts whether an email is spam, the key question is not:\nHow numerically far was my output from 1?\nThe key question is:\nHow much probability did the model assign to the correct class?\nCross-entropy is built for exactly that.\nIt aligns the loss with the probabilistic nature of classification.\nWhy Cross-Entropy Usually Beats MSE for Classification You technically can use MSE for classification in some settings, but it is usually the wrong tool.\nReasons:\n1. Cross-entropy gives better gradients When classification outputs come from sigmoid or softmax, cross-entropy tends to produce stronger and more useful gradient signals than MSE.\n2. Cross-entropy matches the probabilistic objective Classification is fundamentally about assigning high probability to the correct class. Cross-entropy measures that directly.\n3. 
MSE can slow learning in classification If the model output saturates, MSE can lead to weaker gradients and slower optimization.\nThis is one reason logistic regression and neural classifiers are normally trained with cross-entropy, not MSE.\nBinary Classification: Sigmoid + Cross-Entropy In binary classification, a common setup is:\nfinal linear layer sigmoid activation binary cross-entropy loss The sigmoid converts the output into a probability between 0 and 1.\nThen binary cross-entropy asks:\nHow much probability did the model assign to the correct answer?\nThis pairing is standard because the modeling assumptions and the loss fit each other.\nMulti-Class Classification: Softmax + Cross-Entropy For multiple mutually exclusive classes, the common setup is:\noutput layer with one logit per class softmax to convert logits into probabilities cross-entropy loss Example:\nClasses: cat, dog, bird Logits -\u0026gt; softmax -\u0026gt; probabilities Then cross-entropy penalizes the model based on how much probability it assigned to the true class.\nThis is also exactly what happens in next-token prediction for language models, except the \u0026ldquo;classes\u0026rdquo; are vocabulary tokens.\nMSE vs Cross-Entropy by Example Suppose true label = 1.\nTwo models output:\nModel A: 0.9 Model B: 0.6 With MSE:\nModel A loss = (1 - 0.9)^2 = 0.01 Model B loss = (1 - 0.6)^2 = 0.16 So Model B is worse, as expected.\nBut now imagine a badly wrong prediction:\nModel C: 0.01 MSE gives:\n(1 - 0.01)^2 = 0.9801\nCross-entropy gives:\n-log(0.01) ≈ 4.605\nCross-entropy punishes confident wrongness much more aggressively, which is exactly what we usually want in classification.\nRegression with MSE: Why Squaring Helps and Hurts MSE has a useful property:\nbigger mistakes matter much more than smaller ones That helps when large errors are genuinely much worse.\nBut it also means MSE is sensitive to outliers. 
If your dataset has a few extreme targets, they can dominate training.\nIn such settings, people sometimes prefer alternatives like MAE or Huber loss. Still, MSE remains the default baseline for regression.\nThe Probabilistic View Another way to think about this:\nminimizing MSE corresponds to assuming Gaussian-like error behavior minimizing cross-entropy corresponds to maximizing likelihood for classification probabilities You do not need to work from those probabilistic derivations every day, but they explain why these losses are not arbitrary formulas. They correspond to different output assumptions.\nA Tiny PyTorch Example MSE for regression import torch import torch.nn.functional as F pred = torch.tensor([2.5, 0.0, 2.1]) target = torch.tensor([3.0, -0.5, 2.0]) loss = F.mse_loss(pred, target) print(loss.item()) Cross-entropy for classification import torch import torch.nn.functional as F logits = torch.tensor([[2.0, 0.5, -1.0]]) target = torch.tensor([0]) # correct class index loss = F.cross_entropy(logits, target) print(loss.item()) Notice that cross_entropy expects raw logits, not already-softmaxed probabilities. The function handles the stable softmax-like computation internally.\nCommon Mistakes Using MSE for a standard classification model This usually works worse than cross-entropy and gives weaker training dynamics.\nApplying softmax before cross_entropy in PyTorch Usually do not do that. torch.nn.functional.cross_entropy expects logits.\nTreating cross-entropy like \u0026ldquo;just another error distance\u0026rdquo; It is not primarily measuring numeric distance. 
It is measuring mismatch between predicted probabilities and the target distribution.\nForgetting the task-output-loss alignment Your final layer, activation choice, and loss function need to fit together.\nQuick Decision Rule Use:\nMSE for predicting numbers cross-entropy for predicting classes That rule alone will get you through most practical cases.\nSummary MSE measures squared numeric error and is the standard loss for regression. Cross-entropy measures how much probability the model assigns to the correct class and is the standard loss for classification. MSE is intuitive for continuous targets. Cross-entropy is better aligned with probabilistic classification and usually produces better gradients there. In modern deep learning, picking the right loss is not a detail. It defines what \u0026ldquo;good prediction\u0026rdquo; even means. If the model output is a number, think MSE first. If the model output is a class probability distribution, think cross-entropy first.\n","permalink":"https://learncodecamp.net/mse-vs-cross-entropy-explained/","summary":"\u003cp\u003eLoss functions answer one basic question:\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003eHow wrong is the model right now?\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eWithout a loss function, a neural network has no way to measure its own mistakes, and without that measurement, gradient-based training has nothing to optimize.\u003c/p\u003e\n\u003cp\u003eTwo of the most important losses in machine learning are:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eMean Squared Error (MSE)\u003c/strong\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCross-Entropy\u003c/strong\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThey are both common. They are both differentiable. 
But they solve different kinds of problems, and using the wrong one makes training harder than it needs to be.\u003c/p\u003e","title":"MSE vs Cross-Entropy: Which Loss Function Should You Use?"},{"content":"A multi-layer perceptron (MLP) is one of the simplest and most important neural network architectures. It is not flashy. It is not state of the art for language or vision by itself. But if you do not understand MLPs, a lot of modern deep learning stays blurry.\nMLPs teach the core structure of neural networks:\ninputs become vectors layers apply learned linear transforms activations add nonlinearity deeper layers build more useful internal representations They also still matter in practice. Even transformers contain MLP blocks. Recommendation systems, tabular models, and many small classifiers still use dense networks directly.\nWhat Is a Multi-Layer Perceptron? A multi-layer perceptron is a feedforward neural network built from stacked dense layers.\nThe word choices matter:\nmulti-layer means more than one learnable layer perceptron comes from the historical single-neuron model feedforward means information moves from input to output without cycles dense or fully connected means every neuron in one layer connects to every neuron in the next A typical MLP looks like this:\nInput layer -\u0026gt; Hidden layer 1 -\u0026gt; Hidden layer 2 -\u0026gt; ... -\u0026gt; Output layer Why a Single Neuron Is Not Enough A single neuron can only compute a simple weighted combination of inputs followed by an activation.\nEven if you use an activation, one neuron has very limited capacity. It cannot express rich hierarchical structure. More importantly, a single linear layer without nonlinear activation can only represent linear decision boundaries.\nThat is why we stack neurons into layers and layers into networks.\nThe Three Kinds of Layers 1. 
Input Layer This is where the raw features enter the model.\nExamples:\nfor house-price prediction: square footage, number of rooms, age for tabular fraud detection: transaction amount, location, merchant category for toy text experiments: bag-of-words or embedding vectors The input layer usually does not perform computation by itself. It just provides the starting vector.\n2. Hidden Layers These are the computational core of the MLP.\nEach hidden layer applies:\na = f(Wx + b)\nThe output of one hidden layer becomes the input to the next.\nWhy \u0026ldquo;hidden\u0026rdquo;? Because we do not directly observe or supervise these intermediate values. The network learns them internally.\n3. Output Layer The final layer maps the learned internal representation to the task output.\nTypical output choices:\nlinear output for regression sigmoid for binary classification softmax for multi-class classification The output layer is task-dependent in a way that hidden layers usually are not.\nWhat Fully Connected Actually Means Suppose one layer has 3 inputs and the next layer has 4 neurons.\nIn a fully connected layer:\nneuron 1 sees all 3 inputs neuron 2 also sees all 3 inputs neuron 3 also sees all 3 inputs neuron 4 also sees all 3 inputs That means the weight matrix has shape:\n4 x 3\nEach row belongs to one neuron.\nThis dense connectivity makes MLPs flexible, but it also makes them parameter-heavy when input dimensions become large.\nA Small MLP Example Imagine a network with:\ninput size = 3 hidden layer 1 = 4 neurons hidden layer 2 = 4 neurons output size = 2 The forward pass is:\nx shape (3,) -\u0026gt; z1 = W1x + b1 shape (4,) -\u0026gt; a1 = ReLU(z1) -\u0026gt; z2 = W2a1 + b2 shape (4,) -\u0026gt; a2 = ReLU(z2) -\u0026gt; z3 = W3a2 + b3 shape (2,) -\u0026gt; y_hat This is already a meaningful neural network.\nWhy Hidden Layers Matter The real power of an MLP comes from composition.\nEarlier layers can learn simpler patterns. 
Later layers can combine those into more abstract ones.\nFor example, in a toy credit-risk model:\nthe first layer might respond to income level, debt ratio, and past defaults the next layer might combine those into broader \u0026ldquo;financial stability\u0026rdquo; signals the output layer turns that representation into a probability of default This is the general representation-learning story behind deep learning.\nWidth vs Depth There are two obvious ways to make an MLP larger:\nadd more neurons per layer (width) add more layers (depth) Wide Networks A wider network has more neurons in each layer.\nPros:\nmore capacity per layer sometimes easier to optimize in small settings Cons:\nmore parameters quickly may be inefficient compared to deeper structures Deep Networks A deeper network has more stacked layers.\nPros:\ncan learn hierarchical features often more parameter-efficient than making one shallow layer huge Cons:\nharder to train more sensitive to gradient flow and initialization issues Modern deep learning tends to prefer depth, but only when the architecture and optimization setup support it well.\nThe Universal Approximation Idea You will often hear that a feedforward network with one hidden layer can approximate any continuous function, given enough neurons.\nThat statement is mathematically important, but it is often misunderstood.\nWhat it does mean:\nshallow networks can be extremely expressive in theory What it does not mean:\nthey are easy to train they are efficient they are the best design in practice A huge shallow network may be far less practical than a well-designed deeper one.\nWhy MLPs Struggle with Images and Sequences MLPs connect everything to everything. That is fine for low-dimensional tabular data. It becomes awkward for structured data.\nFor images A 100 x 100 image has 10,000 pixels. 
Flattening that into one vector and connecting it densely to even one hidden layer produces a massive number of parameters.\nThis is why convolutional networks were such a breakthrough for vision.\nFor sequences Text and time series have order and local structure. Plain MLPs do not model that structure naturally. That is why RNNs, CNNs, and especially transformers became more suitable for those domains.\nSo MLPs are foundational, but not universal best-practice architectures.\nWhy MLPs Still Matter in Modern Models It would be a mistake to think MLPs are obsolete.\nThey still appear everywhere:\ntabular classification and regression simple baseline models smaller production systems recommendation and ranking stacks the feed-forward blocks inside transformers In a transformer block, after attention mixes information across tokens, an MLP applies a per-token nonlinear transformation. So even GPT-style models still rely on dense-network ideas constantly.\nParameter Count in a Dense Layer One dense layer with:\ninput size m output size n has:\nm * n weights n biases So total parameters:\nm * n + n\nExample:\ninput size = 512 hidden size = 2048 Then one layer has:\n512 * 2048 + 2048 = 1,050,624 parameters\nThis is why dense layers become expensive fast.\nA Minimal PyTorch MLP import torch import torch.nn as nn model = nn.Sequential( nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2) ) x = torch.tensor([[0.2, -1.1, 3.0]]) logits = model(x) print(logits) This is a small MLP:\ninput dimension = 3 two hidden dense layers output dimension = 2 The code is short because frameworks hide the tensor plumbing, but the architecture is still the same stack of dense layers and activations.\nDesign Choices That Matter When building an MLP, the main decisions are:\nNumber of layers Too few and the network may underfit. 
Too many and optimization becomes harder.\nHidden size Larger hidden layers increase capacity but also parameter count and compute cost.\nActivation function ReLU is a common default. GELU and related functions are also popular in modern models.\nOutput activation This must match the task:\nregression: often no activation binary classification: sigmoid multi-class classification: softmax Regularization Dropout, weight decay, and early stopping can help prevent overfitting.\nWhen an MLP Is a Good Choice MLPs are still a strong option when:\nthe data is tabular feature count is moderate local spatial structure is not the main signal you need a strong baseline quickly They are often the right \u0026ldquo;boring\u0026rdquo; choice before reaching for something more complex.\nWhen an MLP Is the Wrong Choice An MLP is usually not the best first choice when:\nthe input is a large image the input is long text or code the task depends heavily on locality, order, or long-range interactions parameter efficiency is critical In those cases, CNNs, transformers, or hybrid architectures are often better matched to the data.\nSummary A multi-layer perceptron is a feedforward neural network made of stacked dense layers. Fully connected means every neuron in one layer connects to every neuron in the next. Hidden layers let the model build increasingly useful internal representations. Width increases neurons per layer; depth increases number of layers. MLPs are foundational, still widely useful, and still embedded inside modern architectures. They are strongest on lower-dimensional or tabular problems and weaker on highly structured inputs like images and long sequences. If you understand MLPs well, you understand the basic grammar of deep learning. Most other architectures are not replacements for that grammar. 
They are specialized extensions of it.\n","permalink":"https://learncodecamp.net/multi-layer-perceptron-dense-networks-explained/","summary":"\u003cp\u003eA \u003cstrong\u003emulti-layer perceptron (MLP)\u003c/strong\u003e is one of the simplest and most important neural network architectures. It is not flashy. It is not state of the art for language or vision by itself. But if you do not understand MLPs, a lot of modern deep learning stays blurry.\u003c/p\u003e\n\u003cp\u003eMLPs teach the core structure of neural networks:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003einputs become vectors\u003c/li\u003e\n\u003cli\u003elayers apply learned linear transforms\u003c/li\u003e\n\u003cli\u003eactivations add nonlinearity\u003c/li\u003e\n\u003cli\u003edeeper layers build more useful internal representations\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThey also still matter in practice. Even transformers contain MLP blocks. Recommendation systems, tabular models, and many small classifiers still use dense networks directly.\u003c/p\u003e","title":"Multi-Layer Perceptron Explained: Dense Networks from First Principles"},{"content":"The forward pass is the part of a neural network that actually produces a prediction. 
You feed inputs into the model, the model applies a sequence of mathematical operations, and an output comes out the other side.\nThat sounds trivial, but it is one of the most important ideas in deep learning because everything else depends on it:\nthe loss compares the forward-pass output to the target backpropagation differentiates through the forward pass training is just repeating the forward pass and improving it The easiest way to understand the forward pass is to start with a single neuron and then scale it up into a full layer written in matrix form.\nThis follows the same teaching sequence used in the chapter 1 PDF: inputs feed into a weighted sum, the bias shifts the pre-activation, and the activation turns that value into the final output.\nWhat the Forward Pass Means A forward pass is simply:\ninput -\u0026gt; transformations -\u0026gt; output\nFor a neural network, those transformations are usually:\nweighted sums bias additions activation functions repeated layer by layer You can think of the forward pass as the model saying:\nGiven these current weights, this is my current answer.\nStep 1: A Single Artificial Neuron One neuron takes several inputs, multiplies each by a weight, adds them together, adds a bias, and then usually applies a nonlinear activation function.\nThe core equation is:\nz = w1x1 + w2x2 + ... 
+ wnxn + b\nThen:\na = f(z)\nWhere:\nx1, x2, ..., xn are inputs w1, w2, ..., wn are weights b is the bias z is the pre-activation f is the activation function a is the final output of the neuron This is the smallest useful forward pass.\nThe PDF’s key move is to show that the left-to-right flow stays the same even when we stop thinking about one neuron and start thinking in vectors and matrices.\nA Worked Numerical Example Suppose:\nx1 = 1.5 x2 = -2.0 x3 = 0.8 w1 = 0.4 w2 = -0.5 w3 = 0.3 b = 0.1 First compute the weighted sum:\nz = (1.5 * 0.4) + (-2.0 * -0.5) + (0.8 * 0.3) + 0.1\nz = 0.6 + 1.0 + 0.24 + 0.1 = 1.94\nNow apply ReLU:\na = max(0, 1.94) = 1.94\nSo the forward pass output of this neuron is 1.94.\nWhy We Need the Bias Without the bias term, the pre-activation z would always be 0 whenever all inputs are 0, no matter what the weights are.\nThe bias gives the neuron a learnable offset. It lets the model shift thresholds and decision boundaries instead of forcing every computation to pass through the origin.\nIn practice, the bias is small in notation and extremely important in behavior.\nPre-Activation vs Post-Activation This distinction matters:\nz is the pre-activation a = f(z) is the post-activation Why keep them separate?\nBecause:\nthe weighted sum carries the linear combination the activation function introduces nonlinearity Without that nonlinearity, stacking many layers would still collapse into one big linear transformation.\nWhy Activations Matter in the Forward Pass If every layer were just:\nz = Wx + b\nthen even many stacked layers would still be equivalent to a single linear map.\nActivation functions break that limitation.\nCommon examples:\nSigmoid maps values into (0, 1) Tanh maps values into (-1, 1) ReLU maps negative values to 0 and leaves positive values unchanged GELU smoothly gates inputs and is widely used in transformers For many modern neural networks, the forward pass is really:\nlinear transform -\u0026gt; activation -\u0026gt; linear transform -\u0026gt; activation 
-\u0026gt; ...\nStep 2: From One Neuron to One Layer A layer is just many neurons operating on the same input.\nSuppose the input has 3 features:\nx = [x1, x2, x3]\nAnd the layer has 4 neurons. Each neuron has its own weights and bias.\nThat means:\nneuron 1 computes its own weighted sum neuron 2 computes a different weighted sum neuron 3 computes another one neuron 4 does the same The outputs are stacked into a vector.\nSo instead of one scalar output, the layer produces:\nz = [z1, z2, z3, z4]\nThen:\na = f(z)\nwhere the activation function is applied elementwise.\nThe Matrix Form Writing each neuron separately gets tedious fast. Matrix notation is cleaner and more powerful.\nFor a dense layer:\nz = Wx + b\nWhere:\nx has shape (m,) W has shape (n, m) b has shape (n,) z has shape (n,) If the input has m features and the layer has n neurons, then the layer contains:\none row of weights per neuron one bias per neuron Then the activation gives:\na = f(z)\nSo the full forward pass of the layer is:\na = f(Wx + b)\nThis one expression captures an entire layer of neurons.\nA Concrete Matrix Example Let:\nx = [2, -1] W = [[0.5, 0.2], [-0.3, 0.8], [1.0, -0.5]] b = [0.1, -0.2, 0.0] Then:\nWx = [ (0.5 * 2) + (0.2 * -1), (-0.3 * 2) + (0.8 * -1), (1.0 * 2) + (-0.5 * -1) ] = [0.8, -1.4, 2.5] Add the bias:\nz = [0.9, -1.6, 2.5]\nApply ReLU:\na = [0.9, 0, 2.5]\nSo this 3-neuron layer converts a 2-dimensional input into a 3-dimensional output.\nWhy Matrix Form Matters The matrix form is not just mathematically elegant. 
It is how real neural networks are implemented efficiently.\nReasons it matters:\nGPUs are extremely good at matrix multiplication whole layers can be computed in parallel frameworks like PyTorch and TensorFlow are built around tensor operations the same formula scales cleanly from tiny demos to giant models This is one of the key transitions from \u0026ldquo;I understand the idea\u0026rdquo; to \u0026ldquo;I understand how this is actually implemented.\u0026rdquo;\nAdding a Batch Dimension In real training, we usually process many examples at once.\nInstead of one input vector, we have a batch:\nX with shape (batch_size, input_dim)\nThen the layer computation becomes:\nZ = XW^T + b\nor equivalently, depending on notation conventions:\nZ = WX + b\nThe exact formula changes based on whether examples are row vectors or column vectors, but the idea does not:\neach example is run through the same weights the outputs are computed together in one batched matrix operation For example, if:\nbatch size = 32 input dimension = 128 hidden size = 256 then:\nX might be (32, 128) W might be (256, 128) output Z becomes (32, 256) That is one forward pass through one dense layer for 32 examples at once.\nStacking Layers Once one layer produces an output, the next layer treats that output as its input.\nFor a 2-layer network:\na1 = f(W1x + b1)\na2 = g(W2a1 + b2)\nThis is how representation learning happens:\nthe first layer extracts simple patterns later layers transform those patterns into more useful abstractions Even very large models are still built from this repeating idea.\nForward Pass in a Tiny Neural Network Imagine:\ninput dimension = 3 hidden layer = 4 neurons output layer = 2 neurons Then the forward pass looks like:\nx (3,) -\u0026gt; z1 = W1x + b1 # (4,) -\u0026gt; a1 = ReLU(z1) # (4,) -\u0026gt; z2 = W2a1 + b2 # (2,) -\u0026gt; y_hat = softmax(z2) If this is a classifier, the final output might be:\ny_hat = [0.91, 0.09]\nmeaning the model currently believes class 1 is 
much more likely than class 2.\nA Minimal PyTorch Example import torch import torch.nn.functional as F x = torch.tensor([[1.5, -2.0, 0.8]]) # batch of 1 W1 = torch.tensor([ [0.4, -0.5, 0.3], [0.1, 0.2, -0.4] ], dtype=torch.float32) b1 = torch.tensor([0.1, -0.2], dtype=torch.float32) z1 = x @ W1.T + b1 a1 = F.relu(z1) print(\u0026#34;z1:\u0026#34;, z1) print(\u0026#34;a1:\u0026#34;, a1) This performs exactly the same kind of forward pass we computed by hand, just in batched tensor form.\nCommon Mistakes When Learning This Mixing up scalar and matrix notation The neuron equation and the matrix equation describe the same idea at different scales.\nForgetting the activation function The weighted sum alone is not enough for a deep network to be expressive.\nIgnoring tensor shapes A lot of confusion in neural network code is really shape confusion. It is worth checking shapes constantly.\nThinking the forward pass is only for inference Training also depends on the forward pass. You cannot compute loss or gradients without first producing predictions.\nSummary The forward pass is the process of turning inputs into outputs using the model\u0026rsquo;s current parameters. A single neuron computes a weighted sum plus bias, then applies an activation. A dense layer is many neurons applied in parallel. Matrix form compresses the layer computation into a = f(Wx + b). Batched matrix operations make neural networks fast on GPUs. Every large neural network is still built from repeated forward-pass blocks like these. Once matrix form feels natural, a lot of neural-network architecture becomes much easier to read and implement.\n","permalink":"https://learncodecamp.net/forward-pass-from-single-neuron-to-matrix-form/","summary":"\u003cp\u003eThe forward pass is the part of a neural network that actually produces a prediction. 
You feed inputs into the model, the model applies a sequence of mathematical operations, and an output comes out the other side.\u003c/p\u003e\n\u003cp\u003eThat sounds trivial, but it is one of the most important ideas in deep learning because everything else depends on it:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ethe \u003cstrong\u003eloss\u003c/strong\u003e compares the forward-pass output to the target\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003ebackpropagation\u003c/strong\u003e differentiates through the forward pass\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003etraining\u003c/strong\u003e is just repeating the forward pass and improving it\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe easiest way to understand the forward pass is to start with a single neuron and then scale it up into a full layer written in matrix form.\u003c/p\u003e","title":"Forward Pass Explained: From a Single Neuron to Matrix Form"},{"content":"Backpropagation is the core algorithm that makes neural networks trainable. The forward pass tells the model what prediction it currently makes. Backpropagation tells the model how each weight contributed to the error so the optimizer can update those weights in the right direction.\nPeople often hear that backpropagation is \u0026ldquo;just the chain rule,\u0026rdquo; which is true but not especially helpful. The useful mental model is this:\nthe forward pass computes values the backward pass computes sensitivities each node only needs its own local derivative the full gradient is built by multiplying those local derivatives along the path If that sounds abstract, it becomes much clearer once you look at one neuron first and then scale up.\nThe chapter 1 PDF frames backpropagation in exactly the right order: start with the chain rule, then show how that rule becomes a graph algorithm.\nThe High-Level Idea Training a neural network repeats the same loop:\nRun a forward pass and compute a prediction. 
Compare that prediction to the true answer using a loss function. Run a backward pass to compute gradients. Update weights using gradient descent or one of its variants. Backpropagation is step 3.\nWhen we say \u0026ldquo;gradient,\u0026rdquo; we mean:\nHow much would the loss change if I nudged this parameter a little?\nIf increasing a weight increases the loss, we want to push that weight down. If increasing a weight decreases the loss, we want to push it up.\nA Single Neuron: The Smallest Useful Example Consider one neuron:\nz = wx + b\na = ReLU(z)\nL = loss(a, y)\nWhere:\nx is the input w is the weight b is the bias z is the pre-activation a is the output after activation L is the loss The forward pass flows left to right:\nx, w, b -\u0026gt; z = wx + b -\u0026gt; a = ReLU(z) -\u0026gt; L The backward pass flows right to left:\nL -\u0026gt; dL/da -\u0026gt; dL/dz -\u0026gt; dL/dw, dL/db, dL/dx This reverse flow is why it is called backpropagation.\nWhy the Chain Rule Is the Whole Story Suppose L depends on a, a depends on z, and z depends on w.\nThen:\ndL/dw = (dL/da) * (da/dz) * (dz/dw)\nThat is the chain rule.\nThis matters because large neural networks are just many small operations glued together:\nmultiply add activation matrix multiply normalization Each operation knows how to compute its own local derivative. 
Backpropagation stitches those pieces together.\nA single neuron already contains the full backprop pattern: forward values move left to right, gradients move right to left, and the multiply node routes gradients differently to each input.\nA Concrete Numerical Example Take:\nx = 2 w = 3 b = 1 Forward pass:\nz = wx + b = 2*3 + 1 = 7\na = ReLU(7) = 7\nSuppose the target is y = 4, and we use squared error:\nL = (a - y)^2 = (7 - 4)^2 = 9\nNow let us go backward.\nStep 1: Derivative of the loss with respect to the output L = (a - y)^2\nSo:\ndL/da = 2(a - y) = 2(7 - 4) = 6\nThis means: if a increases a little, the loss increases about 6 times that amount.\nStep 2: Derivative through ReLU a = ReLU(z)\nFor ReLU:\nderivative is 1 if z \u0026gt; 0 derivative is 0 if z \u0026lt; 0 Here z = 7, so:\nda/dz = 1\nThen:\ndL/dz = (dL/da) * (da/dz) = 6 * 1 = 6\nStep 3: Derivative with respect to the weight z = wx + b\nSo:\ndz/dw = x = 2\nThen:\ndL/dw = (dL/dz) * (dz/dw) = 6 * 2 = 12\nStep 4: Derivative with respect to the bias dz/db = 1\nSo:\ndL/db = (dL/dz) * (dz/db) = 6\nStep 5: Derivative with respect to the input dz/dx = w = 3\nSo:\ndL/dx = (dL/dz) * (dz/dx) = 6 * 3 = 18\nThis last one is not used to update the data input, but it becomes important when one layer\u0026rsquo;s output is another layer\u0026rsquo;s input.\nWhy the Multiply Node \u0026ldquo;Swaps\u0026rdquo; Gradients For a multiplication m = xw:\ndm/dx = w dm/dw = x That is why during backpropagation the multiply node seems to \u0026ldquo;swap\u0026rdquo; the values:\ngradient sent to x is scaled by w gradient sent to w is scaled by x This pattern appears constantly in neural networks.\nComputation Graphs Make Backpropagation Much Easier to Understand A neural network can be viewed as a computation graph. 
Every node is an operation, and every edge carries a value.\nFor our neuron:\nx ----\\ * ----\\ w ----/ \\ + ---- z ---- ReLU ---- a ---- loss ---- L b -------------/ In the forward pass, you compute values at each node.\nIn the backward pass, you compute:\nthe upstream gradient coming into a node the local derivative of that node the downstream gradients sent to its inputs That is the entire algorithm.\nGeneral Backpropagation Pattern Every node follows the same recipe:\nReceive gradient from the node above. Multiply by the node\u0026rsquo;s local derivative. Pass the result to the node\u0026rsquo;s inputs. For example:\nupstream gradient * local derivative = downstream gradient For an addition node:\nz = u + v\nThe derivatives are:\ndz/du = 1 dz/dv = 1 So the upstream gradient gets copied to both inputs.\nFor a multiplication node:\nz = uv\nThe derivatives are:\ndz/du = v dz/dv = u So the upstream gradient gets scaled differently for each input.\nExtending to a Layer A layer is just many neurons processed together.\nInstead of:\nz = wx + b\nwe write:\nz = Wx + b\nWhere:\nx is a vector of inputs W is a weight matrix b is a bias vector z is the vector of pre-activations Backpropagation still follows the same logic, just with vectors and matrices instead of single numbers.\nFor a dense layer:\nthe gradient with respect to W depends on the input activations the gradient with respect to b is the accumulated gradient over the batch the gradient with respect to x is what gets passed to the previous layer This is why modern frameworks can train huge networks efficiently. 
The same chain-rule logic becomes large, optimized matrix math.\nBackpropagation Through Multiple Layers Imagine a 2-layer network:\nx -\u0026gt; z1 = W1x + b1 -\u0026gt; a1 = ReLU(z1) -\u0026gt; z2 = W2a1 + b2 -\u0026gt; y_hat -\u0026gt; L Backpropagation works from the end toward the start:\nCompute dL/dy_hat Push it through the output layer to get gradients for W2 and b2 Compute the gradient with respect to a1 Push that through ReLU to get dL/dz1 Use that to compute gradients for W1 and b1 Earlier layers do not directly see the loss. They receive a gradient signal passed backward through later layers.\nOnce the graph spans multiple layers, the logic does not change. The loss gradient simply gets relayed backward, one local derivative at a time, until it reaches earlier weights.\nWhy Activations Matter for Gradient Flow Activation functions affect not just the forward pass but also the backward pass.\nSigmoid and Tanh Sigmoid and tanh can saturate. In their flat regions, the derivative becomes very small. When many such small derivatives are multiplied together across layers, gradients can shrink toward zero.\nThat is the vanishing gradient problem.\nReLU ReLU avoids this problem for positive activations because its derivative is 1 there. That is one reason ReLU made deep networks much easier to train.\nBut ReLU has its own issue:\nif a neuron stays on the negative side, its gradient can become 0 then that neuron may stop learning This is often called a dead ReLU.\nWhy Backpropagation Is Efficient A naive approach would compute the effect of each weight separately from scratch. 
That would be absurdly expensive.\nBackpropagation is efficient because it reuses intermediate results.\nThe forward pass stores values like:\ninputs pre-activations activations The backward pass reuses them to compute gradients locally.\nThis dynamic-programming flavor is what makes deep learning practical.\nHow the Optimizer Uses These Gradients Backpropagation itself does not update the weights. It only computes gradients.\nThen an optimizer such as SGD or Adam applies an update like:\nw \u0026lt;- w - eta * dL/dw\nWhere eta is the learning rate.\nSo the separation is:\nbackpropagation computes gradient information optimization uses that information to move parameters A Minimal PyTorch Example import torch import torch.nn as nn x = torch.tensor([[2.0]]) y = torch.tensor([[4.0]]) linear = nn.Linear(1, 1) with torch.no_grad(): linear.weight[:] = torch.tensor([[3.0]]) linear.bias[:] = torch.tensor([1.0]) pred = torch.relu(linear(x)) loss = (pred - y).pow(2).mean() loss.backward() print(\u0026#34;prediction:\u0026#34;, pred.item()) print(\u0026#34;loss:\u0026#34;, loss.item()) print(\u0026#34;dL/dw:\u0026#34;, linear.weight.grad.item()) print(\u0026#34;dL/db:\u0026#34;, linear.bias.grad.item()) Autograd performs the same chain-rule computation we walked through manually.\nCommon Misunderstandings \u0026ldquo;Backpropagation and gradient descent are the same thing\u0026rdquo; They are related but different.\nbackpropagation computes gradients gradient descent uses gradients to update weights \u0026ldquo;The model learns by changing all weights equally\u0026rdquo; No. Each parameter gets its own gradient, so each parameter gets its own update.\n\u0026ldquo;Backpropagation is specific to neural networks\u0026rdquo; Not really. It is a general reverse-mode automatic differentiation technique applied to computation graphs. 
Neural networks just happen to be the most famous use case.\nIntuition to Keep If you remember only one thing, make it this:\nBackpropagation answers the question:\nWhich earlier choices caused the final error, and by how much?\nThat answer is computed by moving backward through the graph, one local derivative at a time.\nSummary The forward pass computes predictions. The backward pass computes gradients of the loss with respect to parameters. Backpropagation is just the chain rule applied efficiently over a computation graph. Each node uses local derivatives and the upstream gradient. The resulting gradients tell the optimizer how to adjust weights. Deep learning works because this process scales from a single neuron to networks with billions of parameters. Once this clicks, a lot of deep learning stops feeling magical. It becomes a structured pipeline of matrix operations, local derivatives, and careful bookkeeping.\n","permalink":"https://learncodecamp.net/backpropagation-explained-visually/","summary":"\u003cp\u003eBackpropagation is the core algorithm that makes neural networks trainable. The forward pass tells the model what prediction it currently makes. Backpropagation tells the model \u003cstrong\u003ehow each weight contributed to the error\u003c/strong\u003e so the optimizer can update those weights in the right direction.\u003c/p\u003e\n\u003cp\u003ePeople often hear that backpropagation is \u0026ldquo;just the chain rule,\u0026rdquo; which is true but not especially helpful. 
The useful mental model is this:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ethe forward pass computes values\u003c/li\u003e\n\u003cli\u003ethe backward pass computes sensitivities\u003c/li\u003e\n\u003cli\u003eeach node only needs its own local derivative\u003c/li\u003e\n\u003cli\u003ethe full gradient is built by multiplying those local derivatives along the path\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIf that sounds abstract, it becomes much clearer once you look at one neuron first and then scale up.\u003c/p\u003e","title":"Backpropagation Explained Visually: How Neural Networks Actually Learn"},{"content":"Most developers only think about source maps when DevTools magically shows the original TypeScript instead of unreadable bundled JavaScript.\nThat convenience hides an important fact:\nA source map is not just “debug metadata.” It is a translation table between generated code and original source code.\nAnd depending on how it is emitted, it can contain the original source itself.\nThat is why source maps sit at the intersection of:\ndebugging build tooling browser DevTools error reporting systems like Sentry security and accidental code exposure If you have ever wondered how a minified file can still produce readable stack traces, or how a published .map file can expose a package\u0026rsquo;s real TypeScript source, this is the mental model you want.\nThe Problem Source Maps Solve In development, code usually looks like this:\nmany small files readable variable names comments TypeScript or JSX line breaks that make stack traces useful In production, code often looks very different:\nbundled into fewer files transpiled from TypeScript to JavaScript minified variable names shortened whitespace removed For example, you might write this:\nexport function calculateTotal(price: number, taxRate: number) { const tax = price * taxRate; return price + tax; } But ship something closer to this:\nexport function c(t,r){return t+t*r} That is good for performance 
and shipping, but terrible for debugging.\nIf production throws an error at column 23 of app.min.js, that position is almost meaningless to a human.\nA source map fixes that by saying:\nPosition generated.js:1:23 corresponds to src/billing.ts:2:15.\nThat is the core idea.\nWhat a Source Map Actually Is A source map is usually a JSON file with a .map extension, such as:\napp.js.map vendor.min.js.map cli.js.map Its job is to map locations in generated code back to locations in original files.\nA typical source map contains fields like these:\n{ \u0026#34;version\u0026#34;: 3, \u0026#34;file\u0026#34;: \u0026#34;app.min.js\u0026#34;, \u0026#34;sources\u0026#34;: [\u0026#34;src/billing.ts\u0026#34;], \u0026#34;sourcesContent\u0026#34;: [ \u0026#34;export function calculateTotal(price: number, taxRate: number) {\\n const tax = price * taxRate;\\n return price + tax;\\n}\\n\u0026#34; ], \u0026#34;names\u0026#34;: [\u0026#34;calculateTotal\u0026#34;, \u0026#34;price\u0026#34;, \u0026#34;taxRate\u0026#34;, \u0026#34;tax\u0026#34;], \u0026#34;mappings\u0026#34;: \u0026#34;AAAA,SAASA,eAAeC,KAAaC,OAAkB,CACrD,MAAMC,MAAMF,QAAQC,OACpB,OAAOD,QAAQE,GACjB\u0026#34; } The important fields are:\nField Meaning version Source map spec version. In practice, this is usually 3. file The generated file this map belongs to. sources The original source files used to generate the output. sourcesContent Optional embedded contents of those original source files. names Identifiers referenced by the map, such as variable or function names. mappings The compressed mapping data that translates generated positions back to original positions. sourceRoot Optional prefix for resolving paths in sources. 
Two fields matter more than most people realize:\nsources sourcesContent If sourcesContent is present, the source map may already contain the full original source code inline.\nThat is the part that surprises people.\nHow the Browser Finds a Source Map Generated JavaScript often ends with a special comment:\n//# sourceMappingURL=app.min.js.map That comment tells tools where the map lives.\nWhen browser DevTools sees that comment, it can fetch the map and use it to:\ndisplay original files let you place breakpoints in original code rewrite stack traces step through TypeScript or JSX as if that were the runtime source This is also why accidentally shipping a .map file publicly can be enough to expose a lot more than the minified bundle suggests.\nThere is also an inline form where the map is embedded directly into the JavaScript as a data URL, but external .map files are more common in production builds.\nThe Core Mental Model A source map does not mean the runtime is executing your TypeScript.\nThe runtime still executes JavaScript.\nThe source map is only a lookup table that says:\nthis generated line and column came from that original file this generated segment corresponds to that original symbol So there are really two parallel worlds:\nGenerated code, which actually runs Original code, which humans want to debug The source map is the bridge between them.\nHow the Mapping Works Exactly Source maps map positions, not just files.\nThey usually map:\ngenerated line generated column source file index original line original column optional name index The mappings field stores these mappings in a compact encoded format.\nThe encoding uses three important ideas:\nSemicolons separate generated lines Commas separate segments on the same generated line Each segment is Base64 VLQ-encoded and usually stores relative offsets That sounds ugly because it is ugly. 
It was designed for machines, not humans.\nThe mappings Field in Plain English Imagine the generated code is one minified line:\nfunction c(t,r){return t+t*r} Now imagine the original source is:\nexport function calculateTotal(price: number, taxRate: number) { const tax = price * taxRate; return price + tax; } The source map might record facts like:\ngenerated column 0 maps to src/billing.ts, line 1, column 0 generated column 9 maps to the original function name generated column 11 maps to original parameter price generated column 13 maps to original parameter taxRate generated column 16 maps to original return It does not need to record every single character. It records enough segments for tools to reconstruct the relationship between the generated file and the original source positions.\nWhat a Segment Contains After decoding, a segment can have up to five fields:\n[generatedColumn, sourceIndex, originalLine, originalColumn, nameIndex] In practice:\ngeneratedColumn says where this segment starts in the generated line sourceIndex points into the sources array originalLine is the original line number originalColumn is the original column number nameIndex points into the names array if a symbol name is attached Not every segment has all five fields.\nThe shortest useful segment is often four fields, and the fifth is included when symbol name metadata is available.\nWhy the Values Are Relative Source maps compress aggressively.\nInstead of storing absolute values repeatedly, segments usually store deltas relative to the previous segment.\nThat means:\nif the next mapping stays in the same source file, the sourceIndex delta may be 0 if the next original line is just one line later, the stored delta may be 1 if the next generated column is nearby, that delta is small too Small numbers compress well with Base64 VLQ.\nThat is why the mappings string looks cryptic while still staying fairly small.\nWhat Base64 VLQ Is Doing You do not need to memorize the bit layout, but 
you should know the purpose.\nVLQ stands for Variable Length Quantity.\nIt is a way to encode integers compactly, especially when many of them are small.\nThe source map format then uses a Base64 character set to serialize those encoded integers into text.\nSo the mappings field is basically:\na sequence of relative numeric fields VLQ-encoded turned into Base64 characters grouped into segments and lines That is why a source map is both:\ncompact enough to ship rich enough for debuggers to recover file, line, column, and symbol information How DevTools Uses a Source Map When you open DevTools and click what looks like an original .ts or .tsx file, DevTools is usually doing something like this internally:\nload the generated JavaScript discover the sourceMappingURL fetch the .map file decode sources, names, and mappings build a translation index between generated and original positions show you the original source, often from sourcesContent Then when an exception happens at:\napp.min.js:1:48192 DevTools can translate it into something like:\nsrc/components/Checkout.tsx:87:14 That is the whole magic.\nSource Maps Are Also Used Outside the Browser Many developers associate source maps only with front-end apps, but they are also useful for:\nNode.js stack traces server-side bundling CLI tools error reporting services log processing pipelines For example, if a TypeScript CLI is compiled into a bundled cli.js, a source map can still help turn runtime stack traces back into the original .ts or .tsx files.\nThat is one reason source maps show up in npm packages too, not just browser bundles.\nWhy Source Maps Sometimes Leak Real Source Code This is the part that causes trouble.\nA lot of people assume the .map file only contains coordinates.\nThat is not always true.\nIf the map includes sourcesContent, it may contain:\noriginal TypeScript or JSX comments internal file paths feature flags unused code paths removed from the final bundle symbol names that were shortened away 
in minified output So if a private CLI or server-side package ships:\ndist/cli.js dist/cli.js.map and that map includes embedded source content, anyone with access to the published package can potentially reconstruct much more of the original codebase than the minified bundle alone reveals.\nThat is how teams end up saying:\n“We only published build artifacts.”\nwhen in reality they also published a blueprint back to the original source.\nWhy This Happens in Practice There are a few common causes:\n1. Tooling defaults A bundler or compiler may emit source maps automatically in production.\n2. Source maps added for error monitoring Teams want readable stack traces in Sentry or another telemetry platform, which is reasonable.\n3. Publishing workflows are too broad Instead of publishing only the intended artifacts, the package includes everything in dist/, including .map files.\n4. People confuse “minified” with “safe” Minification is not a security boundary. It only makes code less pleasant to read.\n5. sourcesContent is enabled This is the biggest difference between “a useful debugging map” and “a possible source leak.”\nAn Important Nuance: Not All Source Maps Are Equally Risky There are several different deployment patterns:\nPublic source maps These are accessible to users and browsers. 
Great for debugging public front-end apps, but they can expose more than expected.\nHidden source maps These are generated for tooling and error reporting, but not linked from the served JavaScript with sourceMappingURL.\nPrivate uploaded source maps These are uploaded directly to a service like Sentry and never shipped publicly with the app or package.\nIf you want the debugging benefits without broadly publishing your original source, the last two patterns are usually the safer choice.\nWhat Source Maps Can and Cannot Do Source maps can:\nmap generated code back to original files improve debugging recover symbol names when available let tools show original TypeScript, JSX, or multiple input files Source maps cannot:\nmake JavaScript secret protect proprietary front-end logic act as a real obfuscation barrier change what code is actually running That distinction matters.\nIf code must stay private, do not rely on bundling or minification to protect it. If it runs on the client, the client ultimately gets executable code.\nA Simple End-to-End Example Here is the full lifecycle in one flow.\nStep 1: You write original source // src/math.ts export function square(x: number) { return x * x; } Step 2: The bundler emits JavaScript function s(n){return n*n}export{s as square}; Step 3: The bundler emits math.js.map That map says, in effect:\ngenerated s came from original square generated parameter n came from original x generated return n*n came from return x * x Step 4: The generated file references the map //# sourceMappingURL=math.js.map Step 5: A debugger or error reporter uses the map When an error happens in the bundled file, the tool translates the generated location back into the original src/math.ts.\nThat is the entire system.\nPractical Recommendations If you are building libraries, web apps, or CLIs, the right approach is usually:\nDecide whether source maps should be public, hidden, or private-only. Audit whether sourcesContent is included. 
Check what actually gets published to npm or deployed to production. Verify your telemetry workflow separately from your public artifact workflow. Treat .map files as review-worthy artifacts, not harmless leftovers. For many teams, the safest setup is:\ngenerate source maps for error reporting upload them to your monitoring system avoid publishing them broadly if they expose source you did not intend to ship Final Takeaway Source maps are one of those pieces of tooling that feel invisible when they work and very visible when they go wrong.\nThey exist because production JavaScript is optimized for machines, while debugging is optimized for humans.\nA source map is the bridge between those two worlds.\nThat bridge is incredibly useful:\nit gives you readable stack traces it lets DevTools show original TypeScript it makes modern bundling practical But it can also expose far more context than people expect, especially when sourcesContent is included and .map files are published carelessly.\nSo the correct mental model is not:\n“A source map is just a tiny debug file.”\nIt is closer to this:\n“A source map is a compact, machine-readable reconstruction guide from generated code back to the original program.”\nOnce you see it that way, both the debugging value and the security risk make immediate sense.\n","permalink":"https://learncodecamp.net/source-maps-explained-how-they-work/","summary":"\u003cp\u003eMost developers only think about source maps when DevTools magically shows the original TypeScript instead of unreadable bundled JavaScript.\u003c/p\u003e\n\u003cp\u003eThat convenience hides an important fact:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA source map is not just “debug metadata.” It is a translation table between generated code and original source code.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAnd depending on how it is emitted, it can contain the original source itself.\u003c/p\u003e\n\u003cp\u003eThat is why source maps sit at the intersection 
of:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003edebugging\u003c/li\u003e\n\u003cli\u003ebuild tooling\u003c/li\u003e\n\u003cli\u003ebrowser DevTools\u003c/li\u003e\n\u003cli\u003eerror reporting systems like Sentry\u003c/li\u003e\n\u003cli\u003esecurity and accidental code exposure\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIf you have ever wondered how a minified file can still produce readable stack traces, or how a published \u003ccode\u003e.map\u003c/code\u003e file can expose a package\u0026rsquo;s real TypeScript source, this is the mental model you want.\u003c/p\u003e","title":"Source Maps Explained: How They Work and Why They Sometimes Leak Source Code"},{"content":"When people think about adblock extensions, they usually imagine something simple:\n“The extension sees an ad and hides it.”\nThat is only part of the story.\nTools like uBlock Origin are better understood as content blockers, not just ad blockers.\nThey do block ads, but they also block:\ntrackers popups malware domains anti-blocker scripts other unwanted page behavior Modern blockers such as uBlock Origin mostly work by applying rules to:\nnetwork requests page elements scripts and browser behavior So an adblock extension is not really “understanding ads” like a human would. Most of the time it is doing fast pattern matching:\nDoes this request URL match a blocked domain? Is this script coming from a known tracker? Does this page element match a CSS selector from a filter list? Is there an exception rule that should allow it? Once that mental model clicks, writing and debugging custom filters becomes much easier.\nWhat an Adblock Extension Actually Does Most adblock extensions work in three layers:\nNetwork filtering Cosmetic filtering Behavior fixes or scriptlets Each layer solves a different problem.\n1. 
Network Filtering This is the most important part.\nEvery page loads many resources:\nHTML JavaScript CSS images fonts XHR or fetch calls media files like audio and video The blocker checks those requests against rules. If a rule matches, the request can be blocked before the resource fully loads.\nThat is why network filtering is so powerful:\nit can stop ads before they render it can block trackers before they send data it can prevent heavy media or scripts from loading Typical filter syntax looks like this:\n||doubleclick.net^ ||example.com/ads/^ ||tracker.example^$script ||cdn.example.com^$image,domain=example.com The ideas are simple:\n||domain^ usually means “match requests to this domain” $script, $image, $media restrict the resource type domain=example.com restricts where the rule applies So the extension is not guessing. It is matching rules against URLs, types, and context.\n2. Cosmetic Filtering Sometimes a request is not blocked, but the element is still hidden from the page.\nThat is cosmetic filtering.\nFor example, an adblocker may inject CSS rules like:\nexample.com##.sponsored-banner example.com##[aria-label=\u0026#34;Advertisement\u0026#34;] These do not stop the network request. They just hide matching page elements after the page loads.\nThat matters because some things are easier to hide than to block. A site may load the page from the same domain as its ads, which makes network-level blocking risky. In those cases, cosmetic filters are often safer.\n3. 
Behavior Fixes and Scriptlets Some blockers go one step further and inject small scripts or behavioral overrides.\nThis is useful for cases like:\nanti-adblock popups autoplay behavior tracking functions pages that keep retrying blocked requests This is more advanced than simple URL blocking, but it is still rule-driven.\nWhere the Rules Come From Most people do not manually write hundreds of filters.\nInstead, extensions subscribe to filter lists such as:\nuBlock filters EasyList EasyPrivacy Peter Lowe\u0026rsquo;s list malware or badware lists regional lists annoyance lists custom user filters A filter list is basically a maintained rule database. The extension downloads it, compiles it, and applies it while you browse.\nThat is why adblockers improve over time without you changing anything manually.\nIn uBlock Origin, the Filter lists pane is where you enable or disable these lists.\nuBlock Origin also uses the EasyList filter syntax, and then extends it with extra syntax for more advanced filtering.\nHow We Usually Customize uBlock Origin If you want to change behavior in uBlock Origin, there are a few common places to do it.\n1. Filter Lists If a site is breaking, sometimes the issue is not your custom rule. Sometimes a list you enabled is too aggressive.\nSo the first place to check is:\nwhich lists are enabled whether you added extra regional or annoyance lists whether one of those lists is causing breakage More lists do not automatically mean better blocking. More lists can also mean more breakage.\n2. My Filters This is where you add your own static filters.\nFor example:\n||video.twimg.com^ or:\nx.com##.some-promoted-container This is the right place for custom network rules and cosmetic rules.\nOne important caution here:\nDo not paste random filters from untrusted sources just because somebody posted them in a comment or a gist. 
Custom filters can break sites badly, and some trusted-only filters are powerful enough that they should not be copied casually.\n3. Element Picker If the thing you want to remove is visual, the element picker is often the easiest option.\nIt can create a site-specific cosmetic filter for you and save it into My filters.\nThat is usually safer than guessing CSS selectors by hand.\n4. The Logger If you are not sure what got blocked, open the logger.\nThis is one of the most useful parts of uBlock Origin because it shows:\nthe request URL the resource type whether it was blocked which rule matched That is the fastest way to understand why a site feature stopped working.\nWhat Happens When We Block video.twimg.com This part becomes easy once we think in terms of network requests.\nWhen we open x.com, not everything loads from x.com itself.\nSites usually split things across different domains:\nthe main page may come from one host images may come from another host JavaScript may come from a CDN videos may come from a separate media host On X, videos are commonly loaded from video.twimg.com.\nSo when we add a rule like:\n||video.twimg.com^ the extension blocks requests to that host.\nThat means:\nads using that host get blocked but normal video files from that host also get blocked So the page still opens, but the videos do not load.\nThat is why this rule feels like it is blocking ads, but it also ends up blocking normal video playback.\nThe Important Lesson: Host-Level Blocking Is Blunt Blocking an entire host is often effective, but it is also the least precise approach.\nIf a domain is used only for ads or tracking, blocking the whole thing is fine.\nIf a domain is used for core site features, then blocking it can break:\nlogin flows comments embedded media images infinite scroll API calls video.twimg.com falls into that second category. 
It is used for actual video delivery too.\nHow to Customize Adblock Behavior More Safely If you want to customize a blocker, the safest approach is to make rules as narrow as possible.\n1. Start Narrow, Not Broad Bad first rule:\n||video.twimg.com^ This blocks everything from that host.\nA better rule is one that restricts:\nresource type page domain specific path when possible For example:\n||example-cdn.com/ads/^$script,domain=example.com That is much safer than blocking the whole CDN.\n2. Scope Rules by Resource Type Most blockers let you target only certain request types.\nExamples:\n||example.com^$image ||example.com^$media ||example.com^$script This matters because the same host may serve different kinds of files.\nIf your goal is only to stop media, $media is safer than a blanket host rule. But in the video.twimg.com case, even $media would still block the normal videos you may want to watch.\nSo narrowing the type helps, but it does not solve the core problem if the site uses that host for normal playback.\n3. Scope Rules by Site You can also make a rule apply only on certain pages or sites.\nExample:\n||example-tracker.com^$domain=example.com That means:\nblock this host but only when the page you are visiting is example.com This is useful when the same third-party host appears across many sites, but you only want to block it in one place.\n4. 
Use Exception Filters to Undo Broad Static Blocking Most blockers support exception filters starting with @@.\nFor example, if you blocked video.twimg.com broadly and want to allow it again on X, an allow rule can look like:\n@@||video.twimg.com^$domain=x.com,media The exact behavior depends on the extension and the rest of your rule set, but the general idea is:\nallow requests to video.twimg.com when they are media requests while browsing x.com This is often the fastest way to recover from an over-broad custom filter.\nIf you are using uBlock Origin\u0026rsquo;s advanced popup and dynamic filtering, be careful with green allow rules there. Those rules can override static filters very broadly.\nFor unbreaking a site during testing, a noop rule is often safer than a broad dynamic allow rule.\n5. Prefer Cosmetic Filters When You Only Want to Hide Something If your real goal is:\nhide a promoted label remove a sidebar module collapse a sponsored card then network blocking may be the wrong tool.\nCosmetic filtering is often safer because it changes presentation, not core resource loading.\nThat is especially true on modern web apps where first-party and third-party resources are mixed together.\n6. Use the Extension Logger Before Writing Permanent Rules Most serious blockers have a logger or request inspector.\nUse it.\nIt tells you:\nwhich URLs were requested which rules matched which requests were blocked which site triggered the request what resource type was involved If you had opened the logger while loading a video on X, you would likely have seen requests to video.twimg.com getting blocked, which immediately explains why the video is not loading.\nThat is much better than guessing.\nIn practice, the logger should come before permanent custom rules.\nFirst inspect.\nThen block.\nThen test.\nCan You Block Ads on X Without Breaking All Video? 
Sometimes yes, but not always cleanly at the network level.\nThat depends on whether X separates:\npromoted video assets regular user video assets tracking and measurement endpoints If both promoted and normal videos are delivered through the same media host and similar paths, then a host-level rule cannot reliably distinguish them.\nThat is the key limitation of URL-based blocking:\nIf two kinds of content come from the same place, a simple network rule may not be able to separate them.\nIn that situation, better options are often:\na cosmetic rule that hides promoted containers a site-specific rule that targets ad UI rather than media delivery turning off autoplay instead of blocking the video host A Good Mental Model for Custom Rules When you create a custom adblock rule, ask:\nAm I blocking a full host or a specific path? Does this host serve only ads, or also real content? Should this apply everywhere or only on one site? Should this apply to all resource types or only one? Would a cosmetic filter be safer than a network filter? 
Those five questions prevent a lot of self-inflicted breakage.\nFinal Takeaway Adblock extensions mostly work by matching requests, elements, and behaviors against rule sets.\nThat is why they are both powerful and easy to misconfigure.\nIf you are using uBlock Origin, the practical places to customize behavior are usually:\nFilter lists My filters Element picker Logger When we block:\n||video.twimg.com^ it blocks the ads using that host, but it also blocks the actual video files.\nSo videos on X stop working too.\nThe practical lesson is simple:\nuse the narrowest rule you can prefer site-scoped and type-scoped filters use exception rules when a block is too broad use cosmetic filters when you only want to hide UI That gives you much more control without accidentally breaking half the site.\n","permalink":"https://learncodecamp.net/how-adblock-extensions-work-and-how-to-customize-their-behavior/","summary":"\u003cp\u003eWhen people think about adblock extensions, they usually imagine something simple:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e“The extension sees an ad and hides it.”\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThat is only part of the story.\u003c/p\u003e\n\u003cp\u003eTools like \u003cstrong\u003euBlock Origin\u003c/strong\u003e are better understood as \u003cstrong\u003econtent blockers\u003c/strong\u003e, not just ad blockers.\u003c/p\u003e\n\u003cp\u003eThey do block ads, but they also block:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003etrackers\u003c/li\u003e\n\u003cli\u003epopups\u003c/li\u003e\n\u003cli\u003emalware domains\u003c/li\u003e\n\u003cli\u003eanti-blocker scripts\u003c/li\u003e\n\u003cli\u003eother unwanted page behavior\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eModern blockers such as \u003cstrong\u003euBlock Origin\u003c/strong\u003e mostly work by applying \u003cstrong\u003erules\u003c/strong\u003e to:\u003c/p\u003e","title":"How Adblock Extensions Work and How to Customize Their Behavior"},{"content":"When people 
talk about transformers, they usually focus on attention, scale, or training data. But one smaller design choice has an outsized effect on model quality:\nHow does the model know where each token appears in the sequence?\nThat question matters because transformers do not understand order by default. Without positional information, a sequence starts to look more like an unordered set of tokens than a structured sentence, paragraph, or program.\nThat becomes a real problem immediately:\ndog bites man is not the same as man bites dog not good is not the same as good code, math, and JSON are highly sensitive to token order One of the most important modern answers to this problem is RoPE, short for Rotary Positional Embedding.\nRoPE became popular because it is mathematically clean, efficient to implement, and especially good at helping attention reason about relative position. Instead of simply attaching a position label to each token, it changes how tokens compare to each other inside attention.\nWhy Transformers Need Positional Encoding In a transformer, each token is projected into three vectors:\nQuery Key Value Attention decides how strongly one token should attend to another by comparing queries and keys.\nIn simplified form:\nattention score(i, j) = q_i · k_j The issue is that this dot product does not tell the model whether token i came before token j, whether they are adjacent, or whether they are far apart. 
If we do nothing extra, the model has no built-in sense of sequence order.\nThat is why transformers need positional encoding.\nEarlier Positional Encoding Approaches Before RoPE became common, two main strategies were widely used.\nLearned Positional Embeddings The simplest idea is to assign a trainable vector to each position:\nposition 0 gets one embedding position 1 gets another position 2 gets another This works, but it comes with tradeoffs:\nthe model is tied to a maximum trained context length it may generalize poorly beyond the lengths seen during training position is injected in a fairly rigid, absolute way Sinusoidal Positional Embeddings The original transformer paper also introduced fixed sinusoidal encodings:\neach position gets a vector made of sine and cosine values different dimensions use different frequencies no learned position table is needed This was a clever design because it gives positions a structured pattern across scales. But the positional signal is still added to token embeddings before attention, rather than being built directly into the attention comparison itself.\nThat leads to the key question:\nCan positional information be injected directly into attention?\nRoPE answers yes.\nWhat Is RoPE? 
Rotary Positional Embedding applies a position-dependent rotation to the query and key vectors before attention is computed.\nInstead of saying:\ntoken embedding + position embedding RoPE says:\nrotate parts of q and k based on token position, then compute attention This creates an important effect:\nThe attention score between a query at position m and a key at position n becomes sensitive to their relative distance, not just their content.\nThat is the core reason RoPE is so useful.\nIntuition: Position as Rotation RoPE groups dimensions into pairs and treats each pair like a tiny 2D coordinate.\nFor example, a vector can be viewed as:\n(x1, x2) (x3, x4) (x5, x6) For a token at position p, each pair is rotated by an angle that depends on:\nthe token position p the frequency assigned to that pair So:\na token at position 5 is rotated a little a token at position 50 is rotated more some dimensions rotate slowly some rotate faster The rotation changes direction but preserves magnitude. That means position becomes part of the geometry of the query-key comparison instead of just an extra tag added to the embedding.\nWhy Rotation Helps Attention Attention uses dot products between queries and keys. RoPE rotates both of them before the score is computed:\nscore(i, j) = rotate(q_i, pos_i) · rotate(k_j, pos_j) The useful property is this:\nThe resulting score depends on the offset between positions.\nIn other words, RoPE makes attention naturally care about relationships like:\nthe previous token the next token a token a few steps away a matching bracket much later in the sequence That is often more useful than absolute position. 
Language, code, and structured data rely heavily on relative relationships.\nFor example:\npronouns often refer to nearby nouns modifiers usually attach to nearby words brackets and quotes must match across spans code variables often reappear shortly after declaration RoPE gives attention a built-in bias toward learning those patterns.\nThe Core Idea in Simple Mathematical Terms Take a pair of dimensions and rotate it with a 2D rotation:\nrotate(x, y, theta) = (x * cos(theta) - y * sin(theta), x * sin(theta) + y * cos(theta)) For a token at position p, the angle is:\ntheta_i(p) = p * omega_i Where:\ni identifies the dimension pair omega_i is that pair\u0026rsquo;s frequency This gives RoPE a multi-scale structure:\nlow-frequency pairs change slowly across positions high-frequency pairs change quickly across positions That is similar in spirit to sinusoidal encodings: different frequencies let the model represent positional information at different granularities.\nIn practice, the model:\nprojects hidden states into queries and keys splits dimensions into pairs computes sine and cosine terms for each position rotates each query/key pair uses the rotated vectors in attention A simplified version looks like this:\ndef rotate_pair(x1, x2, cos_theta, sin_theta): return ( x1 * cos_theta - x2 * sin_theta, x1 * sin_theta + x2 * cos_theta, ) Real implementations are fully vectorized, but conceptually that is the entire trick.\nAnother Intuition: RoPE as Complex Numbers There is an elegant alternative way to think about RoPE.\nIf you treat a pair of dimensions (x, y) as a complex number:\nz = x + iy Then rotating by an angle theta is the same as multiplying by:\ne^(i * theta) So RoPE can be understood as applying a position-dependent phase shift to each pair of dimensions.\nThat viewpoint makes the method feel especially neat:\nposition becomes phase attention becomes phase-aware comparison relative offsets emerge from how those phases interact You do not need the 
complex-number interpretation to use RoPE, but it helps explain why many people find it mathematically elegant.\nWhy RoPE Became So Popular RoPE spread quickly because it combines practical benefits with a strong inductive bias.\n1. It captures relative position naturally Many sequence patterns are really about distance and order, not absolute token index.\n2. It fits cleanly into transformer attention RoPE only changes how queries and keys are prepared before attention. The rest of the transformer stays mostly the same.\n3. It is parameter-efficient It does not need a large learned table of position embeddings.\n4. It often extrapolates better than simple absolute embeddings RoPE is not a magic long-context solution, but in practice it often behaves better than learned absolute position embeddings when the sequence grows beyond the training window.\n5. It became a strong ecosystem default Once influential model families adopted RoPE or RoPE-style variants, it became a standard choice across many LLM implementations.\nRoPE vs Additive Positional Embeddings It helps to compare the mental models directly.\nAdditive positional embeddings These say:\nx_p = token_p + position_p The model gets positional information because each token representation includes a position vector.\nRoPE RoPE says:\nposition changes how q and k interact inside attention This difference matters because attention is where token-to-token relationships are actually computed.\nYou can think of it like this:\nadditive embeddings say: this token is at position 17 RoPE says: when token A compares itself with token B, position changes that comparison That is often the more useful bias.\nWhy RoPE Is Applied to Queries and Keys, Not Values RoPE is usually applied to queries and keys, not values.\nThat choice is deliberate.\nQueries and keys determine the attention weights, which answer the question:\nWho should attend to whom?\nValues are the content being aggregated after those weights are already 
decided.\nSo if position mainly matters for determining relationships, rotating queries and keys is usually enough. Rotating values tends to add complexity without delivering the same clear benefit.\nHow RoPE Handles Short-Range and Long-Range Structure Because RoPE uses multiple frequencies, it can encode position at different scales.\nhigh-frequency components are sensitive to small positional changes low-frequency components vary more slowly and capture coarser structure That gives the model useful signals for both:\nlocal syntax broader sequence structure A simple intuition:\npositions 10 and 11 produce only a small angular difference positions 10 and 110 produce a much larger one So nearby tokens and distant tokens do not just differ by content. They also differ geometrically in a way the model can learn to exploit.\nRoPE and Long Context Windows RoPE is often associated with long-context LLMs, but that needs some nuance.\nIt is true that RoPE often generalizes better than simple learned absolute position embeddings. But it does not solve long-context reasoning on its own.\nAs positions get very large:\nthe rotations continue indefinitely phase behavior becomes harder to use reliably performance can degrade outside the range seen during training That is why long-context systems often extend or modify RoPE with techniques such as:\nposition interpolation NTK-aware scaling YaRN-style scaling other context-extension methods So RoPE is best understood as a strong foundation, not a complete solution to long-context modeling.\nLimitations of RoPE RoPE is excellent, but it is not perfect.\n1. Extrapolation is still limited It often works better than learned absolute embeddings beyond training length, but only to a point.\n2. Very large positions can become harder to distinguish cleanly Because the encoding is based on periodic rotations, extremely long ranges can become less stable without additional tricks.\n3. 
Frequency design still matters The chosen frequency schedule affects how position is distributed across dimensions.\n4. It is still a hand-designed inductive bias RoPE is elegant and effective, but it is not the only possible positional scheme. Researchers continue exploring alternatives and adaptive methods.\nRoPE vs ALiBi RoPE is not the only important modern positional method.\nALiBi is another well-known approach. Instead of rotating vectors, ALiBi adds a distance-based bias directly to attention scores.\nAt a high level:\nRoPE injects position through vector rotation ALiBi injects position through attention-score bias Both approaches aim to help transformers handle order better, but they make different tradeoffs. RoPE became especially dominant because it integrated cleanly into high-performing transformer recipes and worked well in practice at scale.\nA Plain-English Explanation If you want the shortest useful summary, it is this:\nRoPE tells a transformer where tokens are by rotating parts of its query and key vectors according to position, so attention becomes sensitive to how far apart tokens are.\nThat is the idea in one sentence.\nWhy This Small Detail Matters Positional encoding can sound like a minor implementation choice. 
It is not.\nIf a transformer handles position poorly, it may:\nstruggle with order-sensitive reasoning mis-handle syntax and structured data generalize poorly to longer sequences waste capacity learning positional patterns inefficiently RoPE improves one of the most important operations in the model: how tokens compare to other tokens across a sequence.\nThat is why such a small-looking architectural choice has such large downstream effects.\nFinal Takeaway RoPE, or Rotary Positional Embedding, injects positional information into transformers by rotating query and key vectors according to token position before attention is computed.\nIts main strengths are:\nit encodes position directly inside attention it naturally supports relative-position reasoning it is parameter-efficient it works well in practice it became a standard building block in modern LLMs The deeper point is that sequence modeling is not just about understanding individual tokens. It is about understanding how tokens relate to each other across order and distance.\nIf attention is the engine of a transformer, RoPE is one of the mechanisms that helps it stay oriented on the road.\n","permalink":"https://learncodecamp.net/rope-explained/","summary":"\u003cp\u003eWhen people talk about transformers, they usually focus on attention, scale, or training data. But one smaller design choice has an outsized effect on model quality:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHow does the model know where each token appears in the sequence?\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThat question matters because transformers do not understand order by default. 
Without positional information, a sequence starts to look more like an unordered set of tokens than a structured sentence, paragraph, or program.\u003c/p\u003e","title":"RoPE Explained: The Positional Encoding Trick Behind Modern Language Models"},{"content":"Large Language Models (LLMs) such as GPT-2, GPT-3, LLaMA, and BERT are built on top of the Transformer architecture. That architecture changed natural language processing by replacing recurrence with attention, which lets models process sequences more efficiently and capture long-range relationships more directly.\nIf you are trying to understand what terms like layer, transformer block, and attention head actually mean, the easiest way is to follow the path a sentence takes through a GPT-style model.\nOne terminology note before we begin: in most GPT-style model specifications, a layer usually means one full transformer block. So if a model is described as having 48 layers, that usually means it has 48 stacked transformer blocks.\nGPT-2 XL as a concrete example: 48 transformer blocks, 25 attention heads, 1,600-dimensional token embeddings, a 6,400-dimensional MLP expansion, and a 1,024-token context window.\nThe High-Level Pipeline At a high level, a transformer language model processes text like this:\nInput text -\u0026gt; Tokenization -\u0026gt; Token embeddings + positional information -\u0026gt; N stacked transformer blocks -\u0026gt; Final hidden state -\u0026gt; Linear projection to vocabulary -\u0026gt; Softmax -\u0026gt; Next-token probabilities The core idea is simple: each block refines the representation of every token, and the final layer turns those representations into a probability distribution over the vocabulary.\n1. Tokenization: Turning Text Into Model Inputs Before text enters the model, it is broken into tokens.\nFor example, the sentence:\nThe cat sat on the mat\nmight be represented as whole-word-like tokens, or as smaller subword pieces depending on the tokenizer. 
Each token is then mapped to a numeric token ID.\nThis matters because the model never sees raw text directly. It only sees a sequence of token IDs.\n2. Embedding Layer Each token ID is converted into a dense vector called an embedding.\nIf a model has a hidden size of d_model = 1600, then every token becomes a 1,600-dimensional vector.\nFor GPT-2 XL, that 1,600-dimensional vector is the model\u0026rsquo;s base representation size. The rest of the network keeps transforming vectors of this size as the text moves upward through the stack.\nThese embeddings are learned during training, which is why tokens with related meanings often end up with related vector patterns.\n3. Positional Information Transformers process tokens in parallel, so they need some way to represent order.\nThat is why the model adds positional information to token embeddings.\nConceptually:\ninput_representation = token_embedding + positional_information\nOlder GPT-style models such as GPT-2 use learned absolute positional embeddings. Many newer LLMs use alternatives such as RoPE instead. The purpose is the same: the model must know the difference between:\ndog bites man man bites dog without relying on recurrence.\n4. Transformer Blocks: The Core of the Model After embeddings are prepared, they pass through many stacked transformer blocks:\nEmbeddings -\u0026gt; Block 1 -\u0026gt; Block 2 -\u0026gt; Block 3 -\u0026gt; ... -\u0026gt; Block N Each block takes the previous representation and produces a more contextual one.\nTypical model depths look like this:\nModel Approximate Layers GPT-2 XL 48 GPT-3 96 LLaMA-2 32 to 80 The deeper the stack, the more opportunities the model has to refine syntax, relationships, and abstract meaning.\n5. What Is Inside a Transformer Block? 
A GPT-style transformer block typically contains:\nLayer normalization Masked multi-head self-attention A residual connection Another layer normalization A feed-forward network, often called an MLP Another residual connection A simplified flow looks like this:\nInput -\u0026gt; LayerNorm -\u0026gt; Multi-Head Self-Attention -\u0026gt; Residual Add -\u0026gt; LayerNorm -\u0026gt; Feed-Forward Network -\u0026gt; Residual Add -\u0026gt; Output This pattern repeats for every block in the model.\n6. Self-Attention: How Tokens Look at Other Tokens Self-attention is the mechanism that lets each token decide which other tokens matter.\nConsider the sentence:\nThe animal didn't cross the street because it was tired.\nTo interpret it, the model needs to connect it to animal. Attention gives the model a way to learn that relationship.\nEach token is projected into three vectors:\nQuery (Q): what this token is looking for Key (K): what this token can offer Value (V): the information this token contributes The standard attention formula is:\nAttention(Q, K, V) = softmax((QK^T) / sqrt(d_k))V\nIn plain English:\ncompare each query with all keys turn those scores into weights use the weights to mix the value vectors That produces a contextual representation for each token.\n7. Multi-Head Attention Instead of computing attention once, transformers compute it multiple times in parallel using attention heads.\nEach head gets a different learned projection of the same input, so different heads can specialize in different patterns.\nPossible behaviors include:\nHead Possible Focus Head 1 local syntax Head 2 subject-verb agreement Head 3 pronoun resolution Head 4 long-range dependencies These roles are not fixed by design, but they are a useful intuition for why multiple heads help.\n8. 
Why Hidden Size and Head Count Must Match The model\u0026rsquo;s hidden dimension is split across attention heads:\nhidden_size = number_of_heads x head_dimension\nFor GPT-2 XL:\nhidden_size = 1600 number_of_heads = 25 head_dimension = 1600 / 25 = 64 So each head works on a 64-dimensional slice, and the outputs of all 25 heads are concatenated back together into the original 1,600-dimensional space.\nThis is why head count is not arbitrary. It must divide cleanly into the model\u0026rsquo;s hidden size.\n9. The Feed-Forward Network (MLP) After attention, each token passes through a feed-forward network. This is often the second major component inside each transformer block.\nThe usual structure is:\nLinear -\u0026gt; Activation -\u0026gt; Linear For GPT-2 XL, the MLP expands the 1,600-dimensional representation to a larger internal size and then projects it back down:\n1600 -\u0026gt; 6400 -\u0026gt; 1600\nIn many modern models, the activation function is GELU or SwiGLU.\nUnlike attention, which mixes information across tokens, the MLP operates independently on each token position. Its job is to add nonlinear transformation capacity after attention has gathered context.\n10. Residual Connections and Layer Normalization Residual connections are critical in deep transformers.\nThe idea is:\noutput = sublayer(x) + x\nThis helps because it:\nstabilizes optimization improves gradient flow makes very deep networks trainable Layer normalization helps keep activations well-behaved as they move through dozens of stacked blocks.\nWithout residual connections and normalization, modern LLMs would be much harder to train reliably.\n11. The Output Layer After the final transformer block, the model produces a hidden state for each token position. 
To predict the next token, it takes the final hidden state for the current position and projects it into vocabulary space.\nThe flow is:\nFinal hidden state -\u0026gt; Linear projection -\u0026gt; Softmax -\u0026gt; Probability distribution over the vocabulary For GPT-2 XL, that vocabulary size is 50,257 tokens.\nThe token with the highest probability may be selected, or decoding strategies such as sampling, top-k, or nucleus sampling may be used instead.\n12. Autoregressive Generation GPT-style models are autoregressive. They generate one token at a time.\nIf the prompt is:\nThe capital of France is\nthe model predicts the next token, such as:\nParis\nThen that new token is appended to the sequence, and the model predicts again.\nSo generation works like this:\nToken 1 -\u0026gt; Token 2 -\u0026gt; Token 3 -\u0026gt; ... This is why inference is sequential across generated tokens, even though much of the computation inside each step is highly parallel.\n13. What Runs Sequentially and What Runs in Parallel? 
This distinction is one of the most important ideas in transformer systems.\nSequential Parts Some parts cannot be parallelized across depth or generation steps:\nStacked transformer blocks Block 2 needs the output of Block 1, so the blocks run one after another.\nAutoregressive decoding When generating text, the model must produce the next token before it can produce the one after that.\nParallel Parts A lot still happens in parallel inside each step:\nAttention heads All heads in a multi-head attention module run in parallel.\nToken computations during training and prefill Tokens in the input sequence are processed in parallel inside a block.\nMLP computation across tokens The feed-forward network is applied independently to each token position, which makes it highly parallelizable.\nA simplified picture looks like this:\nTokens in a sequence (parallel) -\u0026gt; Transformer Block 1 -\u0026gt; Attention heads (parallel) -\u0026gt; Token positions (parallel) -\u0026gt; MLP on each token (parallel) -\u0026gt; Transformer Block 2 -\u0026gt; ... -\u0026gt; Output probabilities This combination of sequential depth and massive internal parallelism is a big reason transformers scale so well on GPUs.\n14. Why Stacking Many Layers Works Different layers often capture different kinds of information.\nA common intuition is:\nLayer Region Tends to Emphasize Early layers local patterns, token identity, short-range syntax Middle layers phrase structure, dependencies, compositional relationships Later layers higher-level semantics, task signals, prediction-ready features This is not a hard rule, but it is a useful mental model. Each block refines what the model knows about every token by mixing context and applying nonlinear transformations again and again.\n15. A Quick Note on BERT vs GPT-Style Models BERT and GPT both use transformer blocks, but they differ in how attention is applied:\nBERT uses bidirectional attention, so tokens can attend to both left and right context. 
GPT-style models use causal masking, so tokens can attend only to earlier positions when predicting the next token. That difference is one reason BERT is mainly used for understanding tasks, while GPT-style models are naturally suited for generation.\nFinal Takeaway The internal structure of an LLM is complex, but the main idea is elegant:\ntext becomes token IDs token IDs become embeddings embeddings pass through many transformer blocks each block applies attention and an MLP the final representation is projected into vocabulary probabilities Once you understand layers, transformer blocks, attention heads, hidden dimensions, and execution parallelism, the architecture of modern LLMs becomes much easier to reason about.\nThat foundation also makes it easier to study more advanced topics such as scaling laws, KV cache design, inference optimization, long-context attention, and model interpretability.\n","permalink":"https://learncodecamp.net/llm-architecture-layers-transformer-blocks-attention-heads/","summary":"\u003cp\u003eLarge Language Models (LLMs) such as GPT-2, GPT-3, LLaMA, and BERT are built on top of the \u003cstrong\u003eTransformer\u003c/strong\u003e architecture. 
That architecture changed natural language processing by replacing recurrence with attention, which lets models process sequences more efficiently and capture long-range relationships more directly.\u003c/p\u003e\n\u003cp\u003eIf you are trying to understand what terms like \u003cstrong\u003elayer\u003c/strong\u003e, \u003cstrong\u003etransformer block\u003c/strong\u003e, and \u003cstrong\u003eattention head\u003c/strong\u003e actually mean, the easiest way is to follow the path a sentence takes through a GPT-style model.\u003c/p\u003e","title":"Understanding LLM Architecture: Layers, Transformer Blocks, and Attention Heads"},{"content":"If you are building a RAG system, internal knowledge assistant, or document search chatbot, one question matters more than almost anything else:\nWhen the answer is supposed to come from the provided documents, how often does the model still make things up?\nThat is exactly what the March 9, 2026 paper “How Much Do LLMs Hallucinate in Document Q\u0026amp;A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms” tries to measure.\nThe short answer is uncomfortable:\nEven the best model in the study still fabricated answers 1.19% of the time at 32K context. Strong models often landed in the 5% to 7% range. The median model was closer to 25%. At 200K context, no tested model stayed below 10% fabrication. That makes this paper useful not because it says hallucinations exist. We already know that. It is useful because it puts numbers on the problem at a much larger scale than most benchmark discussions.\nWhat the Paper Studied The author evaluated 35 open-weight models across:\n3 context lengths: 32K, 128K, and 200K 4 temperatures: 0.0, 0.4, 0.7, and 1.0 3 hardware platforms: NVIDIA H200, AMD MI300X, and Intel Gaudi3 In total, the study used more than 172 billion tokens across more than 4,000 runs.\nThe focus was not open-ended generation. 
It was a narrower and more practical setting:\nGive the model documents Ask questions grounded in those documents Measure whether the answer is correct Measure whether the model invents facts that are not present That last part is important. Many evaluations only test whether a model can retrieve or summarize information that exists. This paper also tested whether the model would confidently answer questions about things that do not exist in the documents.\nWhy This Benchmark Is Interesting The paper uses a method called RIKER.\nInstead of starting with real-world documents and paying humans to annotate the correct answers, the evaluation starts with structured ground truth first. Documents are then generated from that known ground truth. That means the benchmark already knows exactly:\nWhat facts exist What facts do not exist Which answers should be refused This design helps the paper avoid three common benchmark problems:\nBenchmark contamination: models may already have seen static benchmark data during training LLM-as-judge bias: another model is often used to grade the output, which introduces its own errors Small sample sizes: many evaluations are too small to be statistically convincing You can debate whether one framework captures every real-world behavior, but the general setup is strong for measuring fabrication in document-grounded QA.\nThe Most Important Findings 1. Hallucination Does Not Go to Zero This is the headline result.\nUnder the best conditions tested, the best model still fabricated answers 1.19% of the time at 32K context. 
That may sound small, but in production it is not.\nIf your application handles:\n10,000 document-grounded questions per day Then a 1.19% fabrication rate would still imply roughly:\n119 fabricated answers per day And that is the best-case result from the paper, not the average one.\nThe more realistic takeaway is that many supposedly strong models still hallucinate often enough that you cannot treat their answers as automatically trustworthy.\n2. Longer Context Windows Make Things Worse One of the clearest patterns in the paper is that hallucination gets worse as context length increases.\nAt 32K, a handful of models stayed below 10% fabrication.\nAt 128K, only 5 of 26 tested models remained below 10% fabrication.\nAt 200K, none of the tested models did.\nThis matters because many teams assume that if a model advertises a very large context window, then it can reliably reason over that entire context. The paper argues that this is the wrong mental model.\nAdvertised context length is not the same as usable context length.\nIn other words:\nA model may technically accept 200K tokens That does not mean it can answer reliably across 200K tokens It may degrade badly or fabricate far more often That is a direct warning for “just stuff more documents into the prompt” style RAG systems.\n3. 
Model Family Matters More Than Raw Size A useful result from the paper is that bigger is not automatically safer.\nSome model families consistently fabricated less than others, and that pattern held better than simple parameter count comparisons.\nHere is a compact comparison table based on the paper\u0026rsquo;s reported best-case numbers:\nModel 32K Overall Accuracy 32K Fabrication 128K Overall Accuracy 128K Fabrication 200K Overall Accuracy 200K Fabrication GLM 4.5 97.40% 1.19% 87.43% 3.19% Not tested Not tested MiniMax M2.1 95.96% 5.06% 85.59% 9.72% Not tested Not tested DeepSeek V3.1 95.49% 6.35% 90.45% 7.36% Not tested Not tested Qwen3 Next 80B-A3B 93.87% 7.04% 87.85% 7.99% 82.68% 10.25% GLM 4.6 93.26% 7.04% 85.81% 13.75% 37.65% 69.53% Llama 4 Maverick 86.52% 28.08% 63.90% 38.82% 61.56% 43.29% Llama 3.1 405B 84.75% 26.51% 58.29% 30.62% Not tested Not tested Llama 3.1 70B 69.76% 49.50% 42.08% 56.67% Not tested Not tested This table highlights three patterns fast:\nGLM 4.5 is the standout at 32K Qwen3 Next 80B-A3B is the most resilient model among those tested all the way to 200K GLM 4.6 shows how a strong 32K model can collapse badly at very long context lengths For example, the paper reports that:\nSome GLM and MiniMax models had relatively low fabrication rates Several Llama-family models showed much higher fabrication even when they had strong grounding scores That suggests hallucination resistance is not simply an emergent property of scale. It looks more like a capability that depends heavily on training choices, alignment, and calibration.\nFor practitioners, the implication is simple:\nDo not pick a model just because it is bigger Do not pick a model just because it performs well on general benchmarks Test the exact document-QA behavior you care about 4. 
Temperature 0 Is Not Always the Best Choice A lot of teams default to temperature = 0 for factual tasks because it feels safer and more deterministic.\nThis paper shows that the rule is not that simple.\nAccording to the results:\nT=0.0 gave the best overall accuracy in about 60% of cases Higher temperatures reduced fabrication for the majority of model-context combinations T=0.0 also increased coherence failures, including infinite generation loops, especially at long context lengths One especially striking result in the paper is that some models had dramatically higher loop or truncation rates at T=0.0 than at T=1.0, with extreme cases showing tens of times more failures.\nThe practical message is not “always use temperature 1.0.” It is:\nDo not blindly assume temperature 0 is optimal For a real system, you may need to tune for a balance between:\naccuracy fabrication rate response stability 5. Hardware Did Not Meaningfully Change Fidelity The paper also compared the same models across:\nNVIDIA H200 AMD MI300X Intel Gaudi3 The main conclusion was that hardware platform did not meaningfully change the models’ fidelity behavior.\nThat is good news for deployment planning. If the same serving stack is used, hardware choice appears to be more about:\ncost throughput availability and less about answer quality.\nA Very Important Distinction: Grounding vs Fabrication One of the best ideas in the paper is that grounding ability and fabrication resistance are not the same thing.\nA model can be good at finding facts that really exist in the provided documents and still be bad at refusing to answer when the requested fact is not there.\nThat means a model can look good on retrieval-heavy benchmarks but still be risky in production.\nThis is a major point for anyone evaluating RAG systems. 
If your benchmark only asks:\n“Can the model find the right answer when the answer exists?” then you are missing the harder and more dangerous question:\n“What does the model do when the answer does not exist in the retrieved context?” That is where real trustworthiness is tested.\nWhat This Means for RAG and Enterprise AI If you deploy document-grounded LLM systems, this paper points to a few practical rules.\nTreat Hallucination as a Product Constraint, Not a Rare Bug The paper’s numbers are too high to dismiss as edge cases. Even top-tier models produce fabricated answers often enough that you need system-level defenses.\nThat can include:\nanswer citation requirements refusal behavior when evidence is weak retrieval quality checks confidence thresholds human review for high-stakes workflows Test at Your Real Context Length If your production system regularly sends 80K, 120K, or 200K tokens, then a 32K benchmark is not enough. The paper shows that performance at shorter context lengths can give false confidence.\nMeasure Refusal Quality Explicitly A good evaluation set should include questions where:\nthe answer is absent the entity is missing the relationship is fake If you do not test those cases, you are mostly measuring retrieval and summarization, not hallucination resistance.\nStop Using “Bigger Model” as a Shortcut for Safety The paper makes it clear that some smaller or mid-sized models can be better calibrated than much larger ones for document-grounded QA.\nLimitations to Keep in Mind This is a useful paper, but it is still one study and it has clear limits:\nIt evaluates open-weight models, not proprietary systems like GPT, Claude, or Gemini It focuses on English It measures one framework, RIKER It is specifically about document Q\u0026amp;A, not every type of LLM task So the exact rankings should not be treated as universal truth. 
But the broader patterns are hard to ignore:\nhallucination floors are real long context makes reliability harder temperature tuning is more nuanced than people assume retrieval success does not guarantee refusal quality Final Takeaway This paper answers an important practical question with unusually concrete numbers:\nLLMs hallucinate in document Q\u0026amp;A more often than most teams would be comfortable admitting, and the problem gets worse as context grows.\nIf you are building RAG or enterprise knowledge systems, the lesson is not to abandon LLMs. The lesson is to stop evaluating them with shallow metrics.\nYou need to test:\nwhether the model finds the right answer whether it refuses when the answer is missing how it behaves at your actual production context length whether decoding choices make reliability better or worse That is a much higher bar than “it looked good in a demo,” but this paper makes a strong case that the higher bar is necessary.\nSource Paper: How Much Do LLMs Hallucinate in Document Q\u0026amp;A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms ","permalink":"https://learncodecamp.net/how-much-do-llms-hallucinate-document-qa/","summary":"\u003cp\u003eIf you are building a RAG system, internal knowledge assistant, or document search chatbot, one question matters more than almost anything else:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWhen the answer is supposed to come from the provided documents, how often does the model still make things up?\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThat is exactly what the March 9, 2026 paper \u003cstrong\u003e“How Much Do LLMs Hallucinate in Document Q\u0026amp;A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms”\u003c/strong\u003e tries to measure.\u003c/p\u003e","title":"How Much Do LLMs Hallucinate in Document Q\u0026A? 
Key Lessons from a 172B-Token Study"},{"content":"The CAP theorem is one of the most important ideas in distributed systems because it explains why “just make it always correct and always online” is not a realistic requirement once multiple nodes and unreliable networks enter the picture.\nIn simple terms, CAP says that when a network partition happens, a distributed system can prioritize either:\nConsistency Availability But not both at the same time.\nPartition tolerance is not a feature you casually add or remove. If your system runs across multiple machines, partitions are a fact of life, so the real design choice is usually CP vs AP.\nWhat CAP Stands For Consistency Every client sees the same data at the same time after a write completes.\nIf one node accepts a write, another node should not return an older value as if that write never happened.\nAvailability Every request receives a non-error response, even if that response may not contain the latest data.\nAvailability in CAP is stricter than “the system is usually up.” It means the system continues to answer requests without simply timing out or refusing them.\nPartition Tolerance The system continues operating even when nodes cannot communicate with one another because of dropped packets, broken links, slow networks, or a full region-level isolation event.\nWhy the Tradeoff Appears Imagine two replicas holding the same account balance. A client writes to Replica A, but Replica A cannot reach Replica B because the network is split.\nAt that point the system has two options:\nWait or reject requests until replicas can coordinate again. Keep accepting requests on both sides and reconcile later. The first path preserves consistency but sacrifices availability. 
The second path preserves availability but risks inconsistent reads or conflicting writes.\nA Partition Scenario This is the moment where the theorem becomes practical instead of theoretical:\nIf you are building a banking ledger, rejecting a write is often better than accepting contradictory balances.\nIf you are building a social feed, serving slightly stale data is usually better than showing an error page to everyone.\nThat is the heart of CAP: the right answer depends on the product requirement.\nUnderstanding CP Systems A CP system chooses consistency and partition tolerance.\nDuring a partition, it may:\nReject writes Return errors for some requests Become read-only Wait for a leader or quorum before responding This is a good fit when correctness matters more than immediate responsiveness.\nCommon examples:\nPayment and ledger systems Inventory management Metadata stores Systems using leader election and quorum writes The tradeoff is user-visible unavailability during failures.\nUnderstanding AP Systems An AP system chooses availability and partition tolerance.\nDuring a partition, it may:\nKeep serving reads and writes from both sides Accept that replicas may diverge temporarily Resolve conflicts later Use eventual consistency instead of immediate consistency This is a good fit when the business values continuous service more than perfectly synchronized state at every instant.\nCommon examples:\nSocial timelines DNS-style systems Product catalog browsing Shopping carts and recommendation systems The tradeoff is stale reads or reconciliation complexity.\nWhat About CA Systems?
People often talk about CA systems, but in practice that label mostly applies to systems that do not have to survive real network partitions across multiple nodes.\nFor example, a single-node relational database can often behave like CA from the application’s point of view:\nStrong consistency on one machine High availability as long as that machine stays healthy But once you distribute the system, partition tolerance stops being optional.\nCAP Profiles at a Glance CAP Is Not the Whole Story CAP is useful, but it is also easy to oversimplify.\nA few important clarifications:\nCAP only becomes interesting during a partition. Many systems behave like they are strongly consistent most of the time and only expose tradeoffs during failures. Real architectures use more knobs than CAP alone: quorum sizes, leader election, retries, idempotency, conflict resolution, replication lag, and client-side UX fallbacks. That is why experienced engineers treat CAP as a starting point for reasoning, not the final word on distributed system design.\nPractical Design Questions to Ask When choosing between CP and AP behavior, ask:\nWhat is worse for the user: an error now or stale/conflicting data now? Can the business safely reconcile data later? Which operations require strict correctness, and which can tolerate eventual consistency? Do all endpoints need the same guarantees, or can some be CP while others are AP? In many real systems, the answer is mixed:\nPayment authorization might be CP. Activity feeds might be AP. Search indexes might lag behind the source of truth. 
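One way to picture a mixed design like the one above is a per-operation policy table. The sketch below is illustrative only: the operation names, the Mode enum, and the handle function are hypothetical, and a simple majority quorum is assumed as the test for whether a node can still coordinate with its peers.

```python
from enum import Enum


class Mode(Enum):
    CP = 'cp'  # prefer consistency: refuse when a quorum is unreachable
    AP = 'ap'  # prefer availability: answer locally, reconcile later


# Hypothetical per-operation policy; the names are illustrative only.
POLICY = {
    'authorize_payment': Mode.CP,
    'read_activity_feed': Mode.AP,
}


def handle(operation, reachable_replicas, total_replicas):
    """Decide how to answer one request given how many replicas are reachable."""
    # Assumption: a strict majority of replicas counts as a quorum.
    quorum = total_replicas // 2 + 1
    if reachable_replicas >= quorum:
        # No effective partition from this node's point of view.
        return 'ok'
    if POLICY[operation] is Mode.CP:
        # CP behavior: give up availability rather than risk divergence.
        return 'rejected'
    # AP behavior: keep answering and accept that the data may be stale.
    return 'ok_possibly_stale'
```

With three replicas and only one reachable, handle('authorize_payment', 1, 3) refuses the request, while handle('read_activity_feed', 1, 3) still answers and flags the result as possibly stale. A real system would also need a reconciliation step for AP operations once the partition heals.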
Final Takeaway The CAP theorem does not say you can pick any two properties all the time.\nThe practical reading is:\nIn a distributed system, when a partition happens, you usually have to choose between consistency and availability.\nThat single sentence explains many design decisions in modern databases, queues, caches, and replicated services.\nIf you understand CAP clearly, you make better choices about failure handling instead of assuming the network will always behave.\n","permalink":"https://learncodecamp.net/cap-theorem-explained/","summary":"\u003cp\u003eThe \u003cstrong\u003eCAP theorem\u003c/strong\u003e is one of the most important ideas in distributed systems because it explains why “just make it always correct and always online” is not a realistic requirement once multiple nodes and unreliable networks enter the picture.\u003c/p\u003e\n\u003cp\u003eIn simple terms, CAP says that when a \u003cstrong\u003enetwork partition\u003c/strong\u003e happens, a distributed system can prioritize either:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eConsistency\u003c/strong\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eAvailability\u003c/strong\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eBut not both at the same time.\u003c/p\u003e\n\u003cp\u003ePartition tolerance is not a feature you casually add or remove. If your system runs across multiple machines, partitions are a fact of life, so the real design choice is usually \u003cstrong\u003eCP vs AP\u003c/strong\u003e.\u003c/p\u003e","title":"CAP Theorem Explained: Consistency vs Availability in Distributed Systems"},{"content":"Attention is the idea that made modern transformers practical and powerful. 
Instead of compressing an entire input into one fixed vector, a model can decide, token by token, which earlier pieces of information matter most right now.\nThat sounds simple, but there are many different kinds of attention mechanisms, and they exist because models face different constraints:\nsome need strong alignment between an encoder and a decoder some need to generate text one token at a time without looking ahead some need to handle very long documents some need to reduce GPU memory traffic at inference time This article walks through the main families of attention, shows where they fit, and explains why newer variants such as DeepSeek\u0026rsquo;s multi-head latent attention (MLA) matter.\nA practical timeline: attention started as an alignment mechanism for sequence-to-sequence models, then became the core compute pattern inside transformers, and is now being redesigned for long-context and low-latency inference.\nWhy Attention Matters Before transformers, sequence models such as RNNs and LSTMs processed text step by step. They were useful, but they struggled to keep distant information alive over long spans. If a model had to connect a pronoun to a noun many words earlier, or tie the end of a long paragraph back to the beginning, that fixed-memory bottleneck became a problem.\nAttention changed this by turning memory lookup into a learned operation. 
Instead of asking the network to remember everything in one hidden state, attention lets the current token selectively read from many token positions.\nIn plain language:\nthe current token asks a question earlier tokens advertise what information they contain the model retrieves a weighted mixture of the most relevant information That is the heart of attention.\nThe Core Idea: Query, Key, and Value The modern transformer version of attention is usually written as:\nAttention(Q, K, V) = softmax(QK^T / sqrt(d_k))V\nYou do not need the notation to get the idea:\nQuery (Q) represents what the current token is looking for Key (K) represents what each token can offer Value (V) is the actual content that will be mixed into the output The query is compared with all keys. High similarity means high relevance. After normalization with softmax, the model uses those weights to combine the value vectors.\nScaled dot-product attention in one picture: token embeddings are projected into queries, keys, and values; scores are computed from query-key similarity; the resulting weights mix the value vectors into contextual outputs.\nA Short History of Attention 1. Additive Attention (Bahdanau Attention) One of the most influential early uses of attention appeared in sequence-to-sequence machine translation. In the 2014 Bahdanau paper, the decoder did not rely on a single final encoder state. Instead, for each output word, it learned a soft alignment over encoder states.\nWhy it mattered:\nit improved translation quality it made long sequences easier to handle it made alignment between source and target words more explicit This mechanism is often called additive attention because the score is produced by a learned feed-forward function over encoder and decoder states rather than a raw dot product.\n2. Multiplicative / Dot-Product Attention (Luong Attention) Luong-style attention simplified scoring by using dot products or closely related forms. 
It was faster and easier to scale than additive attention, especially as vectorized linear algebra became the dominant implementation style.\nThis family sits conceptually between early seq2seq attention and the fully transformer-native attention used later.\nTransformer Attention: The Variants You Must Know The transformer did not invent attention, but it made attention the central compute primitive in the architecture.\nSelf-Attention In self-attention, the queries, keys, and values all come from the same sequence.\nIf the input is:\nThe flag is blue\nthen each token can look at the other tokens in that same sentence. The token flag can attend to The, is, and blue, and the final representation for flag becomes contextual rather than isolated.\nWhy self-attention is powerful:\nevery token can directly access every other token long-range interactions no longer require many recurrent steps the whole sequence can be processed in parallel during training Cross-Attention In cross-attention, queries come from one sequence while keys and values come from another.\nThis is the classic pattern in encoder-decoder models:\nthe encoder builds representations of the source sentence the decoder uses cross-attention to read from the encoded source while generating the target sentence Cross-attention also appears outside translation:\nretrieval-augmented generation image-text models audio-text models multi-document or multi-modal fusion Causal or Masked Self-Attention For autoregressive language models, a token cannot be allowed to see future tokens during training. Otherwise the model would cheat.\nThat is why GPT-style decoders use causal attention:\ntoken 1 can attend only to token 1 token 2 can attend to tokens 1 and 2 token 3 can attend to tokens 1, 2, and 3 Mathematically, this is handled by masking the upper-right triangle of the attention score matrix before the softmax.\nSelf-attention reads within one sequence. Cross-attention reads from a different sequence. 
Causal attention applies a future-token mask so next-token prediction remains valid.\nMulti-Head Attention One attention pattern is useful, but a single pattern is limiting. That is why transformers use multi-head attention.\nInstead of learning one query, key, and value projection, the model learns many heads in parallel. Each head gets its own projection space and can specialize:\none head might focus on nearby syntax another might focus on coreference another might track delimiters, list structure, or repeated phrases The heads are then concatenated and projected back into the model dimension.\nWhy multi-head attention works so well:\nit increases representational diversity it lets different relation types coexist it is still easy to implement efficiently on accelerators This is the dominant form in vanilla transformers, but it comes with a cost: each head usually stores its own key and value cache during decoding.\nSoft Attention vs Hard Attention Most large language models use soft attention. Every token receives a weighted mixture from many other tokens, and the weighting is differentiable. That makes training stable with standard gradient descent.\nHard attention uses discrete selection, such as choosing one location or a very small subset of locations. It can be more selective, but it is much harder to train because the discrete choice breaks ordinary backpropagation.\nIn practice:\nsoft attention dominates mainstream LLMs hard attention ideas still appear in routing, retrieval, and some specialized sparse systems Attention for Long Contexts Standard full attention is expensive because the score matrix grows with the square of sequence length. If a sequence has N tokens, the matrix has N x N interactions.\nThat is acceptable for short contexts. 
It becomes painful for long documents, code repositories, or long conversations.\nThis pressure led to several families of efficient attention.\nLocal or Sliding-Window Attention Local attention restricts each token to a fixed neighborhood, such as a window of nearby tokens.\nWhy it helps:\ncompute and memory drop sharply nearby dependencies in language are often very important it is simple and hardware-friendly Where it struggles:\npurely local windows can miss long-range dependencies information must hop across many layers to travel far Sparse Attention Sparse attention does not let every token attend everywhere. Instead, it computes a carefully chosen subset of token pairs.\nTypical sparse patterns include:\nlocal bands strided links dilated patterns designated global tokens The goal is to keep important long-range routes while avoiding a dense N x N matrix.\nGlobal + Local Attention Architectures such as Longformer combine local windows with a few globally visible tokens. Those global tokens act like hubs, allowing information to travel long distances without fully dense attention.\nThis is often a strong compromise for document processing:\nlocal attention captures nearby structure global tokens provide long-range communication Linear Attention Linear attention changes the computation rather than only changing the pattern. 
Instead of explicitly forming the full pairwise attention matrix, it rewrites or approximates the operation so the cost grows roughly linearly with sequence length.\nThat can be attractive for very long inputs, but there is a trade-off:\nit is faster or more memory-efficient in the right setting it does not always match the quality of exact full attention implementation details matter a lot Long-context variants either limit which token pairs are scored, as in local and sparse attention, or they change the algebra itself, as in linear attention.\nCross-Attention vs Self-Attention in Real Systems A useful way to think about deployment is this:\nencoder-only models use self-attention heavily for representation learning decoder-only LLMs use causal self-attention for next-token prediction encoder-decoder models use self-attention inside each stack and cross-attention between them multimodal models often use cross-attention to connect text with image, audio, or video features That means \u0026ldquo;attention mechanism\u0026rdquo; can refer either to the scoring rule itself or to the connectivity pattern between information sources.\nThe Inference Problem: KV Cache A particularly important decoder-side issue is the key-value cache, usually shortened to KV cache.\nWhen a decoder-only language model generates text one token at a time, it does not want to recompute all keys and values for the full prefix on every step. 
So it stores previously computed keys and values in memory and reuses them.\nThat is a huge win for compute, but it creates a memory bottleneck:\nthe cache grows with sequence length it must exist for every layer in vanilla multi-head attention, each head contributes its own keys and values As models and contexts get larger, moving KV cache data can become a major cost during inference.\nThis is the background for MQA, GQA, and MLA.\nMQA, GQA, and MLA: Attention Variants for Faster Decoding Multi-Head Attention (MHA) This is the standard transformer setup:\nevery head has its own Q, K, and V projections every head keeps its own key and value cache Strength:\nmaximum head-specific flexibility Trade-off:\nlargest KV cache and highest memory bandwidth pressure Multi-Query Attention (MQA) MQA keeps separate queries per head, but it shares the key and value projections across all heads.\nWhy it helps:\nthe KV cache becomes much smaller decoding can become much faster Trade-off:\nheads lose some freedom because they read from the same shared keys and values Grouped-Query Attention (GQA) GQA is a compromise between MHA and MQA. Instead of one shared K/V for all heads, it shares K/V within groups of heads.\nThis gives:\na smaller KV cache than MHA more specialization than MQA That is why GQA has become a common practical choice in large models.\nMulti-Head Latent Attention (MLA) This is the mechanism highlighted in the Welch Labs video and poster.\nThe key idea is not simply to force heads to share the same keys and values. 
Instead, the model learns a compressed latent representation for the KV cache, shared across heads, and then uses learned projections so each head can still recover head-specific behavior from that shared latent space.\nConceptually, MLA does this:\ncompress the information that would normally live in a large per-head KV cache into a smaller shared latent cache keep enough structure so different heads can still act differently rearrange the inference computation so the compression does not introduce a large new runtime penalty That is why MLA is interesting. It is not just a memory-saving trick in the crude sense. It tries to preserve the benefits of multi-head specialization while shrinking the cache dramatically.\nThe decoder-efficiency family: MHA stores separate K/V per head, MQA shares one K/V set, GQA shares within groups, and MLA uses a learned shared latent cache with head-specific recovery.\nWhy DeepSeek MLA Drew So Much Attention The Welch Labs video discusses DeepSeek R1 because that model made the mechanism famous to a wider audience in early 2025. The underlying attention design, however, was introduced in the DeepSeek-V2 technical report published in May 2024.\nThat distinction matters:\nthe reasoning model publicized the result the earlier technical report introduced the architectural idea The central engineering claim is that MLA drastically reduces KV-cache pressure relative to standard multi-head attention. 
That matters because modern decoding is often memory-bandwidth-bound, not purely arithmetic-bound.\nIn other words, the bottleneck is often not \u0026ldquo;how many multiplications can the GPU do?\u0026rdquo; but \u0026ldquo;how quickly can the system move cached keys and values back into the compute units for the next token?\u0026rdquo;\nWhy MLA is appealing:\nsmaller KV cache lower memory traffic during decode preservation of more head-specific flexibility than simple MQA better fit for long-context inference workloads One subtle but important point from the DeepSeek description is that MLA is combined with algebraic rearrangements so the latent-space trick does not simply add another expensive matrix multiply on every token. That design choice is part of what makes the mechanism practically useful instead of merely elegant on paper.\nNot Everything Called \u0026ldquo;Attention Optimization\u0026rdquo; Is a New Attention Mechanism This is an important distinction.\nSome improvements change the attention mechanism itself:\nadditive attention self-attention sparse attention linear attention MQA, GQA, MLA Other improvements mainly change the implementation strategy:\nFlashAttention fused kernels better cache layouts quantized KV caches These are all valuable, but they solve different layers of the problem.\nFor example:\nFlashAttention improves how attention is computed on hardware MLA changes what is stored and how head-specific information is represented The first is mainly an execution optimization. The second is an architectural change.\nWhich Attention Mechanism Should You Use? There is no universally best choice. 
The right mechanism depends on the job.\nUse full self-attention when: context lengths are moderate you want maximum modeling flexibility you are building a standard transformer baseline Use cross-attention when: one sequence must read from another sequence you are building encoder-decoder or multimodal systems Use local or sparse attention when: long-context cost is a real bottleneck the task has strong locality structure you can tolerate restricted token-to-token connectivity Use linear attention when: you need streaming or long-sequence efficiency approximate or reformulated attention is acceptable you are optimizing for asymptotic scaling Use MQA or GQA when: decoder inference speed matters KV-cache footprint is a bottleneck you need a practical LLM inference optimization Use MLA when: you want more aggressive KV-cache reduction you still want more flexibility than naive K/V sharing you are designing for large-scale autoregressive decoding A Practical Comparison Table Mechanism What changes? 
Main benefit Main trade-off Common use Additive attention Learned scoring network over encoder/decoder states Strong seq2seq alignment Less hardware-friendly than dot products Early NMT Dot-product attention Similarity via vector products Fast and scalable Still expensive at long context Seq2seq, transformers Self-attention Tokens attend within one sequence Rich contextualization Full version is quadratic Encoders and decoders Cross-attention Queries read a different sequence Great for conditioning and fusion Extra memory and compute Encoder-decoder, multimodal Causal attention Future positions are masked Valid next-token prediction Still quadratic over prefix during training GPT-style LLMs Multi-head attention Multiple learned heads in parallel Diverse relation modeling Large KV cache in decoding Vanilla transformers Local attention Restrict attention to nearby windows Cheaper long-context processing Weak direct long-range access Long documents Sparse attention Compute only selected token pairs Better scaling than full attention Pattern design matters Long-context models Linear attention Rewrite or approximate attention algebra Near-linear scaling May sacrifice exactness Streaming and very long sequences Multi-query attention Share K/V across all heads Much smaller KV cache Less head specialization Fast decoder inference Grouped-query attention Share K/V within groups Good quality/speed compromise Not as flexible as full MHA Many modern LLMs Multi-head latent attention Learn a compressed shared latent KV space Very small KV cache with stronger flexibility More architectural complexity DeepSeek-style decoder efficiency Common Misunderstandings \u0026ldquo;Attention means the model understands language like a human.\u0026rdquo; No. Attention is a learned weighting mechanism. It is powerful, but it is still numerical pattern processing.\n\u0026ldquo;Attention weights are a complete explanation of model reasoning.\u0026rdquo; Not necessarily. 
Attention maps can be informative, but they are not the whole story. Feed-forward blocks, residual paths, normalization, and head interactions all contribute.\n\u0026ldquo;All efficient attention methods solve the same problem.\u0026rdquo; No. Some solve training-time sequence scaling, some solve long-context connectivity, and some solve decode-time KV-cache bandwidth.\n\u0026ldquo;DeepSeek MLA replaces all earlier attention ideas.\u0026rdquo; No. MLA is best understood as a decoder-efficiency architecture for transformer-style models, not as a universal replacement for every attention variant.\nFinal Takeaway The word attention now covers a family of related ideas rather than one single mechanism.\nThe progression looks like this:\nearly attention solved alignment in sequence-to-sequence models transformer self-attention made global token interaction the center of the architecture long-context variants reduced the cost of pairwise interactions decoder-focused variants such as MQA, GQA, and MLA attacked the KV-cache bottleneck during generation If you are learning modern AI systems, that last step is especially important. Once models become large enough, architecture is no longer only about raw quality. It is also about bandwidth, latency, cache size, and deployability. 
That is exactly why newer mechanisms such as DeepSeek\u0026rsquo;s MLA matter.\nSources and Further Reading Bahdanau, Cho, and Bengio, Neural Machine Translation by Jointly Learning to Align and Translate (2014): https://arxiv.org/abs/1409.0473 Luong, Pham, and Manning, Effective Approaches to Attention-based Neural Machine Translation (2015): https://arxiv.org/abs/1508.04025 Vaswani et al., Attention Is All You Need (2017): https://arxiv.org/abs/1706.03762 Shazeer, Fast Transformer Decoding: One Write-Head is All You Need (2019): https://arxiv.org/abs/1911.02150 Child et al., Generating Long Sequences with Sparse Transformers (2019): https://arxiv.org/abs/1904.10509 Beltagy, Peters, and Cohan, Longformer: The Long-Document Transformer (2020): https://arxiv.org/abs/2004.05150 Katharopoulos et al., Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2020): https://arxiv.org/abs/2006.16236 Ainslie et al., GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (2023): https://arxiv.org/abs/2305.13245 DeepSeek-AI, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (May 2024): https://arxiv.org/abs/2405.04434 Welch Labs poster page, MLA/DeepSeek Attention Poster 13x19: https://www.welchlabs.com/resources/mladeepseek-attention-poster-13x19 ","permalink":"https://learncodecamp.net/attention-mechanisms-complete-guide/","summary":"\u003cp\u003eAttention is the idea that made modern transformers practical and powerful. 
Instead of compressing an entire input into one fixed vector, a model can decide, token by token, which earlier pieces of information matter most right now.\u003c/p\u003e\n\u003cp\u003eThat sounds simple, but there are many different kinds of attention mechanisms, and they exist because models face different constraints:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003esome need strong alignment between an encoder and a decoder\u003c/li\u003e\n\u003cli\u003esome need to generate text one token at a time without looking ahead\u003c/li\u003e\n\u003cli\u003esome need to handle very long documents\u003c/li\u003e\n\u003cli\u003esome need to reduce GPU memory traffic at inference time\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis article walks through the main families of attention, shows where they fit, and explains why newer variants such as DeepSeek\u0026rsquo;s \u003cstrong\u003emulti-head latent attention (MLA)\u003c/strong\u003e matter.\u003c/p\u003e","title":"Attention Mechanisms Explained: Self-Attention, Cross-Attention, Sparse Attention, MQA, GQA, and DeepSeek MLA"},{"content":"When people say they want to \u0026ldquo;search in CloudWatch\u0026rdquo;, what they usually need is CloudWatch Logs Insights.\nIt is much more useful than manually opening individual log streams because you can search across log groups, combine conditions, sort by timestamp, and limit results quickly.\nThat said, AWS also has the basic log search interface inside a log group. If you select all streams and search there, the syntax is different. It uses filter patterns, not Logs Insights query language.\nIf your goal is to find log lines:\ncontaining str1 and str2 containing str1 but not str2 then Logs Insights is the right tool.\nOpen CloudWatch Logs Insights In the AWS Console:\nOpen CloudWatch Go to Logs Insights Select the relevant log group or log groups Choose the time range carefully Run a query The time range matters a lot. 
If you search a huge time window, the query becomes slower and scans more data.\nBasic Query Shape Most searches start with something like this:\nfields @timestamp, @message | sort @timestamp desc | limit 100 This simply shows recent log lines.\nFrom there, add a filter clause.\n1. Search Log Lines Containing str1 and str2 Use:\nfields @timestamp, @message | filter @message like /str1/ and @message like /str2/ | sort @timestamp desc | limit 100 This returns only log lines where both strings appear in the same log event.\nExample:\nfields @timestamp, @message | filter @message like /payment/ and @message like /failed/ | sort @timestamp desc | limit 100 This is useful for cases like:\norder and timeout exception and userId payment and failed 2. Search Log Lines Containing str1 but Not str2 Use:\nfields @timestamp, @message | filter @message like /str1/ and @message not like /str2/ | sort @timestamp desc | limit 100 Example:\nfields @timestamp, @message | filter @message like /error/ and @message not like /timeout/ | sort @timestamp desc | limit 100 This is helpful when one keyword is too broad and you want to remove noisy results.\n3. Search for Either str1 or str2 Sometimes you want either term instead of both:\nfields @timestamp, @message | filter @message like /str1/ or @message like /str2/ | sort @timestamp desc | limit 100 Example:\nfields @timestamp, @message | filter @message like /error/ or @message like /exception/ | sort @timestamp desc | limit 100 4. Search for an Exact Phrase If the thing you want is a phrase, search that phrase directly:\nfields @timestamp, @message | filter @message like /connection refused/ | sort @timestamp desc | limit 100 This is often better than searching individual words separately when the message format is stable.\n5. 
Case-Insensitive Search If your logs may contain Error, ERROR, or error, use a case-insensitive regex:\nfields @timestamp, @message | filter @message like /(?i)error/ | sort @timestamp desc | limit 100 This avoids missing matches because of letter casing.\n6. Search Structured Fields Instead of Raw Message Text If your logs are structured and fields are available, prefer filtering by fields instead of searching the whole @message.\nExample:\nfields @timestamp, level, service, requestId, @message | filter level = \u0026#34;ERROR\u0026#34; and service = \u0026#34;payments\u0026#34; | sort @timestamp desc | limit 100 This is better than free-text search because:\nit is more precise it is easier to maintain equality-based filters can be more efficient than like 7. Useful Additions While Debugging Search by request ID fields @timestamp, @message, requestId | filter requestId = \u0026#34;abc-123\u0026#34; | sort @timestamp desc | limit 100 Find recent exceptions only fields @timestamp, @message | filter @message like /Exception/ | sort @timestamp desc | limit 50 Exclude health checks or noisy endpoints fields @timestamp, @message | filter @message like /error/ | filter @message not like /health/ | filter @message not like /ready/ | sort @timestamp desc | limit 100 8. Tips to Search CloudWatch Logs More Effectively 1. Always narrow the time range This is the fastest way to reduce noise, query time, and scan volume.\n2. Start with @message, then move to fields If you do not yet know the structure of the logs, begin with:\nfields @timestamp, @message | sort @timestamp desc | limit 20 Then identify useful fields like requestId, status, level, path, or service.\n3. Use limit When debugging, you usually do not need thousands of rows immediately.\n| limit 50 is often enough.\n4. 
Prefer exact field matches when possible A query like:\n| filter requestId = \u0026#34;abc-123\u0026#34; is usually better than:\n| filter @message like /abc-123/ because exact field filters are more targeted.\n5. Use not like to remove noise This is one of the most useful techniques in production debugging, especially when health checks, retries, or known warning messages flood the results.\nIf You Want to Use the Basic CloudWatch Log Search Interface Sometimes you are already inside a log group and want to use the built-in search box after selecting all streams.\nThat works too, but the syntax is different from Logs Insights.\nIn the AWS Console:\nOpen CloudWatch Go to Log groups Open the relevant log group Select all streams or choose Search log group Enter a filter pattern in the log events search box Basic Interface Query Syntax In the basic interface, unstructured text search uses filter patterns:\nmultiple terms separated by spaces mean AND prefix a term with - to exclude it wrap an exact phrase in double quotes regex uses %...% Do not paste Logs Insights queries like fields, filter, or sort into this box. 
They will not work there.\nBasic Search: Contains str1 and str2 Use:\nstr1 str2 Example:\npayment failed This returns log events that contain both terms.\nBasic Search: Contains str1 but Not str2 Use:\nstr1 -str2 Example:\nerror -timeout This returns log events that contain error but exclude events that also contain timeout.\nBasic Search: Exact Phrase Use double quotes:\n\u0026#34;connection refused\u0026#34; This is useful when the phrase is stable and you do not want broad partial matches.\nBasic Search: Regex The basic interface also supports regex filter patterns using %:\n%ERROR|Exception% This can be useful for broader matching, but the regex support is more limited than what many developers expect, so keep patterns simple.\nWhen to Use Basic Search vs Logs Insights Use the basic interface when:\nyou are inspecting one log group quickly you already know the time range you want a fast text filter over selected streams Use Logs Insights when:\nyou need richer queries you want to combine conditions clearly you want sorting, fields, parsing, or aggregations you need to search across multiple log groups more effectively Common Mistake A common mistake is to mix the two CloudWatch search modes.\nFor Logs Insights, think in terms of:\nfilter and or not like exact field matches sorting and limiting For the basic search interface, think in terms of:\nspace-separated terms for AND -term for exclusion \u0026quot;exact phrase\u0026quot; for phrase search %regex% for regex patterns That is what makes CloudWatch log searches practical at scale.\nFinal Query Cheat Sheet Contains str1 and str2 fields @timestamp, @message | filter @message like /str1/ and @message like /str2/ | sort @timestamp desc | limit 100 Contains str1 but not str2 fields @timestamp, @message | filter @message like /str1/ and @message not like /str2/ | sort @timestamp desc | limit 100 Basic interface: contains str1 and str2 str1 str2 Basic interface: contains str1 but not str2 str1 -str2 Contains str1 
or str2 fields @timestamp, @message | filter @message like /str1/ or @message like /str2/ | sort @timestamp desc | limit 100 Conclusion For most real-world debugging in AWS, the best way to search log lines is to use CloudWatch Logs Insights with filter conditions.\nThe two most useful patterns are:\nlike ... and like ... like ... and not like ... If you stay in the older log-group search interface after selecting all streams, use:\nstr1 str2 str1 -str2 Once you get comfortable with both modes, add or, exact phrases, case-insensitive matching, and structured-field filters to search much faster.\n","permalink":"https://learncodecamp.net/how-to-search-aws-cloudwatch-logs-effectively/","summary":"\u003cp\u003eWhen people say they want to \u0026ldquo;search in CloudWatch\u0026rdquo;, what they usually need is \u003cstrong\u003eCloudWatch Logs Insights\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eIt is much more useful than manually opening individual log streams because you can search across log groups, combine conditions, sort by timestamp, and limit results quickly.\u003c/p\u003e\n\u003cp\u003eThat said, AWS also has the \u003cstrong\u003ebasic log search interface\u003c/strong\u003e inside a log group. If you select all streams and search there, the syntax is different. It uses \u003cstrong\u003efilter patterns\u003c/strong\u003e, not Logs Insights query language.\u003c/p\u003e","title":"How to Search AWS CloudWatch Logs Effectively"},{"content":"If you use a hosted payment page like Stripe Checkout, the safest architecture is:\nCreate an internal order before redirecting. Redirect the user to the hosted checkout URL. Use webhooks as the source of truth for fulfillment. Let the frontend poll order status after return. 
This avoids race conditions and ensures users still get entitlements even if they close the tab before the success page loads.\nWhy This Pattern Works Hosted checkout redirects are excellent for UX and compliance, but redirects are not guaranteed delivery signals.\nA webhook is the reliable signal. The redirect page is only a user-facing state page.\nEnd-to-End Sequence High-Level Components Frontend App Shows plans or packs. Calls backend to create checkout session. Redirects user to hosted checkout. On return page, checks order status and polls while pending. Backend API Authenticates user. Creates internal order (pending) before payment. Creates hosted checkout session. Verifies webhook signature and fulfills idempotently. Exposes order status endpoint. Database Plans or packs catalog. orders table/collection. processed_webhook_events for idempotency. User entitlement/credits balance. API Shape POST /checkout-session Request:\n{ \u0026#34;planId\u0026#34;: \u0026#34;pro_monthly\u0026#34; } Response:\n{ \u0026#34;orderId\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;checkoutUrl\u0026#34;: \u0026#34;...\u0026#34; } POST /order-status Request:\n{ \u0026#34;orderId\u0026#34;: \u0026#34;...\u0026#34; } or\n{ \u0026#34;sessionId\u0026#34;: \u0026#34;...\u0026#34; } Response:\n{ \u0026#34;orderId\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;status\u0026#34;: \u0026#34;pending|paid|failed|canceled\u0026#34; } POST /payment-webhook Stripe calls this endpoint. The backend verifies signature and fulfills idempotently.\nProduction Checklist Verify webhook signatures with raw request body. Store processed webhook event IDs (idempotency). Create internal order before checkout session creation. Enforce order ownership checks on status API. Keep plan pricing authoritative on backend/DB (never trust client amount). Make fulfillment webhook-driven, not redirect-driven. Add retry-safe entitlement logic (grant once). 
Final Takeaway Treat the hosted checkout redirect as UX, and the webhook as truth.\nThat single decision makes your billing flow robust, auditable, and far less fragile in real-world conditions.\n","permalink":"https://learncodecamp.net/integrating-stripe-payment-with-checkout-flow/","summary":"\u003cp\u003eIf you use a hosted payment page like Stripe Checkout, the safest architecture is:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eCreate an internal order before redirecting.\u003c/li\u003e\n\u003cli\u003eRedirect the user to the hosted checkout URL.\u003c/li\u003e\n\u003cli\u003eUse webhooks as the source of truth for fulfillment.\u003c/li\u003e\n\u003cli\u003eLet the frontend poll order status after return.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThis avoids race conditions and ensures users still get entitlements even if they close the tab before the success page loads.\u003c/p\u003e\n\u003ch2 id=\"why-this-pattern-works\"\u003eWhy This Pattern Works\u003c/h2\u003e\n\u003cp\u003eHosted checkout redirects are excellent for UX and compliance, but redirects are not guaranteed delivery signals.\u003c/p\u003e","title":"Integrating Stripe Payment with Checkout Flow (Webhooks + Polling)"},{"content":"We recently added a dedicated Developer Tools section to Learn Code Camp and shipped multiple utility tools in one go.\nThe goal was simple:\nClient-side only. Your data stays in your browser on this page.\nThat requirement shaped every implementation choice.\nWhat We Added We added a new /tools section with these live tools:\nJSON Formatter + Validator Base64 Encode/Decode URL Encode/Decode UUID Generator (v4) Unix Timestamp Converter JWT Decoder Regex Tester Text Diff Checker Hash Generator (SHA-256, MD5) Why Client-Side Only? 
For utility tools, people often paste sensitive payloads: tokens, configs, logs, API responses, and JSON with private fields.\nA server round-trip is unnecessary for most transformations, so we kept all logic in browser JavaScript.\nBenefits:\nBetter privacy posture (no backend data processing) Faster interaction (no API latency) Lower ops complexity (no tool backend to deploy/monitor) Fits static hosting perfectly on Cloudflare Pages Project Structure Changes (Hugo) We introduced a dedicated Hugo section instead of mixing tools with blog posts.\n1) New content section content/tools/_index.md for landing page copy One markdown file per tool under content/tools/ 2) New layouts for tools layouts/tools/list.html for /tools layouts/tools/single.html for each tool page This gave us full control over tool UI without affecting normal article templates.\n3) Shared client-side runtime All tool behavior lives in:\nstatic/js/tools.js Each page declares a tool_id, and the script initializes the correct module by reading data-tool.\n4) Shared tool styling Custom styling is handled in:\nassets/css/extended/tools.css We later widened the layout for all tools to better use desktop screen space and support large payload workflows.\n5) Navigation + content isolation In hugo.toml:\nAdded a top nav item for /tools Set mainSections = [\u0026quot;posts\u0026quot;] so homepage article listing stays post-focused JSON Formatter: Special UX Improvements The JSON tool needed extra work because big payloads are hard to manage in small editors.\nWe upgraded it to a wide workspace and added:\nUpload JSON file Validate JSON Beautify with indentation options (2 spaces, 4 spaces, tab) Minify/compact Convert JSON to CSV/XML/YAML Download output Copy output Better parse error feedback (line/column context) Unix Timestamp Converter: Live Current Epoch We added a live banner that shows the current epoch value and updates every second, plus quick copy.\nThis makes the converter useful even before 
typing anything.\nCloudflare Pages Fit Because this is a Hugo static site on Cloudflare Pages, the tools deploy as static assets with no backend runtime.\nThat keeps deploys simple and costs predictable while still giving rich interactive behavior.\nImplementation Notes A few practical lessons from this build:\nBuild tools as a separate Hugo section early; template control matters. Keep one shared JS runtime and initialize by tool_id to avoid code duplication. Design for large inputs from day one (especially JSON and diff tools). Make privacy messaging explicit on each tool page. Closing This tools rollout was guided by one non-negotiable principle:\nClient-side only. Your data stays in your browser on this page.\nThat gave us a clean architecture, fast UX, and a safer default for users.\n","permalink":"https://learncodecamp.net/how-we-added-tools-section-client-side-only/","summary":"\u003cp\u003eWe recently added a dedicated \u003cstrong\u003eDeveloper Tools\u003c/strong\u003e section to Learn Code Camp and shipped multiple utility tools in one go.\u003c/p\u003e\n\u003cp\u003eThe goal was simple:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eClient-side only. 
Your data stays in your browser on this page.\u003c/strong\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThat requirement shaped every implementation choice.\u003c/p\u003e\n\u003ch2 id=\"what-we-added\"\u003eWhat We Added\u003c/h2\u003e\n\u003cp\u003eWe added a new \u003ccode\u003e/tools\u003c/code\u003e section with these live tools:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eJSON Formatter + Validator\u003c/li\u003e\n\u003cli\u003eBase64 Encode/Decode\u003c/li\u003e\n\u003cli\u003eURL Encode/Decode\u003c/li\u003e\n\u003cli\u003eUUID Generator (v4)\u003c/li\u003e\n\u003cli\u003eUnix Timestamp Converter\u003c/li\u003e\n\u003cli\u003eJWT Decoder\u003c/li\u003e\n\u003cli\u003eRegex Tester\u003c/li\u003e\n\u003cli\u003eText Diff Checker\u003c/li\u003e\n\u003cli\u003eHash Generator (SHA-256, MD5)\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"why-client-side-only\"\u003eWhy Client-Side Only?\u003c/h2\u003e\n\u003cp\u003eFor utility tools, people often paste sensitive payloads: tokens, configs, logs, API responses, and JSON with private fields.\u003c/p\u003e","title":"How We Added a Developer Tools Section in Hugo (Client-Side Only)"},{"content":"If you\u0026rsquo;re planning to run open-weight LLMs locally or in production, one of the first questions is:\nHow much GPU VRAM do I actually need?\nThe answer depends on three major components:\nModel weights KV cache (context memory) Runtime overhead Let’s break each one down clearly and practically.\n1️⃣ Model Weights: The Base Memory Cost The largest fixed memory cost comes from the model weights.\nSimple Formula Weights (GB) ≈ Parameters (in billions) × (bits per weight / 8)\nBecause:\n8 bits = 1 byte 1 billion parameters ≈ 1e9 values Typical Memory Per Parameter Precision Bytes per Parameter 7B Model 70B Model FP32 4 bytes 28 GB 280 GB FP16/BF16 2 bytes 14 GB 140 GB INT8 1 byte 7 GB 70 GB 4-bit ~0.5 byte ~3.5–5 GB ~35–50 GB Note: Quantized models use extra scaling factors, so real usage is slightly higher 
than the theoretical number.\nExample Calculation For a 13B model in FP16:\n13 × (16 / 8) = 26 GB\nSo you need ~26 GB just for weights.\n2️⃣ KV Cache: The Hidden Memory Multiplier The second major memory consumer is the KV cache (Key-Value cache).\nThis stores attention history so the model doesn\u0026rsquo;t recompute previous tokens.\nKV Cache Scales With: Context length (number of tokens) Batch size (concurrent requests) Number of layers Hidden size KV precision Rough Practical Estimates For many modern models:\nModel Size KV per 1k Tokens 7B ~0.2–0.6 GB 13B ~0.4–1.0 GB 70B Several GB So:\n8k context can consume several GB 32k context can consume tens of GB 64k+ context becomes extremely expensive 3️⃣ Runtime Overhead Even after weights + KV cache, you need extra headroom for:\nCUDA workspace buffers FlashAttention Temporary activations Memory fragmentation Sampling buffers Safe Rule: Add 10–30% extra VRAM for stability.\n📌 Total VRAM Formula Total VRAM ≈ Weights + KV Cache + 20% Overhead\nPractical VRAM Requirements by Model Class Model 4-bit FP16 7B 6–10 GB 16–24 GB 13B 12–20 GB 32–48 GB 70B 40–60 GB 160+ GB These assume moderate context (8k–16k).\nLong context multiplies requirements quickly.\n🚀 Example: Qwen3.5-397B-A17B Now let’s apply this to a real model.\nThe model described:\n397 billion total parameters 17B activated per token (Mixture-of-Experts) 262k native context (extendable past 1M) 60 layers Apache 2.0 license Verified on 8× H200 GPUs ⚠️ Important: \u0026ldquo;17B Active\u0026rdquo; Does NOT Reduce Weight Memory Even though only 17B parameters activate per token,\nyou still must load all 397B parameters into memory.\nMoE reduces compute cost — not weight storage cost.\n1️⃣ Weight Memory BF16 / FP16 397 × (16 / 8) ≈ 794 GB\nSo roughly:\n~800 GB VRAM for weights\nFP8 / INT8 397 × (8 / 8) ≈ 397 GB\n~400 GB VRAM\n4-bit Quantization 397 × (4 / 8) ≈ 198.5 GB\nRealistically:\n~220–260 GB VRAM including scaling overhead\n2️⃣ KV Cache at 64k Context This 
model uses:\n60 layers 2 KV heads Head dimension 256 Approximate KV usage:\n~120 KB per token At 64k tokens: ≈ 7.5 GB per request\nSo:\n10 concurrent 64k sessions ≈ 75 GB 30 concurrent sessions ≈ 225 GB KV memory scales linearly with concurrency.\n3️⃣ Realistic Deployment: 8× H200 141GB Total VRAM:\n8 × 141 GB = 1128 GB\nIn BF16: ~800 GB weights ~100–200 GB KV cache ~100 GB overhead This fits safely.\nSpeed = (prompt tokens + generated tokens) / time, the definition used in Qwen’s official Speed Benchmark.\n📊 Expected Throughput on 8× H200 For 64k max context:\nWorkload Type Aggregate Output Throughput Online serving ~200–800 tokens/sec High-batch offline ~600–1500 tokens/sec Per single request:\nUsually single-digit to few dozen tokens/sec Depends on batching efficiency Throughput depends heavily on:\nContinuous batching KV cache pressure Network latency All-reduce efficiency Thinking mode usage 🧠 Key Takeaways Parameter count determines weight memory. Context length determines KV memory. Concurrency multiplies KV usage. MoE reduces compute cost, not memory cost. Large 400B-class models require multi-GPU clusters. Long context (64k+) dramatically increases memory needs. 
🎯 Final Rule of Thumb If you\u0026rsquo;re sizing hardware:\nVRAM ≈ (Params × bits/8) + (Context × KV per token × concurrency) + 20%\nFor Qwen3.5-397B-A17B:\nBF16 → ~800 GB baseline Practical deployment → 8× H200 class system 64k context → ~7.5 GB per active request ","permalink":"https://learncodecamp.net/gpu-vram-requirement-llms/","summary":"\u003cp\u003eIf you\u0026rsquo;re planning to run open-weight LLMs locally or in production, one of the first questions is:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eHow much GPU VRAM do I actually need?\u003c/strong\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe answer depends on three major components:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eModel weights\u003c/strong\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eKV cache (context memory)\u003c/strong\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRuntime overhead\u003c/strong\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eLet’s break each one down clearly and practically.\u003c/p\u003e\n\u003chr\u003e\n\u003ch1 id=\"1-model-weights-the-base-memory-cost\"\u003e1️⃣ Model Weights: The Base Memory Cost\u003c/h1\u003e\n\u003cp\u003eThe largest fixed memory cost comes from the model weights.\u003c/p\u003e\n\u003ch3 id=\"simple-formula\"\u003eSimple Formula\u003c/h3\u003e\n\u003cp\u003eWeights (GB) ≈ Parameters (in billions) × (bits per weight / 8)\u003c/p\u003e","title":"How Much GPU VRAM Do You Need to Run Large Language Models?"},{"content":"Why Migrate from WordPress? WordPress is powerful, but for a technical blog that mostly serves static content, it comes with unnecessary overhead — hosting costs, plugin updates, security patches, and slower page loads. 
Static site generators like Hugo offer a simpler, faster, and cheaper alternative.\nHere\u0026rsquo;s what we migrated to:\nHugo — blazing fast static site generator PaperMod — clean, minimal theme perfect for tech blogs Decap CMS — web-based content management with GitHub backend Cloudflare Pages — free hosting with global CDN Google AdSense — preserved auto ads from the WordPress site The result? A site that builds in under 1 second, costs $0/month to host, and is served from Cloudflare\u0026rsquo;s global edge network.\nStep 1: Export WordPress Content We used the WordPress to Hugo Exporter plugin to export all posts and pages as Markdown files with YAML front matter. The export gave us:\n72 blog posts as .md files Static pages (About, Contact, Privacy Policy) A config.yaml with site metadata Step 2: Set Up Hugo with PaperMod hugo new site learncodecamp cd learncodecamp git init git submodule add https://github.com/adityatelange/hugo-PaperMod.git themes/PaperMod The key configuration in hugo.toml:\nbaseURL = \u0026#34;https://learncodecamp.net\u0026#34; title = \u0026#34;Learn Code Camp\u0026#34; theme = \u0026#34;PaperMod\u0026#34; [params] ShowReadingTime = true ShowCodeCopyButtons = true ShowToc = true [markup.goldmark.renderer] unsafe = true # Required for HTML content from WordPress [permalinks] posts = \u0026#34;/:slug/\u0026#34; # Match old WordPress URL structure The permalinks setting is critical — it ensures all existing URLs continue to work, preserving SEO rankings.\nStep 3: Clean Up WordPress Content The exported Markdown files had a lot of WordPress-specific artifacts:\nrank_math_* and zakra_* metadata in front matter WordPress CSS classes like {.wp-block-heading} \u0026lt;nav class=\u0026quot;wp-block-table-of-contents\u0026quot;\u0026gt; blocks HTML entities like \u0026amp;#8217; instead of ' We wrote a Python cleanup script that:\nStripped front matter — kept only title, author, date, slug, draft, categories, and tags Derived slugs from the old 
url field to maintain URL compatibility Removed WordPress classes and table-of-contents blocks (PaperMod has its own TOC) Decoded HTML entities back to readable characters Fixed invalid dates on draft posts (6 posts had -001-11-30 as their date) Step 4: Google AdSense Integration Since the site uses AdSense auto ads (placement controlled from the AdSense console), the integration was simple — just one script tag in the \u0026lt;head\u0026gt;:\n\u0026lt;!-- layouts/partials/google-ads-head.html --\u0026gt; \u0026lt;script async src=\u0026#34;https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-XXXXXXXXX\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; PaperMod provides an extend_head.html hook that made this easy:\n\u0026lt;!-- layouts/partials/extend_head.html --\u0026gt; {{ partial \u0026#34;google-ads-head.html\u0026#34; . }} We also added ads.txt and app-ads.txt files in the static/ directory for AdSense verification.\nStep 5: Set Up Decap CMS Decap CMS provides a web-based admin panel at /admin/ that commits directly to your GitHub repository.\nTwo files in static/admin/:\nindex.html — loads the CMS:\n\u0026lt;!doctype html\u0026gt; \u0026lt;html\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34; /\u0026gt; \u0026lt;title\u0026gt;Content Manager\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;div id=\u0026#34;nc-root\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;script src=\u0026#34;https://unpkg.com/decap-cms@^3.0.0/dist/decap-cms.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; Important: The script must be at the end of \u0026lt;body\u0026gt;, not in \u0026lt;head\u0026gt;. 
Decap CMS 3.x tries to mount into the DOM immediately, and placing the script in \u0026lt;head\u0026gt; causes a Cannot read properties of null (reading 'appendChild') error.\nconfig.yml — defines the content model:\nbackend: name: github repo: nkalra0123/learncodecamp branch: main base_url: https://github-oauth-proxy.nkalra0123.workers.dev collections: - name: \u0026#34;posts\u0026#34; label: \u0026#34;Posts\u0026#34; folder: \u0026#34;content/posts\u0026#34; create: true fields: - { label: \u0026#34;Title\u0026#34;, name: \u0026#34;title\u0026#34;, widget: \u0026#34;string\u0026#34; } - { label: \u0026#34;Date\u0026#34;, name: \u0026#34;date\u0026#34;, widget: \u0026#34;datetime\u0026#34; } - { label: \u0026#34;Author\u0026#34;, name: \u0026#34;author\u0026#34;, widget: \u0026#34;string\u0026#34;, default: \u0026#34;Nitin\u0026#34; } - { label: \u0026#34;Categories\u0026#34;, name: \u0026#34;categories\u0026#34;, widget: \u0026#34;list\u0026#34; } - { label: \u0026#34;Tags\u0026#34;, name: \u0026#34;tags\u0026#34;, widget: \u0026#34;list\u0026#34; } - { label: \u0026#34;Body\u0026#34;, name: \u0026#34;body\u0026#34;, widget: \u0026#34;markdown\u0026#34; } Step 6: OAuth Proxy for Decap CMS Decap CMS needs OAuth to authenticate with GitHub. On Cloudflare Pages (unlike Netlify), there\u0026rsquo;s no built-in OAuth provider, so we deployed a Cloudflare Worker as an OAuth proxy.\nThe flow:\nUser clicks \u0026ldquo;Login with GitHub\u0026rdquo; in the CMS CMS opens a popup to the worker\u0026rsquo;s /auth endpoint Worker redirects to GitHub OAuth GitHub redirects back to the worker\u0026rsquo;s /callback with an auth code Worker exchanges the code for an access token Worker sends the token back to the CMS via postMessage Key gotcha: Decap CMS uses a handshake protocol — the callback page must first signal authorizing:github to the opener, then wait for a message before sending the token. 
Without this handshake, the CMS won\u0026rsquo;t receive the token.\nStep 7: Deploy on Cloudflare Pages Connected the GitHub repo to Cloudflare Pages Set build command to hugo --minify and output directory to public Added HUGO_VERSION = 0.155.1 as an environment variable Added learncodecamp.net as a custom domain Purged Cloudflare cache to clear old WordPress responses URL Verification One of the most important aspects of the migration was ensuring all 66 published WordPress URLs matched exactly in Hugo. We verified every single URL — zero mismatches. This means:\nNo broken links from search engines or external sites No drop in SEO rankings No need for redirect rules The Result Metric WordPress Hugo Build time N/A \u0026lt; 1 second Hosting cost ~$5-10/month Free Page load 2-4 seconds \u0026lt; 1 second Deployment Manual/FTP Auto on git push Content editing WordPress admin Decap CMS or Git Security patches Frequent None needed New Post Workflow Adding a new post is now:\nGo to https://learncodecamp.net/admin/ Login with GitHub Write the post in the Markdown editor Set title, categories, tags Click Publish Decap CMS commits to GitHub → Cloudflare Pages auto-builds → live in ~1 minute Or just push a new .md file to the content/posts/ directory in the repo — whatever you prefer.\nTools Used Hugo — static site generator PaperMod — Hugo theme Decap CMS — Git-based content management Cloudflare Pages — free static site hosting Cloudflare Workers — OAuth proxy for Decap CMS ","permalink":"https://learncodecamp.net/migrating-wordpress-to-hugo-cloudflare-pages/","summary":"\u003ch2 id=\"why-migrate-from-wordpress\"\u003eWhy Migrate from WordPress?\u003c/h2\u003e\n\u003cp\u003eWordPress is powerful, but for a technical blog that mostly serves static content, it comes with unnecessary overhead — hosting costs, plugin updates, security patches, and slower page loads. 
Static site generators like Hugo offer a simpler, faster, and cheaper alternative.\u003c/p\u003e\n\u003cp\u003eHere\u0026rsquo;s what we migrated to:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eHugo\u003c/strong\u003e — blazing fast static site generator\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003ePaperMod\u003c/strong\u003e — clean, minimal theme perfect for tech blogs\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eDecap CMS\u003c/strong\u003e — web-based content management with GitHub backend\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCloudflare Pages\u003c/strong\u003e — free hosting with global CDN\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eGoogle AdSense\u003c/strong\u003e — preserved auto ads from the WordPress site\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe result? A site that builds in under 1 second, costs $0/month to host, and is served from Cloudflare\u0026rsquo;s global edge network.\u003c/p\u003e","title":"How to Migrate WordPress to Hugo with Decap CMS and Cloudflare Pages (Free Hosting)"},{"content":"Frontier vision models have gotten really good at understanding images — but they’ve also had a consistent weakness:\nThey still often treat an image like a single static glance.\nSo if the answer depends on something tiny (a serial number, a distant street sign, a gauge reading, a small UI label), the model might miss it… and then it has to guess.\nGoogle’s new capability called Agentic Vision, launched with Gemini 3 Flash, is a major step toward fixing that.\nInstead of “one-shot vision,” it turns image understanding into an agentic loop, where the model can zoom, crop, rotate, annotate, and compute using code execution — and then use the results to answer with visual proof.\nLet’s break it down properly, with use cases, demos, and code.\nWhat is Agentic Vision? 
Agentic Vision is a new capability in Gemini 3 Flash that combines:\n✅ Visual reasoning (understanding images)\n✅ Tool use via code execution (Python)\n✅ A repeated reasoning loop (Think → Act → Observe)\n✅ Output grounded in visual evidence, not guesswork\nIn simple terms:\nAgentic Vision turns an image task into an investigation — not a prediction. This matters because most vision tasks don’t fail because the model is “dumb” — they fail because the model didn’t look closely enough.\nThe core idea: Think → Act → Observe Agentic Vision introduces an agent-like loop directly into the vision pipeline:\n1) Think The model analyzes:\nyour question the image what information might be missing what steps are required to extract it Example thought:\n“I need to count pedals. I should zoom into the lower portion of the image and isolate the pedal board.” 2) Act The model generates and executes Python code to manipulate the image.\nThis could include:\ncropping a region zooming in rotating an image drawing annotations counting objects reading tables and performing math plotting results with Matplotlib 3) Observe The transformed image (cropped/zoomed/annotated output) is appended back into the model’s context.\nNow the model can answer based on a clearer view of the evidence.\nThis loop can repeat multiple times until the model is confident.\nWhy this is a big deal (and not just hype) Google claims that enabling code execution with Gemini 3 Flash brings a consistent ~5–10% quality boost across vision benchmarks.\nThat’s not a small improvement — especially for visual reasoning tasks that require:\nprecision multi-step verification factual grounding For real production apps, that can be the difference between:\n✅ “works reliably”\nand ❌ “sometimes randomly wrong” What can Agentic Vision do? 
Real-world behaviors 1) Zooming and inspecting fine details This is the most immediately useful capability.\nInstead of guessing from a blurry part of an image, the model actively zooms in.\nExample use cases:\nreading a gauge value identifying a serial number inspecting small text in UI screenshots checking part numbers on chips/components document screenshots A strong real-world example mentioned:\nPlanCheckSolver.com, an AI building plan validation platform, reportedly improved accuracy by ~5% by enabling code execution to iteratively crop and inspect high-resolution building plan sections like roof edges and structural regions.\n2) Image annotation (a “visual scratchpad”) This is underrated but extremely powerful.\nInstead of just replying:\n“Looks like five fingers.” Agentic Vision can:\ndetect each finger draw boxes or labels produce a visual proof This prevents classic vision hallucinations.\nA famous demo example is counting fingers on an emoji-style hand where many models default to “5 fingers” by assumption — but the image actually shows 6.\nAgentic Vision solves that by marking each finger explicitly.\n3) Visual math + plotting (tables and charts) This is where “LLM guessing” fails hard.\nMany models struggle with:\nreading dense tables from images doing multi-step arithmetic correctly comparing multiple values reliably Agentic Vision helps by:\nextracting data visually offloading math to deterministic Python generating plots (e.g., Matplotlib) for verification Instead of probabilistic arithmetic, you get a verifiable computation pipeline.\nHow to access Agentic Vision (Google AI Studio + API) Agentic Vision is available via:\n✅ Gemini API in Google AI Studio\n✅ Vertex AI\n✅ Rolling out into the Gemini app (via model dropdown “Thinking”)\nTo trigger Agentic Vision behavior, you typically enable:\nTools → Code Execution\nThat unlocks the “Act” part of the loop.\nPython code walkthrough: enabling Agentic Vision Below is the same style of code Google shared 
(and it’s simple enough to ship in a prototype immediately):\nfrom google import genai from google.genai import types client = genai.Client() image = types.Part.from_uri( file_uri=\"https://goo.gle/instrument-img\", mime_type=\"image/jpeg\", ) response = client.models.generate_content( model=\"gemini-3-flash-preview\", contents=[image, \"Zoom into the expression pedals and tell me how many pedals are there?\"], config=types.GenerateContentConfig( tools=[types.Tool(code_execution=types.ToolCodeExecution)] ), ) print(response.text) What matters in this snippet? The most important piece is here:\ntools=[types.Tool(code_execution=types.ToolCodeExecution)] That is what gives Gemini permission to:\ngenerate Python run it transform the image feed the output back into its reasoning loop Without code execution, the model still has vision — but it loses the “agentic” behavior that makes it reliable for fine-grained tasks.\nWhere Agentic Vision is headed next Google also hinted at what’s coming:\n✅ More implicit behaviors Right now, Gemini 3 Flash is already good at deciding when to zoom.\nBut other actions (like rotation, visual math, heavy transformations) may still require the user to nudge it via prompt.\nThe goal is for these actions to become fully automatic and implicit.\n✅ More tools (web + reverse image search) This is the natural next step:\nVision + Code execution is strong…\nBut vision + code + web grounding is even stronger.\nThat would allow:\nidentifying objects/products verifying real-world information cross-checking visual claims reducing hallucinations even more ✅ Expansion to more model sizes Currently, it’s focused on Flash.\nGoogle has said they plan to expand beyond that.\nFor the official blog post, see: https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/\n","permalink":"https://learncodecamp.net/agentic-vision-in-gemini-3-flash-turning-seeing-into-an-active-investigation/","summary":"\u003cp\u003eFrontier 
vision models have gotten \u003cem\u003ereally\u003c/em\u003e good at understanding images — but they’ve also had a consistent weakness:\u003c/p\u003e\n\u003cp\u003eThey still often treat an image like a \u003cstrong\u003esingle static glance\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eSo if the answer depends on something \u003cem\u003etiny\u003c/em\u003e (a serial number, a distant street sign, a gauge reading, a small UI label), the model might miss it… and then it has to \u003cstrong\u003eguess\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eGoogle’s new capability called \u003cstrong\u003eAgentic Vision\u003c/strong\u003e, launched with \u003cstrong\u003eGemini 3 Flash\u003c/strong\u003e, is a major step toward fixing that.\u003c/p\u003e","title":"Agentic Vision in Gemini 3 Flash: Turning “Seeing” into an Active Investigation"},{"content":"Large language models (LLMs) like GPT-4, Llama, or Grok generate text by running inference — the phase where a trained model produces outputs from a given input prompt. While training is resource-intensive and done once, inference happens every time a user sends a query. Understanding the mechanics of inference is key to grasping why some models feel “fast” while others lag, and why certain optimizations matter.\nAt a high level, modern LLM inference (for autoregressive transformer-based models) splits into two distinct phases: prefill and decode. These phases behave very differently in terms of computation and directly affect two critical user-facing metrics: Time to First Token (TTFT) and Inter-Token Latency (ITL).\nThe Two Phases of Inference 1. Prefill (Prompt Processing) When you send a prompt to an LLM, the model first processes the entire input sequence all at once. This is called the prefill phase.\nThe transformer processes every token in the prompt in parallel. 
For each layer, it computes attention over the full prompt and generates the key-value (KV) cache — a stored representation of the prompt’s keys and values that will be reused later. No new tokens are generated yet; the model is just building the internal state needed for generation. 2. Decode (Token Generation) After prefill, the model starts generating output tokens one by one. This is the decode phase, and it is inherently sequential (autoregressive).\nThe model predicts the next token. That token is appended to the sequence. The new key and value for this token are computed and added to the KV cache. Attention for the new token is computed against the cached KV states from all previous tokens (prompt + generated so far) plus the newly computed ones. This repeats until the desired length or a stop condition is reached. Because each new token depends on all previous ones, decoding cannot be parallelized across output tokens — you must generate them sequentially. This makes decode much slower per token than prefill.\nThe KV cache is crucial here: without it, every decode step would require re-computing attention over the entire growing sequence from scratch (O(n²) cost per step). With the cache, each decode step only needs to compute the new token’s keys/values and attend to the cached past, keeping the attention cost roughly O(n) per step (O(n²) across the whole generation, instead of O(n³)).\nKey Performance Metrics Time to First Token (TTFT) TTFT measures the delay from when the request is submitted until the very first output token appears in the response stream.\nTTFT ≈ prefill time + time to compute the first decode step. Long prompts → higher TTFT (because prefill processes everything upfront). TTFT is critical for perceived responsiveness. Users notice if the model “hangs” before starting to reply. Inter-Token Latency (ITL) ITL (sometimes called Time Per Output Token or TPOT) is the average time between consecutive output tokens once generation has started.\nITL is dominated by the decode phase. 
Lower ITL means the text streams in faster and feels snappier. Typical values on high-end hardware for large models range from 20–100 ms per token, translating to 10–50 tokens per second. Together, TTFT and ITL define the end-user experience:\nLow TTFT → quick start. Low ITL → fast streaming completion. Why the Split Matters in Practice The prefill/decode split creates an asymmetry:\nShort prompts + long outputs → TTFT is low, but total time is dominated by slow sequential decoding. Long prompts + short outputs → TTFT can be high (prefill is expensive), but once it starts, completion is quick. This is why techniques like:\nBatching multiple user requests together, KV cache quantization, Speculative decoding (guess several tokens ahead and verify), Paged attention or vLLM-style memory management, are so important — they primarily accelerate the decode phase or make prefill more efficient under load.\nPrefill is compute-bound (FLOPs-bound). The dominant operations are large matrix multiplications (Q @ Kᵀ and attention @ V) over the full prompt length. These require a massive number of arithmetic operations—far more computation than data movement. Modern GPUs excel in this regime because they are designed for high-throughput floating-point operations when given large, parallel workloads. The phase is “embarrassingly parallel,” so the hardware can operate near peak FLOPs utilization.\nDecode, in contrast, is memory-bound. Each decode step generates only one token, so the matrix multiplications are tiny (shape [1 × d] against the cached KV of length n). There is relatively little arithmetic to do, but the model must load the entire KV cache (which grows with every token) from GPU memory into the compute units. 
The bottleneck shifts from computation to memory bandwidth: moving large amounts of data with minimal work per byte.\nThis compute-vs-memory distinction has major implications for real-world inference systems, especially when serving multiple users concurrently.\nInference engines like vLLM exploit this insight in their scheduling:\nThey maintain two queues: a waiting queue (new requests needing prefill) and a running queue (ongoing generations in decode). The scheduler prioritizes the running (decode) queue over the waiting (prefill) queue. This prevents long, compute-heavy prefills from starving latency-sensitive decode steps, which would otherwise cause visible stuttering in streaming responses. Chunked prefill (also called iterative or incremental prefill) further improves fairness: instead of processing an entire long prompt in one massive batch slot, vLLM breaks it into smaller chunks. This caps the amount of compute a single prefill can consume in one iteration, allowing decode steps from other requests to interleave and progress without long delays. These techniques help maintain low ITL and predictable latency even under mixed workloads with varying prompt lengths. Understanding the bound differences explains why simply throwing more GPUs at inference doesn’t always yield proportional speedups—decode-heavy workloads are often limited by memory bandwidth rather than raw compute power.\nSummary LLM inference is not a single uniform process. The prefill phase parallel-processes the prompt to build the KV cache, while the decode phase sequentially generates tokens using that cache. 
TTFT captures how long you wait for the response to start (mostly prefill), and ITL measures how fast the rest streams in (pure decode).\nUnderstanding these basics explains many real-world behaviors: why long context hurts responsiveness, why streaming feels better than waiting for full output, and why inference optimization is an active and crucial research area.\nNext time you notice a model “thinking” before replying, or text appearing word-by-word at different speeds, you’ll know exactly which phase is at work.\nReferences:\nhttps://docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html\nhttps://sankalp.bearblog.dev/how-prompt-caching-works/#llm-inference-basics\n","permalink":"https://learncodecamp.net/llm-inference-basics-prefill-decode-ttft-itl/","summary":"\u003cp\u003eLarge language models (LLMs) like GPT-4, Llama, or Grok generate text by running \u003cstrong\u003einference\u003c/strong\u003e — the phase where a trained model produces outputs from a given input prompt. While training is resource-intensive and done once, inference happens every time a user sends a query. Understanding the mechanics of inference is key to grasping why some models feel “fast” while others lag, and why certain optimizations matter.\u003c/p\u003e\n\u003cp\u003eAt a high level, modern LLM inference (for autoregressive transformer-based models) splits into two distinct phases: \u003cstrong\u003eprefill\u003c/strong\u003e and \u003cstrong\u003edecode\u003c/strong\u003e. These phases behave very differently in terms of computation and directly affect two critical user-facing metrics: \u003cstrong\u003eTime to First Token (TTFT)\u003c/strong\u003e and \u003cstrong\u003eInter-Token Latency (ITL)\u003c/strong\u003e.\u003cfigure\u003e\u003c/p\u003e","title":"Understanding LLM Inference Basics: Prefill and Decode, TTFT, and ITL"},{"content":"Recently, someone shared a screenshot on x.com, how to download OpenAI Home Directories. I tried it, and it works. 
In this blog, we will now try to understand exactly what the contents of this home directory are.\nworking with GPT-5.2 thinking with gpt 5.2, i got error zip file not found. https://t.co/c1zTfBlWb9 pic.twitter.com/85tEv28MuJ — Nitin Kalra (@nkalra0123) \u0026lt;a href=\u0026quot;https://twitter.com/nkalra0123/status/1999771366397231386?ref_src=twsrc%5Etfw\u0026quot;\u0026gt;December 13, 2025\u0026lt;/a\u0026gt; Let’s analyse the contents\nInside the open ai home directory oai/ Folder: Slides, Docs, PDFs, and Spreadsheets Tooling This folder is a small toolkit for working with common “office” artifacts – PowerPoint decks, DOCX files, PDFs, and spreadsheets. It combines a few Python utilities with a set of practical guides that describe the preferred tools and a quality-check workflow (render → visually inspect → iterate).\n- Inspect every exported PNG before continuing work. If anything looks off, fix the source and re-run the render → inspect loop until the pages are clean. Quick Map of What’s Here oai/redirect.html: a minimal redirect page with a strict Content Security Policy (CSP). oai/share/slides/: scripts to render slides to images and build montages. oai/skills/: “how to” guidance for DOCX, PDF, and spreadsheet workflows. 1) redirect.html: A Safe, Minimal Redirect Page oai/redirect.html is a tiny HTML page designed to redirect the browser to a URL passed via the target query parameter. It uses a strict CSP and a single inline script whose hash is pinned in the policy. That keeps the page intentionally minimal and reduces the chance of loading unexpected resources.\nWhat it does:\nReads ?target=... from the URL. If present, redirects using location.replace(...) (so it doesn’t keep the original page in history). Includes a sentinel \u0026lt;title\u0026gt; value that downstream code can detect, so it shouldn’t be edited. 
2) share/slides: Rendering Decks and Creating Image Montages 2.1 render_slides.py: PowerPoint → PDF → PNG The goal of oai/share/slides/render_slides.py is simple: produce one PNG per slide. Under the hood it uses a reliable two-step pipeline:\nConvert PowerPoint to PDF using LibreOffice (soffice). Rasterize the PDF to images using pdf2image (which relies on Poppler). Tools used:\nLibreOffice CLI (soffice): headless conversion to PDF (and a fallback path via ODP for tricky decks). pdf2image + Poppler: rasterization from PDF pages to PNGs. OOXML parsing: reads ppt/presentation.xml to compute slide dimensions (for DPI selection). How DPI is chosen:\nIf the input is a modern PowerPoint format (like .pptx), the script reads slide size in EMUs from OOXML. Otherwise it converts to PDF first and infers the page size from PDF metadata. It picks a DPI that aims to keep rendered images around the requested max width/height. Output: images are normalized and renamed to a clean slide-N.png format.\n2.2 ensure_raster_image.py: “Make This Image a PNG” Slide extraction workflows often produce mixed image formats (SVGs, metafiles, HEIC, etc.). The script oai/share/slides/ensure_raster_image.py standardizes them by converting “convertible” formats into PNGs.\nExternal tools it can use:\nInkscape: rasterizes .svg, .svgz, .emf, .wmf (and compressed .emz/.wmz after decompression). Ghostscript: rasterizes the first page of .pdf, .eps, .ps to PNG. ImageMagick: format bridging (and used after decoding JPEG XR to TIFF). libheif CLI (heif-convert): converts .heic/.heif to PNG. JPEG XR tools (JxrDecApp): decodes .wdp/.jxr. Why this matters: once everything is PNG (or already a supported raster format), downstream tools like montage creation are straightforward and predictable.\n2.3 create_montage.py: A Slide Sorter–Style Grid oai/share/slides/create_montage.py creates a single “contact sheet” image by tiling many images into a grid. 
It’s useful for quick reviews (e.g., scanning a whole deck at once).\nTools used:\nPillow (PIL): image loading, resizing, compositing, drawing borders and labels. ensure_raster_image: optional conversions so inputs become raster images. Key behaviors:\nFixed number of columns; rows computed automatically. Each image is scaled to fit within a cell while preserving aspect ratio. Optional labels: slide number, filename, or none. Configurable tolerance: fail fast on bad images, or insert visible placeholders and continue. 3) skills: Practical Guides for DOCX, PDF, and Spreadsheets The oai/skills/ tree contains guidance documents that describe a recommended “author → render → inspect” loop. The emphasis is on visual correctness: tables aligned, fonts consistent, no clipped/overlapping elements, and outputs that look client-ready.\n3.1 DOCX Skills (skills/docs) The DOCX guide recommends:\nCreate/edit with python-docx for structure, styles, lists, and tables. Render for review by converting DOCX → PDF with LibreOffice (soffice) using an isolated user profile to avoid timeouts/locks. Visually inspect by converting PDF pages to PNGs with pdftoppm. 3.2 PDF Skills (skills/pdfs) The PDF guide focuses on:\nCreate with reportlab (programmatic PDF generation). Inspect visually with pdftoppm (PDF → PNG). Optional text extraction with pdfplumber as a complement (not a replacement) to visual review. 3.3 Spreadsheet Skills (skills/spreadsheets) The spreadsheet guidance is a mix of documentation and runnable examples. It recommends:\nPrimary workflow: use artifact_tool (in this environment) for editing, recalculating formulas, and rendering sheets for QA. Alternative workflow: use openpyxl when needed (especially for user-facing code portability). Always verify: recalc formulas and visually render sheets before handing off. 
The examples/ and examples/features/ scripts demonstrate common tasks: creating a workbook, applying styling, reading an existing XLSX, and adding features like charts, tables, conditional formatting, borders/fills, alignment, wrapping, merging cells, and number formats.\nRecommended “Quality Loop” Across All Artifact Types Across slides, documents, and spreadsheets, the same philosophy shows up repeatedly:\nRender to a visual format (PDF or PNG) as early as possible. Inspect visually (don’t rely only on text extraction). Iterate until clean: fix layout issues, formatting defects, and readability problems. ","permalink":"https://learncodecamp.net/analysis-of-open-ai-home-directory/","summary":"\u003cp\u003eRecently, someone shared a screenshot on \u003ca href=\"http://x.com\"\u003ex.com\u003c/a\u003e, how to download OpenAI Home Directories. I tried it, and it works. In this blog, we will now try to understand exactly what the contents of this home directory are.\u003cfigure\u003e\u003c/p\u003e\n\u003cdiv\u003e\n  \u003cblockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\"\u003e\n    \u003cp lang=\"en\" dir=\"ltr\"\u003e\n      working with GPT-5.2 thinking \u003cbr /\u003e\u003cbr /\u003ewith gpt 5.2, i got error zip file not found. 
\u003ca href=\"https://t.co/c1zTfBlWb9\"\u003ehttps://t.co/c1zTfBlWb9\u003c/a\u003e \u003ca href=\"https://t.co/85tEv28MuJ\"\u003epic.twitter.com/85tEv28MuJ\u003c/a\u003e\n    \u003c/p\u003e— Nitin Kalra (@nkalra0123) \n\u003cpre\u003e\u003ccode\u003e\u0026lt;a href=\u0026quot;https://twitter.com/nkalra0123/status/1999771366397231386?ref_src=twsrc%5Etfw\u0026quot;\u0026gt;December 13, 2025\u0026lt;/a\u0026gt;\n\u003c/code\u003e\u003c/pre\u003e\n  \u003c/blockquote\u003e\n\u003c/div\u003e\u003c/figure\u003e \n\u003cp\u003eLet’s analyse the contents\u003cfigure\u003e\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"708\" src=\"/wp-content/uploads/2025/12/image-1024x708.png\" alt=\"\" /\u003e \u003c/figure\u003e\u003c/p\u003e\n\u003ch1 id=\"inside-the-open-ai-home-directory\"\u003eInside the open ai home directory\u003c/h1\u003e\n\u003ch3 id=\"oai-folder-slides-docs-pdfs-and-spreadsheets-tooling\"\u003eoai/ Folder: Slides, Docs, PDFs, and Spreadsheets Tooling\u003c/h3\u003e\n\u003cp\u003eThis folder is a small toolkit for working with common “office” artifacts – PowerPoint decks, DOCX files, PDFs, and spreadsheets. It combines a few Python utilities with a set of practical guides that describe the preferred tools and a quality-check workflow (render → visually inspect → iterate).\u003c/p\u003e","title":"Analysis of open ai home directory"},{"content":"Introduction As developers, we often find ourselves juggling multiple projects, each requiring different versions of Python or Java. Maybe you’re maintaining a legacy application that runs on Python 3.8 while building a new microservice on Python 3.12. Or perhaps you’re working with Java 11 for one client and Java 21 for another. Manually managing these versions can quickly become a nightmare of PATH variables, symlinks, and “it works on my machine” debugging sessions.\nThis is where version managers come to the rescue. 
In this guide, we’ll explore two powerful tools that will transform how you handle language versions: pyenv for Python and SDKMAN for Java.\nWhy Version Managers Matter Before diving into the tools, let’s understand why version managers are essential:\nIsolation: Each project can use its specific language version without conflicts.\nEasy switching: Change versions with a single command instead of modifying system configurations.\nReproducibility: Your team can easily replicate the exact development environment.\nSafety: Experiment with new versions without risking your system’s stability.\nConvenience: Install and manage multiple versions from a single interface.\nPart 1: Managing Python Versions with pyenv pyenv is a simple, powerful tool that lets you easily switch between multiple versions of Python. Unlike system-level installations, pyenv builds Python versions in your home directory, giving you complete control without requiring root access.\nInstalling pyenv The installation process varies by operating system, but the most reliable method is using the official installer script.\nOn macOS and Linux:\ncurl https://pyenv.run | bash After installation, add pyenv to your shell configuration. For bash, add these lines to your ~/.bashrc:\nexport PYENV_ROOT=\u0026#34;$HOME/.pyenv\u0026#34; export PATH=\u0026#34;$PYENV_ROOT/bin:$PATH\u0026#34; eval \u0026#34;$(pyenv init -)\u0026#34; For zsh users, add the same lines to ~/.zshrc. 
If you’re using fish shell, add to ~/.config/fish/config.fish:\nset -Ux PYENV_ROOT $HOME/.pyenv set -U fish_user_paths $PYENV_ROOT/bin $fish_user_paths pyenv init - | source Restart your shell or run source ~/.bashrc (or your shell’s config file) to apply the changes.\nEssential pyenv Commands List available Python versions:\npyenv install --list This shows all Python versions available for installation, including CPython, PyPy, and other implementations.\nInstall a specific Python version:\npyenv install 3.12.0 pyenv install 3.11.5 pyenv install 3.8.18 pyenv will download, compile, and install the Python version in ~/.pyenv/versions/.\nList installed versions:\npyenv versions The asterisk indicates your currently active version.\nSet global Python version:\npyenv global 3.12.0 This sets the default Python version for your entire system (user-level, not system-wide).\nSet local Python version for a project:\ncd ~/projects/my-legacy-app pyenv local 3.8.18 This creates a .python-version file in your project directory. Whenever you enter this directory, pyenv automatically switches to that version.\nSet Python version for current shell session:\npyenv shell 3.11.5 This temporarily overrides the global and local settings for your current terminal session.\nReal-World pyenv Workflow Here’s how you might use pyenv in practice:\n# Install multiple Python versions pyenv install 3.12.0 pyenv install 3.8.18 # Set Python 3.12 as your default pyenv global 3.12.0 # Create a new project directory mkdir ~/projects/legacy-client-app cd ~/projects/legacy-client-app # This project needs Python 3.8 pyenv local 3.8.18 # Verify you\u0026#39;re using the correct version python --version # Output: Python 3.8.18 # Create a virtual environment with this Python version python -m venv venv source venv/bin/activate # Work on your project... 
pip install -r requirements.txt When you leave this directory and enter another project, pyenv automatically switches to that project’s Python version based on its .python-version file.\npyenv with Virtual Environments pyenv works seamlessly with Python’s built-in venv module and third-party tools like virtualenv. A common pattern is:\nUse pyenv to set the Python version for your project Create a virtual environment using that Python version Install project dependencies in the virtual environment This gives you both version isolation (via pyenv) and dependency isolation (via virtual environments).\nTroubleshooting Common pyenv Issues Build dependencies missing: If Python installation fails, you likely need development libraries. On Ubuntu/Debian:\nsudo apt-get update sudo apt-get install -y build-essential libssl-dev zlib1g-dev \\ libbz2-dev libreadline-dev libsqlite3-dev curl \\ libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev On macOS, ensure Xcode Command Line Tools are installed:\nxcode-select --install Python version not switching: Make sure pyenv init is in your shell configuration and that you’ve restarted your shell.\nSlow Python installation: pyenv compiles Python from source, so each install takes a few minutes. Setting PYTHON_BUILD_CACHE_PATH lets pyenv reuse previously downloaded source archives instead of fetching them again, which helps with repeated installs:\nPYTHON_BUILD_CACHE_PATH=$HOME/.pyenv/cache pyenv install 3.12.0 Part 2: Managing Java Versions with SDKMAN SDKMAN (Software Development Kit Manager) is a versatile tool for managing parallel versions of multiple Software Development Kits on Unix-based systems. 
While it supports many SDKs (Gradle, Maven, Kotlin, Scala), we’ll focus on its Java management capabilities.\nInstalling SDKMAN Installation is straightforward and works on any Unix-based system:\ncurl -s \u0026#34;https://get.sdkman.io\u0026#34; | bash After installation, open a new terminal or run:\nsource \u0026#34;$HOME/.sdkman/bin/sdkman-init.sh\u0026#34; Verify the installation:\nsdk version SDKMAN automatically integrates with bash, zsh, and other shells by adding initialization code to your shell’s configuration file.\nEssential SDKMAN Commands List available Java versions:\nsdk list java This displays all available Java distributions, including Oracle JDK, OpenJDK, Amazon Corretto, Temurin (formerly AdoptOpenJDK), GraalVM, and more. You’ll see vendor identifiers and version numbers.\nInstall a specific Java version:\nsdk install java 21.0.1-tem # Temurin JDK 21 sdk install java 11.0.21-amzn # Amazon Corretto 11 sdk install java 17.0.9-graal # GraalVM 17 SDKMAN downloads and installs the JDK in ~/.sdkman/candidates/java/.\nList installed Java versions:\nsdk list java | grep installed Or see currently installed versions with:\nls ~/.sdkman/candidates/java/ Set default Java version:\nsdk default java 21.0.1-tem This sets the default Java version for your user account (new shell sessions pick it up); it does not touch any system-wide JDK.\nUse a specific Java version in current shell:\nsdk use java 11.0.21-amzn This temporarily switches to Java 11 for your current terminal session without changing the default.\nCheck current Java version:\nsdk current java java -version Project-Specific Java Versions with .sdkmanrc SDKMAN supports project-specific SDK versions through a .sdkmanrc file. Create this file in your project root:\ncd ~/projects/my-java-app sdk env init This creates a .sdkmanrc file. 
Edit it to specify your Java version:\njava=17.0.9-tem Now, whenever you enter this directory and run:\nsdk env SDKMAN automatically switches to Java 17 for that shell session.\nFor automatic switching, you can enable the SDKMAN hook in your shell configuration. Add to your ~/.bashrc or ~/.zshrc:\n# This loads the sdkman init script and enables auto-env export SDKMAN_DIR=\u0026#34;$HOME/.sdkman\u0026#34; [[ -s \u0026#34;$HOME/.sdkman/bin/sdkman-init.sh\u0026#34; ]] \u0026amp;\u0026amp; source \u0026#34;$HOME/.sdkman/bin/sdkman-init.sh\u0026#34; # Auto-switch Java version when entering a directory with .sdkmanrc sdk_auto_env() { if [[ -f \u0026#34;.sdkmanrc\u0026#34; ]]; then sdk env fi } cd() { builtin cd \u0026#34;$@\u0026#34; \u0026amp;\u0026amp; sdk_auto_env } For fish shell, first install fisher, then\nfisher install edc/bass Add this to fish config file at ~/.config/fish/config.fish\n# SDKMAN initialization set -gx SDKMAN_DIR $HOME/.sdkman function sdk bass source \u0026#34;$HOME/.sdkman/bin/sdkman-init.sh\u0026#34; \u0026#39;;\u0026#39; sdk $argv end Real-World SDKMAN Workflow Here’s a practical example of using SDKMAN:\n# Install multiple Java versions sdk install java 21.0.1-tem sdk install java 17.0.9-tem sdk install java 11.0.21-amzn # Set Java 21 as default sdk default java 21.0.1-tem # Create a project that needs Java 17 mkdir ~/projects/spring-legacy-service cd ~/projects/spring-legacy-service # Initialize SDKMAN config for this project sdk env init # Edit .sdkmanrc to specify Java 17 echo \u0026#34;java=17.0.9-tem\u0026#34; \u0026gt; .sdkmanrc # Apply the configuration sdk env # Verify you\u0026#39;re using Java 17 java -version # Output: openjdk version \u0026#34;17.0.9\u0026#34; # Build your project with Maven or Gradle ./gradlew build When you navigate to different projects, you can quickly switch Java versions:\ncd ~/projects/new-microservice sdk use java 21.0.1-tem # Use Java 21 for new features cd ~/projects/legacy-client sdk use java 
11.0.21-amzn # Switch to Java 11 for compatibility Managing Other JVM Tools with SDKMAN SDKMAN isn’t just for Java. It can also manage related tools:\nInstall Gradle:\nsdk install gradle 8.5 Install Maven:\nsdk install maven 3.9.6 Install Kotlin:\nsdk install kotlin 1.9.21 You can use the same commands (list, install, use, default) for these tools.\nUninstalling Versions To remove versions you no longer need:\nsdk uninstall java 11.0.21-amzn This helps keep your system clean and saves disk space.\nBest Practices and Tips Commit version files to Git: Both .python-version and .sdkmanrc files should be committed to your repository. This ensures all team members use the same language versions.\nDocument version requirements: Add a README section explaining which tool versions are required and how to set them up using pyenv and SDKMAN.\nUse version ranges carefully: Specify exact versions in production environments but consider more flexible versions for development.\nKeep tools updated: Periodically update pyenv and SDKMAN themselves:\n# Update pyenv cd ~/.pyenv \u0026amp;\u0026amp; git pull # SDKMAN checks for new releases itself, but you can trigger an update manually sdk selfupdate Clean up old versions: Remove unused Python and Java versions to save disk space:\npyenv uninstall 3.7.12 sdk uninstall java 8.0.392-tem Shell integration matters: Ensure both tools are properly initialized in your shell configuration for the best experience.\nConclusion Version managers like pyenv and SDKMAN transform the chaos of managing multiple language versions into a streamlined, predictable process. Whether you’re maintaining legacy applications, experimenting with new language features, or simply working across multiple projects, these tools provide the flexibility and control you need.\nWith pyenv, you can effortlessly switch between Python versions and ensure every project runs on its intended interpreter. 
With SDKMAN, Java version management becomes trivial, letting you work with different JDKs, distributions, and even other JVM languages without headaches.\nThe initial setup takes just a few minutes, but the time saved and frustration avoided over the course of your development career is immeasurable. Install these tools today, and never worry about version conflicts again.\nQuick Reference Card\npyenv:\nInstall version: pyenv install 3.12.0 List versions: pyenv versions Set global: pyenv global 3.12.0 Set local: pyenv local 3.11.5 Set for shell: pyenv shell 3.10.0 SDKMAN:\nInstall version: sdk install java 21.0.1-tem List versions: sdk list java Set default: sdk default java 21.0.1-tem Use temporarily: sdk use java 17.0.9-tem Check current: sdk current java Project config: Create .sdkmanrc and run sdk env References: https://github.com/pyenv/pyenv\nhttps://sdkman.io\nhttps://github.com/sdkman/sdkman-cli/issues/671\n","permalink":"https://learncodecamp.net/dev-tools-manage-multiple-python-java-versions-pyenv-sdkman/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAs developers, we often find ourselves juggling multiple projects, each requiring different versions of Python or Java. Maybe you’re maintaining a legacy application that runs on Python 3.8 while building a new microservice on Python 3.12. Or perhaps you’re working with Java 11 for one client and Java 21 for another. Manually managing these versions can quickly become a nightmare of PATH variables, symlinks, and “it works on my machine” debugging sessions.\u003c/p\u003e","title":"Managing Multiple Python and Java Versions: A Developer’s Guide to pyenv and SDKMAN"},{"content":"Introduction If you’ve ever stared at a cryptic error message from a CLI tool wondering “What HTTP requests is this thing actually making?”, you’re not alone. 
Whether it’s a failed git clone, a mysterious npm install error, inspecting the prompts Claude Code sends, or finding out what data your application sends to third-party services, understanding HTTP traffic is crucial for modern development. Enter HTTP Toolkit – an open-source powerhouse that makes intercepting and debugging HTTP traffic almost effortless.\nWhat is HTTP Toolkit? HTTP Toolkit is a cross-platform, open-source tool for debugging, testing, and building with HTTP on Windows, Linux, and Mac. Think of it as a sophisticated HTTP proxy with a beautiful interface that lets you see exactly what’s happening on the wire – without the complexity of traditional network analysis tools.\nWhy HTTP Toolkit Stands Out Unlike traditional network debugging tools that capture everything on your machine (creating noise and potential side effects), HTTP Toolkit offers targeted interception. You can capture traffic from:\nIndividual browser windows Specific mobile apps (Android/iOS) Backend processes (Node.js, Python, Ruby, Java, PHP) Docker containers Terminal sessions (the star of our show today) And more This precision means you see only the traffic you care about, making debugging faster and more effective.\nThe Power of Terminal Interception Terminal interception is one of HTTP Toolkit’s most powerful features. Here’s why it’s game-changing:\nOne-Click Setup Getting started is ridiculously simple:\nInstall HTTP Toolkit Open the application Navigate to the Intercept page Click the “Terminal” or “Fresh Terminal” button A new terminal window opens, pre-configured to route all HTTP traffic through HTTP Toolkit That’s it. No complex proxy configuration, no certificate installation headaches, no environment variable tweaking. HTTP Toolkit handles all of that automatically.\nWhat Gets Intercepted? Once you launch an intercepted terminal, the vast majority of CLI tools and languages automatically use HTTP Toolkit’s proxy and trust its certificate. 
This includes:\nVersion control: git clone, git push, git pull Package managers: npm install, pip install, apt-get update, gem install HTTP clients: curl, wget, http Cli tools : claude code, gemini cli, codex Language runtimes: Node.js scripts, Python applications, Ruby programs And many more: Most tools that respect standard environment variables This works through a combination of:\nEnvironment variables like HTTP_PROXY and HTTPS_PROXY PATH modifications to wrap certain commands Automatic certificate trust configuration Two Flavors of Terminal Interception HTTP Toolkit offers two approaches:\n1. Fresh Terminal Launches a new terminal window completely preconfigured for interception. This is the most reliable option and ensures everything is set up correctly from the start.\n2. Existing Terminal Provides a copyable command that you can paste into any existing terminal window to enable interception on the fly. This is incredibly convenient when you’re already working in a terminal and don’t want to switch contexts.\nAvailable for:\nBash Fish PowerShell Real-World Use Cases Let’s explore some practical scenarios where terminal interception shines:\n1. Understanding claude code system prompts 2. Understanding how git clone works git clone https://github.com/example/repo.git 3. Security and Privacy Auditing Concerned about what data your CLI tools might be sending to analytics services?\nWith terminal interception, you can:\nSee if tools are tracking your usage Identify unexpected third-party connections Verify that sensitive data isn’t being transmitted Ensure applications respect your privacy settings 4. 
API Development and Testing When building applications that consume APIs:\n# Run your Python script python api_client.py # Execute your Node.js application node index.js You can see every API call your code makes, including:\nRequest/response headers Query parameters Request bodies Response data Timing and performance metrics Advanced Features for Terminal Users Breakpoints and Live Editing HTTP Toolkit isn’t just for passive observation. You can:\nSet breakpoints on matching requests Pause live traffic for manual editing Modify requests on the fly (URLs, methods, headers, bodies) Mock responses without forwarding to the real server Inject errors and timeouts to test error handling This is invaluable for testing how your applications handle various scenarios without actually having to create those conditions in your backend.\nPowerful Inspection Tools Every captured request shows:\nComplete URL with parsed query parameters HTTP method and status code All request and response headers (with MDN documentation inline) Full request and response bodies with: Syntax highlighting for JSON, XML, HTML, JavaScript, and more Base64 decoding Protobuf parsing Hex view Automatic formatting and pretty-printing The interface is built on Monaco – the same editing engine that powers Visual Studio Code – so you get professional-grade tools for examining HTTP traffic.\nFiltering and Search When you’re debugging complex issues, you need to find the needle in the haystack. HTTP Toolkit provides:\nContent-type categorization (images, JSON, errors, etc.) Source tagging (which client sent which traffic) Free-text search across all request/response data Structured filtering by URL, status, headers, and more Docker Integration HTTP Toolkit also excels at intercepting Docker containers launched from terminal:\nbash\n# In an intercepted terminal docker run my-container # Or with Docker Compose docker-compose up All HTTP traffic from the container is automatically captured. 
You can also attach to already-running containers through the UI for dynamic interception.\nTechnical Implementation For those curious about how it works:\nHTTP Toolkit primarily uses environment variables like HTTP_PROXY that many tools and frameworks check automatically. When you launch an intercepted terminal:\nA proxy server starts (if not already running) Environment variables are set pointing to this proxy The PATH is modified to wrap certain commands Certificate trust is configured Changes are inherited by subprocesses For the “Existing Terminal” option, the tool generates shell-specific commands that apply these same settings to any running terminal session.\nGetting Started Ready to try it? Here’s the quick start:\nDownload: Visit httptoolkit.com and download for your platform Install: Follow the standard installation process Launch: Open HTTP Toolkit from your applications menu or run httptoolkit from the command line Intercept: Click “Fresh Terminal” on the Intercept page Explore: Run any command and watch the traffic appear in the View page That’s literally it. Within minutes, you’ll be intercepting and analyzing HTTP traffic like a pro.\nHTTP Toolkit transforms HTTP debugging from a frustrating guessing game into a transparent, visual experience. 
Terminal interception specifically opens up a world of understanding for CLI tools, build processes, and command-line workflows.\nWhether you’re debugging a failing API integration, learning how network protocols work, auditing privacy concerns, or testing error handling, HTTP Toolkit’s terminal interception gives you the visibility you need.\nResources:\nOfficial Website: httptoolkit.com GitHub: github.com/httptoolkit/httptoolkit Documentation: httptoolkit.com/docs ","permalink":"https://learncodecamp.net/terminal-http-intercept/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIf you’ve ever stared at a cryptic error message from a CLI tool wondering “What HTTP requests is this thing actually making?”, you’re not alone. Whether it’s a failed \u003ccode\u003egit clone\u003c/code\u003e, a mysterious \u003ccode\u003enpm install\u003c/code\u003e error, or tracking claude code for finding prompts, finding out what data your application sends to third-party services, understanding HTTP traffic is crucial for modern development. Enter \u003cstrong\u003eHTTP Toolkit\u003c/strong\u003e – an open-source powerhouse that makes intercepting and debugging HTTP traffic almost effortless.\u003c/p\u003e","title":"Debugging HTTP Traffic Like a Pro: HTTP Toolkit and Terminal Interception"},{"content":"Hallucinations in RAG (Retrieval-Augmented Generation) chatbots can undermine user trust and lead to misinformation. In this comprehensive guide, we’ll explore proven strategies to minimize these AI-generated inaccuracies and build more reliable chatbot systems.\nIf you’re building a RAG chatbot, you’ve likely encountered the frustrating problem of hallucinations—when your AI confidently provides incorrect or fabricated information. The good news? There are effective, battle-tested solutions to dramatically reduce these errors. 
Let’s dive into the multi-layered approach that actually works.\nRetrieval Quality Improvements The foundation of any reliable RAG system is high-quality retrieval. If your chatbot can’t find the right information, it’s much more likely to make things up. Here’s how to get retrieval right:\nBetter Chunk Strategy One of the most impactful changes you can make is optimizing how you chunk your documents. Use smaller, semantically coherent chunks—typically between 150-300 tokens—with some overlap between chunks. This sweet spot ensures each piece contains enough context while remaining focused.\nDon’t forget to include metadata like source, date, and context information. This helps the model assess relevance and gives users transparency about where information comes from.\nHybrid Search Relying on a single search method leaves gaps. Combining dense embeddings (semantic search) with sparse retrieval methods like keyword search or BM25 gives you the best of both worlds. Semantic search catches conceptual matches, while keyword search ensures you don’t miss exact terms and phrases.\nReranking After initial retrieval, add a reranking step using a cross-encoder model to surface the most relevant passages. Models like Cohere’s reranker or BGE-reranker can significantly improve the quality of context sent to your LLM, which directly reduces hallucinations.\nQuery Reformulation Sometimes the user’s question isn’t phrased in a way that matches your documents. Use your LLM to generate multiple query variations or hypothetical answers, then retrieve for each variation. This comprehensive approach ensures you’re not missing relevant information due to phrasing differences.\n💡 Pro Tip: The quality of your retrieval system is often more important than the size of your language model. Invest time in getting retrieval right before trying bigger models. Generation Controls Even with perfect retrieval, you need to guide your LLM to use that information correctly. 
These generation controls act as guardrails:\nExplicit Grounding Instructions Your prompts should explicitly instruct the model to only answer from the provided context. More importantly, teach it to say “I don’t have information about that” when the context is insufficient. This simple instruction can prevent countless hallucinations.\nCitation Requirements Force your model to cite specific passages for each claim it makes. This serves two purposes: it makes hallucination harder (since the model must ground each statement), and it makes verification easier for both you and your users.\nConfidence Scoring Have your model rate its confidence in each response, or flag when it’s uncertain. You can implement this through explicit prompting or by analyzing the model’s logprobs (log probabilities). Low confidence responses can trigger additional verification or simply be flagged to users.\nStructured Outputs Use JSON mode or structured generation formats to separate the actual answer from metadata like confidence levels and source citations. This makes it easier to programmatically verify responses and handle uncertain answers appropriately.\nPost-Processing Verification Don’t just trust the model’s output—verify it programmatically:\nEntailment Checking Use a Natural Language Inference (NLI) model to verify that claims in the response are actually entailed by the retrieved context. This automated fact-checking step catches many hallucinations before they reach users.\nSelf-Consistency Generate multiple responses to the same query and check for agreement. Alternatively, have the model review its own answer against the source documents. Inconsistencies are red flags for potential hallucinations.\nFact Extraction and Verification Extract specific factual claims from the response and verify each one against your source documents. 
This granular approach catches subtle inaccuracies that might slip through other verification methods.\nSystem Design Best Practices Beyond individual techniques, how you design your overall system matters:\nContext Window Management More isn’t always better. Don’t overflow the context window with too many retrieved documents. In practice, 3-5 highly relevant chunks often outperform 20 mediocre ones. Quality over quantity is the rule here.\nRetrieval Feedback Show users the sources your chatbot used and provide a way for them to report when responses don’t match those sources. This feedback loop helps you continuously improve your system and builds user trust.\nFallback to Retrieval If your retrieval system finds no good matches (indicated by low similarity scores), don’t let the model generate an answer anyway. Instead, simply inform the user that no relevant information was found. It’s better to be honest about limitations than to hallucinate.\nFine-Tuning Consider fine-tuning your embedding model on domain-specific data to improve retrieval accuracy. You can also fine-tune your LLM to better follow grounding instructions specific to your use case.\nQuick Win Combination For the biggest immediate impact, combine these three approaches: strong retrieval (hybrid search + reranking), clear prompting about when to refuse answering, and requiring citations for all claims. Wrapping Up Eliminating hallucinations entirely from RAG chatbots may be impossible, but you can reduce them to manageable levels with the right combination of strategies. Start with solid retrieval quality, add generation controls, implement verification steps, and design your system with transparency in mind. Remember that different approaches work better for different use cases. Experiment with these techniques to find the combination that works best for your specific domain and requirements. The investment in reducing hallucinations pays dividends in user trust and system reliability. 
What’s your experience with RAG hallucinations? Have you found other effective solutions? Share your thoughts in the comments below! ","permalink":"https://learncodecamp.net/how-to-stop-hallucinations-in-rag-chatbots/","summary":"\u003cp\u003eHallucinations in RAG (Retrieval-Augmented Generation) chatbots can undermine user trust and lead to misinformation. In this comprehensive guide, we’ll explore proven strategies to minimize these AI-generated inaccuracies and build more reliable chatbot systems.\u003c/p\u003e\n\u003cp\u003eIf you’re building a RAG chatbot, you’ve likely encountered the frustrating problem of hallucinations—when your AI confidently provides incorrect or fabricated information. The good news? There are effective, battle-tested solutions to dramatically reduce these errors. Let’s dive into the multi-layered approach that actually works.\u003c/p\u003e","title":"How to Stop Hallucinations in RAG Chatbots: A Complete Guide"},{"content":"Large language models are getting smarter—but the real superpower may be how we feed them context. Instead of constantly fine-tuning weights, a growing family of techniques improves models by upgrading the inputs they see: richer instructions, reusable strategies, domain heuristics, and concrete evidence. The paper “Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models” proposes ACE, a practical framework that treats context like an evolving playbook—something you grow, refine, and curate over time to make agents and reasoning systems measurably better.\nBelow is a deep dive into what ACE is, why it’s needed, how it works, and what the results show.\nWhy context needs an upgrade Two failure modes plague many prompt-optimization and memory systems:\nBrevity bias — Optimizers converge to short, generic prompts that read well but shed domain detail (tool quirks, failure patterns, edge-case rules). That hurts agents and knowledge-heavy tasks where specifics matter. 
Context collapse — If you keep asking an LLM to rewrite the entire context, it often compresses rich knowledge into a sparse summary, erasing important material. A striking case in the paper shows context shrinking from ~18k tokens to ~122 tokens in one step, with a sharp accuracy drop. The authors argue we should saturate contexts—keep abundant, actionable details—because LLMs are quite capable of picking relevant bits at inference time. Long-context models and KV-cache reuse make this practical in production.\nWhat is ACE? ACE (Agentic Context Engineering) reframes context as a modular, evolving playbook. Instead of monolithic rewrites, the system adds and edits small, structured “bullets”—reusable strategies, pitfalls, code snippets, formatting schemas—guided by generation, reflection, and curation.\nThe three roles Generator: attempts tasks, producing trajectories that reveal what worked and what failed. Reflector: diagnoses mistakes and distills concrete insights from traces and feedback (e.g., tool outputs, unit tests, ground truth when available). Curator: merges delta updates into the playbook with lightweight, non-LLM logic—no full rewrite—enabling parallel, low-latency adaptation. Key innovations Incremental delta updates: store context as itemized bullets (with IDs and helpful/harmful counters). Only the relevant bullets get updated; new bullets append. This preserves knowledge and cuts cost. Grow-and-refine: periodically de-duplicate, prune, and update bullets using embeddings so the context scales without bloat. The result is a comprehensive, searchable, and auditable knowledge surface—more like a living runbook than a sloganized prompt. 
The paper even shows a partial ACE playbook with “strategies \u0026amp; hard rules,” “useful code snippets,” and “troubleshooting \u0026amp; pitfalls.”\nWhere ACE helps most The authors target two demanding settings:\nInteractive agents (AppWorld): multi-turn coding + tool use (APIs, files, messages) with normal and challenge test splits and a public leaderboard. Financial reasoning (FiNER, Formula): XBRL tagging and numerical extraction/computation—tasks that need precise domain tactics. Results at a glance ACE consistently outperforms strong baselines like ICL, MIPROv2, GEPA, and Dynamic Cheatsheet, and it often does so without ground-truth labels by leveraging natural execution feedback.\nAgents (AppWorld) Offline: ReAct + ACE beats ReAct + ICL and ReAct + GEPA by double digits on average, showing that a detailed, evolving context is better than a fixed demo set or a single “optimized” instruction. Online: ReAct + ACE improves over Dynamic Cheatsheet by ~7.6% average; with offline warmup + online updates, it performs even better. Leaderboard parity: Using a smaller open-source model (DeepSeek-V3.1), ACE matches a top production GPT-4.1 agent on average and surpasses it on the challenge split. (Figure on page 14 shows the leaderboard snapshot.) Finance (FiNER \u0026amp; Formula) Offline: +8.6% average over strong baselines; on Formula, ACE’s detailed playbook yielded very large improvements. Online: ACE also outperforms Dynamic Cheatsheet in average accuracy, though the paper cautions that feedback quality matters—without reliable signals, any adaptive method may degrade. Cost \u0026amp; latency Because ACE updates deltas and merges with non-LLM logic, adaptation is fast and cheap:\nOffline (AppWorld): −82.3% latency and −75.1% rollouts vs. GEPA. Online (FiNER): −91.5% latency and −83.6% token cost vs. DC. 
The paper further argues that longer contexts don’t imply linear serving costs, thanks to KV-cache reuse, compression, and offload—so context-rich serving is getting easier in production stacks.\nHow ACE actually writes better contexts To make this concrete, the appendices include prompts for each ACE role:\nGenerator prompts reference the ACE Playbook explicitly and instruct the agent to apply relevant sections (e.g., pagination rules, identity resolution). Reflector prompts teach the model to diagnose errors (e.g., wrong source of truth, pagination logic) and tag which bullets were helpful/harmful, producing structured insights. (Examples include roommate identification via Phone app vs. Venmo heuristics.) Curator prompts ask for pure JSON “operations” to add only new bullets to the right sections, preventing duplicates and avoiding monolithic rewrites. This disciplined prompting keeps updates localized and auditable, breaking the cycle of context collapse.\nPractical takeaways for builders Favor accumulation over distillation: Keep concrete API schemas, code patterns, gotchas, and domain heuristics. Let the model decide relevance at inference time. Structure your context store: Treat context as versioned bullets with IDs, metadata, and counters for helpful/harmful usage. This enables fine-grained retrieval and safe editing. Close the loop with execution feedback: Unit tests, tool errors, output format mismatches, and environment traces are gold for the Reflector to extract robust lessons—no labels required for many agent tasks. Avoid full rewrites: Use delta merges and periodic de-dup to scale contexts without erasing history. Mind the feedback quality: Without reliable signals (labels or trustworthy environment outcomes), any adaptive method can pollute its memory. Consider guardrails and human review for high-stakes domains. Limitations \u0026amp; open questions Reflector dependence: If your model can’t extract meaningful insights, the playbook may get noisy. 
Some tasks also prefer concise meta-instructions over long contexts (e.g., HotPotQA). Domain boundaries: Not all applications benefit from exhaustive contexts. ACE shines where tool use, domain rules, and recurring pitfalls dominate success. Bottom line ACE shows that context is not a static prompt—it’s an artifact your system should continuously engineer. By generating, reflecting, and curating small, structured updates instead of rewriting everything, you can boost accuracy, slash adaptation cost, and stabilize long-horizon performance for agents and domain reasoning. If you’re building LLM applications, start thinking of your prompt not as prose, but as a living playbook.\nPlaybook that got generated after four queries\nwhat are ai agents what are ai agents ( repeated) tell me something about apple macbooks which mobile phone should i buy? Playbook now contains learning from all the questions and answers.\n# CONTEXT PLAYBOOK ## STRATEGIES [STR-00001] (✓1 ✗0): Add an explicit evaluation section with metrics (success rate, time-to-goal, robustness under noise, safety violations), test environments, and repeatable evaluation protocols. Include a minimal pseudocode or skeleton code for the agent loop (perceive -\u0026gt; update beliefs -\u0026gt; decide -\u0026gt; act -\u0026gt; observe -\u0026gt; learn). Provide a small glossary of terms (perception, belief, plan, policy, reward, constraint). Incorporate a safety and alignment checklist (guardrails, monitoring, anomaly detection, fallback modes). Include guidance on reward design and potential failure modes (reward hacking). Offer a brief section on multi-agent coordination patterns (communication primitives, negotiation, consensus). [STR-00002] (✓2 ✗0): Treating AI models as fully autonomous agents without implementing a continuous feedback loop and monitoring. Ignoring uncertainty and partial observability in perception. Poor reward or objective design leading to unintended behaviors. 
Overlooking safety, privacy, and ethical constraints. Underestimating deployment considerations (scalability, logging, versioning, auditing, and containment in case of failures). [STR-00013] (✓1 ✗0): Add a concise quick-start summary at the top with a layered depth option (short answer first, expandable details). Include a runnable mini-example (e.g., a thermostat-like agent) to illustrate the loop in practice. [STR-00024] (✓0 ✗0): Annotate recommendations with model-year or generation (e.g., M1/M2/M3 era), and note that port configurations and thermals vary by release. Include a small, up-to-date check for the user’s region and current prices. [STR-00025] (✓0 ✗0): Add a brief interactive prompt flow: ask for budget, primary tasks, preferred screen size, and portability vs. performance priority, then tailor the recommendation accordingly. [STR-00026] (✓0 ✗0): State clearly that the reasoning steps are not exposed, but provide a concise justification for each recommendation (e.g., why 16 GB RAM is beneficial for heavy multitasking). [STR-00027] (✓0 ✗0): Offer multiple quick-start calculators (e.g., budget-based, workload-based, and ecosystem reliance) and document the assumptions behind the heuristics. [STR-00028] (✓0 ✗0): Highlight common buyer pitfalls (aging battery, storage needs, RAM bottlenecks) and propose safe fallback options (e.g., refurbished models, extended warranties). ## COMMON MISTAKES [COM-00008] (✓1 ✗0): Avoid: Limited emphasis on evaluation and verification: no explicit metrics, test protocols, or ablation studies for agent performance. [COM-00009] (✓1 ✗0): Avoid: Insufficient focus on safety, alignment, and governance beyond high-level mentions; lacks concrete guardrails and monitoring strategies. [COM-00010] (✓1 ✗0): Avoid: Lack of concrete implementation guidance: no pseudocode, data schemas, or lightweight skeletons beyond a high-level workflow. 
[COM-00011] (✓1 ✗0): Avoid: Insufficient treatment of uncertainty and partial observability in practical sensing and decision-making (beyond generic mentions). [COM-00012] (✓1 ✗0): Avoid: Minimal discussion of multi-agent coordination challenges (communication protocols, conflict resolution, trust) beyond listing as a category. [COM-00014] (✓0 ✗0): Avoid excessive length; prevent redundancy; clearly separate high-level concepts from safety/governance; ensure not to reveal chain-of-thought and keep internal reasoning in private or in policy-compliant form. [COM-00020] (✓0 ✗0): Avoid: Potential information overload due to length and breadth; could overwhelm beginners. [COM-00021] (✓0 ✗0): Avoid: Some redundancy or repetition (e.g., STR-00002 appears twice in the used bullets). [COM-00022] (✓0 ✗0): Avoid: Lacks concrete runnable example beyond pseudocode; could include a tiny, working snippet to illustrate the loop. [COM-00023] (✓0 ✗0): Avoid: No explicit, concrete evaluation metrics or test scenarios beyond high-level mentions. [COM-00033] (✓0 ✗0): Avoid: Some generalizations could become outdated (e.g., fanless status on all Air models, exact port availability varies by year/generation). [COM-00034] (✓0 ✗0): Avoid: Lack of explicit model-year or generation references which can affect accuracy across releases. [COM-00035] (✓0 ✗0): Avoid: No explicit caveat about price ranges or regional availability, which can shift quickly with new releases. [COM-00036] (✓0 ✗0): Avoid: Limited emphasis on potential model-specific design nuances (keyboard history, repairability, serviceability) that matter to long-term ownership. ## BEST PRACTICES [BES-00003] (✓1 ✗0): The response is well-structured and coherent, delivering a comprehensive high-level overview of AI agents. [BES-00004] (✓2 ✗0): Key concepts are covered: perception, representation/reasoning, action/execution, learning/adaptation, goals, and the agent-environment loop. 
[BES-00005] (✓1 ✗0): A broad taxonomy is provided (reactive, model-based, deliberative, hybrid, BDI, utility-based, learning), along with single-agent vs. multi-agent and environment classifications. [BES-00006] (✓1 ✗0): Practical grounding with real-world examples and a high-level blueprint for building a simple agent is included. [BES-00007] (✓1 ✗0): Design considerations (goals, perception, robustness, safety, explainability, ethics, privacy, scalability) are acknowledged. [BES-00015] (✓0 ✗0): Provided a comprehensive, well-structured overview covering perception, reasoning, action, learning, and goals. [BES-00016] (✓0 ✗0): Included multiple agent architectures (reactive, model-based, deliberative, hybrid, BDI, utility-based, learning) and both single- and multi-agent contexts. [BES-00017] (✓0 ✗0): Integrated practical considerations (safety, governance, ethics, explainability) and a getting-started blueprint. [BES-00018] (✓0 ✗0): Added an agent-environment loop and a minimal pseudocode skeleton to illustrate the feedback loop. [BES-00019] (✓0 ✗0): Separated content into clear sections (glossary, examples, evaluation, governance), aiding readability and scannability. [BES-00029] (✓0 ✗0): Structured, layered answer with a quick-start summary, expandable details, and a runnable example. [BES-00030] (✓0 ✗0): Inclusion of a runnable Python snippet provides a tangible decision aid and demonstrates how to implement a simple model-choice helper. [BES-00031] (✓0 ✗0): Comprehensive coverage of MacBook Air vs. MacBook Pro, including Apple Silicon context, typical use cases, and practical buying tips. [BES-00032] (✓0 ✗0): Clear pros/cons, ecosystem considerations, and practical guidance on RAM/storage/planning for future needs. 
References: https://arxiv.org/abs/2510.04618 https://github.com/001shahab/Agentic_Context_Engineering?tab=readme-ov-file\n","permalink":"https://learncodecamp.net/agentic-context-engineering-ace-turning-context-into-a-self-improving-playbook-for-llms/","summary":"\u003cp\u003eLarge language models are getting smarter—but the real superpower may be \u003cstrong\u003ehow we feed them context\u003c/strong\u003e. Instead of constantly fine-tuning weights, a growing family of techniques improves models by upgrading the \u003cem\u003einputs\u003c/em\u003e they see: richer instructions, reusable strategies, domain heuristics, and concrete evidence. The paper \u003cstrong\u003e“Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models”\u003c/strong\u003e proposes \u003cstrong\u003eACE\u003c/strong\u003e, a practical framework that treats context like an \u003cstrong\u003eevolving playbook\u003c/strong\u003e—something you \u003cstrong\u003egrow, refine, and curate\u003c/strong\u003e over time to make agents and reasoning systems measurably better.\u003c/p\u003e","title":"Agentic Context Engineering (ACE): Turning Context Into a Self-Improving Playbook for LLMs"},{"content":"Introduction When training large language models (LLMs) the most important question is simple: how do we measure whether the model is doing well? For regression you use mean squared error, for classification you might use cross-entropy or hinge loss. But for LLMs — which predict sequences of discrete tokens — the right way to turn “this output feels wrong” into a number you can optimize is a specific kind of probability loss: categorical cross-entropy / negative log likelihood, and the closely related, more interpretable metric perplexity.\nThis post explains, step by step and with practical code snippets, how inputs → model outputs → loss → training come together for autoregressive LLMs (GPT style). 
It covers shapes, batching, indexing the correct probabilities, computing loss efficiently in PyTorch, and how to interpret perplexity.\nQuick overview: the training objective for autoregressive LLMs An autoregressive LLM is trained to predict the next token at each position. Given a sequence of tokens\nx = [x₁, x₂, ..., x_T]\nthe model provides at each position t a probability distribution P(y | x₁..x_t) over the vocabulary. Training reduces to maximizing the probability the model assigns to the true next tokens seen in data — equivalently, minimizing the average negative log probability of true next tokens across all positions and examples.\nPut another way:\nFor each input token position the model predicts a probability vector over the vocabulary (via logits → softmax). For each position we pick out the probability the model assigned to the correct target token. Take the negative log of those probabilities (giving a positive loss per token). Average across tokens (and batches) → scalar loss to backpropagate. That scalar is the standard categorical cross-entropy / negative log likelihood used everywhere in LLM training.\nShapes and indexing — why dimensionality matters It helps to keep dimensions explicit. Typical shapes:\nbatch_size = B seq_len = T (context length; e.g., 256) vocab_size = V (GPT-2 token set: 50,257) Model output (logits) has shape (B, T, V).\nEach (V,) slice along the last axis is the (unnormalized) scores for the next token at that position.\nTargets (the true next token IDs used for loss) have shape (B, T). 
Targets are essentially the input sequence shifted left by one position (the next-token prediction target).\nWhen computing loss we usually flatten the first two dims into a single axis of B*T so we can compute cross-entropy in one go:\nlogits_flat.shape = (B*T, V) targets_flat.shape = (B*T,) This flattening makes it straightforward to call library loss functions (e.g., torch.nn.functional.cross_entropy), which in their basic form expect (N, V) logits and (N,) integer class targets.\nManual view of the loss computation for LLMs (conceptual) Model returns logits of shape (B, T, V). Convert logits → probabilities with softmax across V. Each row gives a distribution over next tokens. For each example position, select the probability assigned to the true token: p = probs[b, t, target_id]. Compute -log(p) (negative log likelihood) for that position. Average these values across all (b, t) to produce the scalar loss. This is exactly what cross_entropy does under the hood (it combines the softmax and negative log likelihood in a numerically stable way).\nExample with a toy vocabulary Imagine a tiny vocabulary V = 7 and the model outputs, for a single sequence of length 3, these per-position probability rows (each row sums to 1):\npos 1: [0.10, 0.60, 0.20, 0.05, 0.02, 0.02, 0.01] pos 2: [ ... ] pos 3: [ ... ] If the true next tokens at positions 1,2,3 are indices i1, i2, i3, we pick probabilities p1 = probs[0,i1], p2 = probs[1,i2], p3 = probs[2,i3]. 
Loss = -mean(log p1, log p2, log p3).\nThe training goal is to raise each p_k toward 1 (equivalently, maximize log p_k), so that the model assigns near-unit probability to correct next tokens.\nPerplexity — an interpretable scalar Perplexity is an alternative view of the same underlying loss, and it’s often used for LMs because it’s more directly interpretable.\nDefinition:\nIf L is the average negative log likelihood (in natural log base e) per token (the loss we computed), then\nperplexity = exp(L) Interpretation: perplexity is the effective branching factor — the number of equally likely choices the model behaves as if it is choosing between when predicting the next token. Lower is better.\nIf perplexity = 2 → model is about as uncertain as choosing between 2 equally likely tokens on average. If perplexity = V (vocab size) → model is nearly uniform across the whole vocabulary; essentially guessing at random. Numerical example: if loss L = 10.79, then\nperplexity = exp(10.79) ≈ 48,533 This would mean “the model is as uncertain as choosing from ≈48.5k tokens,” which is very high for a vocabulary ~50k — so the model is effectively guessing.\nPerplexity is helpful because it maps the log-loss back to a quantity with an intuitive meaning related to vocabulary size.\nWhy cross-entropy is the right fit The model outputs a probability distribution over discrete tokens → we need a probability-based loss. Cross-entropy / negative log-likelihood directly penalizes low probability assigned to true data tokens. It chains neatly into gradient descent (it’s differentiable w.r.t. logits). It generalizes classification loss to the multi-class, per-position nature of language modeling. Summary LLMs are trained by predicting the next token at every position. The standard objective is the negative log likelihood / categorical cross-entropy averaged over token positions and examples. In practice, compute loss = F.cross_entropy(logits.view(-1, V), targets.view(-1)) in PyTorch. 
Perplexity = exp(loss) is an interpretable alternative: it tells you the effective number of choices the model is hedging between when predicting the next token. Watch tensor shapes carefully, mask padding, and use library loss functions for numeric stability. References: https://www.youtube.com/watch?v=Zxf-34voZss\u0026amp;list=PLPTV0NXA_ZSgsLAr8YCgCwhPIJNNtexWu\u0026amp;index=29\n","permalink":"https://learncodecamp.net/loss-function-for-ll/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhen training large language models (LLMs) the most important question is simple: how do we \u003cem\u003emeasure\u003c/em\u003e whether the model is doing well? For regression you use mean squared error, for classification you might use cross-entropy or hinge loss. But for LLMs — which predict sequences of discrete tokens — the right way to turn “this output feels wrong” into a number you can optimize is a specific kind of probability loss: \u003cstrong\u003ecategorical cross-entropy / negative log likelihood\u003c/strong\u003e, and the closely related, more interpretable metric \u003cstrong\u003eperplexity\u003c/strong\u003e.\u003c/p\u003e","title":"Loss functions for llm — a practical, hands-on guide"},{"content":"Introduction In the attention mechanism used by Large Language Models (LLMs) like transformers (e.g., GPT), the core idea is to allow the model to dynamically focus on relevant parts of the input sequence when generating or understanding text.\nThis is achieved through a process called scaled dot-product attention, where input tokens (e.g., words or subwords) are transformed into three types of vectors: Q K V, Query (Q), Key (K), and Value (V). These are not arbitrary; they’re learned projections of the input embeddings via linear transformations matrices\n$\\displaystyle W^{Q},\\quad W^{K},\\quad W^{V}$\nVector Definition Role in Attention Query (Q) A vector representing the “question” or current token’s need for information. 
For a given token: $Q = X \\cdot W^{Q}$, where $X$ is the input embedding matrix. Acts as the “search query” to determine what information to retrieve from other tokens. It’s dotted with Keys to compute relevance scores. Key (K) A vector representing the “label” or summary of each token’s content. $K = X \\cdot W^{K}$. Used to match against the Query. The similarity is measured with $Q \\cdot K^{T}$. Value (V) A vector holding the actual “content” or features of each token. $V = X \\cdot W^{V}$. The output is a weighted sum of these: $\\text{Attention} = \\operatorname{softmax}\\!\\left(\\frac{QK^{T}}{\\sqrt{d_k}}\\right)V$. The weights come from Query–Key similarities, aggregating useful info from relevant tokens. Dimensions: Typically, Q, K, and V have the same dimension $d_k$ (e.g., 64 or 512 in multi-head attention), and the softmax ensures weights sum to 1. Multi-Head Attention: In practice, LLMs use multiple “heads” (parallel attention computations with different projections), concatenating results for richer representations. Intuition Behind Q, K, V Think of attention like a content-addressable memory system or a smart database lookup—it’s inspired by how humans retrieve information from memory by associating cues:\nQuery (Q) as “What am I looking for?”: Imagine you’re reading a sentence and processing the word “bank.” Your Query might ask, “Is this the river bank or financial bank?” It probes the sequence for clues.\nKey (K) as “What do I have available?”: Each word in the input (e.g., “river” or “money”) has a Key that summarizes its essence. The dot product $Q \\cdot K$ is like a similarity score: high if the Key matches your Query (e.g., “river” scores high for river-bank context), low otherwise.\nValue (V) as “What do I retrieve?”: Once matches are found (via softmax-normalized weights), you pull the full details (Values) from those matching tokens and blend them into a context-aware representation for your current word. 
This lets the model “attend” to distant or relevant parts of the text without rigid sequential processing.\nWithout attention, sequential models like RNNs struggle with vanishing gradients over long sequences. Dividing by $\\sqrt{d_k}$ keeps the dot products from growing with the dimension, which would otherwise saturate the softmax.\nIn short: Q asks, K answers “how relevant?”, and V delivers the goods—turning raw sequences into contextually rich outputs!\n$\\text{Attention}(Q, K, V) = \\operatorname{softmax}\\!\\left(\\frac{QK^{T}}{\\sqrt{d_k}}\\right) V$\nFor a more detailed explanation, see this video: https://www.youtube.com/watch?v=UjdRN80c6p8\u0026amp;t=1156s\n","permalink":"https://learncodecamp.net/q-k-v-vectors-in-the-attention-mechanism/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn the attention mechanism used by Large Language Models (LLMs) like transformers (e.g., GPT), the core idea is to allow the model to dynamically focus on relevant parts of the input sequence when generating or understanding text.\u003c/p\u003e\n\u003cp\u003eThis is achieved through a process called \u003cstrong\u003escaled dot-product attention\u003c/strong\u003e, where input tokens (e.g., words or subwords) are transformed into three types of vectors: Q K V, \u003cstrong\u003eQuery (Q)\u003c/strong\u003e, \u003cstrong\u003eKey (K)\u003c/strong\u003e, and \u003cstrong\u003eValue (V)\u003c/strong\u003e. 
These are not arbitrary; they’re learned projections of the input embeddings via linear transformation matrices\u003c/p\u003e","title":"Q K V : Query (Q), Key (K), and Value (V) Vectors in the Attention Mechanism"},{"content":"Introduction Token embeddings (aka vector embeddings) turn tokens — words, subwords, or characters — into numeric vectors that encode meaning.\nThey’re the essential bridge between raw text and a neural network.\nIn this post we will run a few small demos (Word2Vec-style analogies, similarity checks) and walk through concrete PyTorch code that shows how an embedding layer works. I also include a tiny toy training loop so you can see embeddings updated by backprop.\nWhy token embeddings (intuition) Computers only understand numbers. Assigning random integers (token IDs) or one-hot vectors to words does not capture word relationships — e.g. cat and kitten should be closer than cat and banana. Embeddings are vectors (e.g., 300-D, 768-D) where semantic relationships are geometric relationships: similar words are near each other, and arithmetic often works (e.g., king + woman − man ≈ queen). Practically, an embedding layer is just a lookup table: for each token ID you fetch a row (a vector). During LLM training those rows (the embedding weight matrix) are learned (initialized randomly, then optimized via backprop). Hands-on demos — code you can run Below are runnable code examples. They demonstrate:\nLoading pre-trained word vectors (Word2Vec / Google News). Doing analogy arithmetic and similarity. 
A tiny toy training loop that updates embedding weights and predicts the next word from the learned embeddings. # Requirements: # pip install gensim import gensim.downloader as api # Load the prepackaged word2vec-google-news-300 vectors via the gensim downloader model = api.load(\u0026#34;word2vec-google-news-300\u0026#34;) # Analogy: king + woman - man -\u0026gt; queen result = model.most_similar( positive=[\u0026#39;king\u0026#39;, \u0026#39;woman\u0026#39;], negative=[\u0026#39;man\u0026#39;], topn=5 ) print(\u0026#34;king + woman - man -\u0026gt;\u0026#34;, result[:5]) # Similarity examples (cosine similarity) pairs = [(\u0026#39;woman\u0026#39;, \u0026#39;man\u0026#39;), (\u0026#39;king\u0026#39;, \u0026#39;queen\u0026#39;), (\u0026#39;paper\u0026#39;, \u0026#39;water\u0026#39;)] for a, b in pairs: print(f\u0026#34;sim({a},{b}) = {model.similarity(a, b):.4f}\u0026#34;) # Find nearest neighbors print(\u0026#34;Nearest to \u0026#39;tower\u0026#39;:\u0026#34;, model.most_similar(\u0026#39;tower\u0026#39;, topn=10)) # Output: king + woman - man -\u0026gt; [(\u0026#39;queen\u0026#39;, 0.7118192911148071), (\u0026#39;monarch\u0026#39;, 0.6189674735069275), (\u0026#39;princess\u0026#39;, 0.5902431011199951), (\u0026#39;crown_prince\u0026#39;, 0.5499460697174072), (\u0026#39;prince\u0026#39;, 0.5377321243286133)] sim(woman,man) = 0.7664 sim(king,queen) = 0.6511 sim(paper,water) = 0.1141 Nearest to \u0026#39;tower\u0026#39;: [(\u0026#39;towers\u0026#39;, 0.8531749844551086), (\u0026#39;skyscraper\u0026#39;, 0.6417425870895386), (\u0026#39;Tower\u0026#39;, 0.639177143573761), (\u0026#39;spire\u0026#39;, 0.5946877598762512), (\u0026#39;responded_Understood_Atlasjet\u0026#39;, 0.5931612849235535), (\u0026#39;storey_tower\u0026#39;, 0.5783935189247131), (\u0026#39;SolarReserve_molten_salt\u0026#39;, 0.5733036398887634), (\u0026#39;monopole_tower\u0026#39;, 0.566946804523468), (\u0026#39;bell_tower\u0026#39;, 0.5626808404922485), (\u0026#39;foot_monopole\u0026#39;, 0.5514882802963257)] This code 
is a minimal example of how word/token embeddings work in PyTorch:\nVocabulary setup\nA tiny vocabulary (['fox','house','in','is','quick','the']) is created and mapped to integer IDs (stoi). This lets the model convert words → numbers. Embedding layer\nnn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim) creates a learnable lookup table of shape (vocab_size, embed_dim). Each word ID gets mapped to a small vector (here, 3-dimensional). Input tokens\nThe input sentence fragment \u0026quot;in is the house\u0026quot; is converted into IDs [2, 3, 5, 1] using the stoi mapping. Embedding lookup\nembedding(input_ids) fetches the corresponding row vectors from the embedding matrix, producing a tensor of shape (4, 3) (one 3-D vector for each token). Weights inspection\nembedding.weight shows the full embedding matrix for all words in the vocab (size 6 × 3 here). # Requirements: torch # pip install torch import torch import torch.nn.functional as F from torch import nn # Small toy vocabulary \u0026amp; mapping vocab = [\u0026#39;fox\u0026#39;, \u0026#39;house\u0026#39;, \u0026#39;in\u0026#39;, \u0026#39;is\u0026#39;, \u0026#39;quick\u0026#39;, \u0026#39;the\u0026#39;] # size=6 stoi = {w:i for i,w in enumerate(vocab)} vocab_size = len(vocab) embed_dim = 3 # tiny embedding dimension for clarity # Make an embedding layer (vocab_size x embed_dim) embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim) # Example input ids (tokens \u0026#34;in\u0026#34;, \u0026#34;is\u0026#34;, \u0026#34;the\u0026#34;, \u0026#34;house\u0026#34; mapped to their IDs) # Suppose token IDs are [2, 3, 5, 1] input_ids = torch.tensor([stoi[\u0026#39;in\u0026#39;], stoi[\u0026#39;is\u0026#39;], stoi[\u0026#39;the\u0026#39;], stoi[\u0026#39;house\u0026#39;]]) # shape: (4,) print(input_ids) # Single-line lookup (batch lookup) embeds = embedding(input_ids) # shape: (4, embed_dim) print(\u0026#34;Embeddings shape:\u0026#34;, embeds.shape) 
print(\u0026#34;Embeddings:\\n\u0026#34;, embeds) # If you want the raw embedding weight matrix: print(\u0026#34;Embedding weight matrix (vocab_size x embed_dim):\\n\u0026#34;, embedding.weight) The code trains embeddings + a simple linear classifier to learn word-to-next-word transitions from the essay.\nEssay text → a ~200-word paragraph on cats is used as raw training data. Tokenization → text is lowercased, split into words, duplicates removed to build a vocabulary. Vocab \u0026amp; mappings → token2id maps words → IDs, id2token maps IDs → words. Corpus as IDs → essay tokens are converted into a sequence of integers. Training pairs → create (current_word → next_word) pairs from the sequence. Model → ToyModel has: nn.Embedding: turns word IDs into vectors. nn.Linear: projects vectors into logits over vocab. Training loop → runs for 500 epochs: Shuffle pairs, do forward pass, compute cross-entropy loss, backprop, optimizer step. Loss decreases as embeddings + weights are updated. Prediction function → given a word, look up its ID, get logits → probabilities, return top-k likely next words. Test predictions → check what the model thinks comes after words like \u0026quot;cats\u0026quot; or \u0026quot;the\u0026quot;. import random import torch import torch.nn.functional as F from torch import nn, optim # --- 1. Essay on Cats (200 words) --- essay = \u0026#34;\u0026#34;\u0026#34; Cats are fascinating creatures that have been companions to humans for thousands of years. They are known for their independence, agility, and mysterious behavior. Unlike dogs, cats often prefer quiet corners where they can observe their surroundings without being disturbed. Their sharp eyes and quick reflexes make them excellent hunters, even in domestic environments. Cats communicate through subtle body language—tail movements, ear positions, and gentle purring. Each cat has a unique personality; some are playful and energetic, while others are calm and affectionate. 
Despite their reputation for independence, cats often form strong bonds with their owners, seeking warmth and comfort in their presence. They enjoy routines and can be sensitive to changes in their environment. Cats also have a remarkable ability to adapt, whether they live in bustling cities or peaceful countryside homes. Their grooming habits keep them clean, and their graceful movements make them a delight to watch. For many people, the presence of a cat brings a sense of calm and companionship. It is no wonder that cats remain one of the most beloved pets in the world, admired for both their beauty and spirit. \u0026#34;\u0026#34;\u0026#34; # --- 2. Tokenization (very simple split) --- tokens = essay.lower().replace(\u0026#34;\\n\u0026#34;, \u0026#34; \u0026#34;).split() vocab = sorted(set(tokens)) print(\u0026#34;Vocabulary size:\u0026#34;, len(vocab)) # --- 3. Build mappings --- token2id = {tok: idx for idx, tok in enumerate(vocab)} id2token = {idx: tok for tok, idx in token2id.items()} # --- 4. Convert essay into IDs --- ids = [token2id[tok] for tok in tokens] # --- 5. Build training pairs (current → next) --- pairs = [(ids[i], ids[i+1]) for i in range(len(ids)-1)] # --- 6. Define model --- class ToyModel(nn.Module): def __init__(self, vocab_size, embed_dim): super().__init__() self.embedding = nn.Embedding(vocab_size, embed_dim) self.linear = nn.Linear(embed_dim, vocab_size) def forward(self, x): e = self.embedding(x) # (batch, embed_dim) logits = self.linear(e) # (batch, vocab_size) return logits vocab_size = len(vocab) embed_dim = 16 # bigger than 3 now model = ToyModel(vocab_size, embed_dim) opt = optim.Adam(model.parameters(), lr=0.01) loss_fn = nn.CrossEntropyLoss() # --- 7. 
Training loop --- for epoch in range(500): # small for demo random.shuffle(pairs) losses = [] for inp, target in pairs: inp_t = torch.tensor([inp]) target_t = torch.tensor([target]) logits = model(inp_t) loss = loss_fn(logits, target_t) opt.zero_grad() loss.backward() opt.step() losses.append(loss.item()) if epoch % 10 == 0: print(f\u0026#34;Epoch {epoch}, avg loss {sum(losses)/len(losses):.4f}\u0026#34;) # --- 8. Prediction function --- def predict_next(word, topk=5): model.eval() with torch.no_grad(): if word not in token2id: return f\u0026#34;Word \u0026#39;{word}\u0026#39; not in vocab.\u0026#34; inp_id = torch.tensor([token2id[word]]) logits = model(inp_id) probs = F.softmax(logits, dim=-1) top_probs, top_ids = torch.topk(probs, k=topk, dim=-1) results = [] for p, i in zip(top_probs[0], top_ids[0]): results.append((id2token[i.item()], float(p))) return results # --- 9. Test predictions --- for test_word in [\u0026#34;cats\u0026#34;, \u0026#34;the\u0026#34;, \u0026#34;independence\u0026#34;, \u0026#34;companions\u0026#34;]: preds = predict_next(test_word) print(f\u0026#34;Given \u0026#39;{test_word}\u0026#39; → next candidates: {preds}\u0026#34;) #output Epoch 470, avg loss 0.7142 Epoch 480, avg loss 0.7152 Epoch 490, avg loss 0.7183 Given \u0026#39;cats\u0026#39; → next candidates: [(\u0026#39;often\u0026#39;, 0.3571014702320099), (\u0026#39;also\u0026#39;, 0.18115095794200897), (\u0026#39;remain\u0026#39;, 0.16612175107002258), (\u0026#39;communicate\u0026#39;, 0.1458531767129898), (\u0026#39;are\u0026#39;, 0.13305993378162384)] Given \u0026#39;the\u0026#39; → next candidates: [(\u0026#39;most\u0026#39;, 0.296306848526001), (\u0026#39;world,\u0026#39;, 0.27701255679130554), (\u0026#39;presence\u0026#39;, 0.2574588656425476), (\u0026#39;cat\u0026#39;, 0.019772468134760857), (\u0026#39;sense\u0026#39;, 0.014828374609351158)] Given \u0026#39;independence\u0026#39; → next candidates: Word \u0026#39;independence\u0026#39; not in vocab. 
Given \u0026#39;companions\u0026#39; → next candidates: [(\u0026#39;to\u0026#39;, 1.0), (\u0026#39;prefer\u0026#39;, 2.8792931580645664e-11), (\u0026#39;form\u0026#39;, 2.24102333912235e-11), (\u0026#39;cats\u0026#39;, 6.964049996394106e-12), (\u0026#39;have\u0026#39;, 6.138758338464223e-12)] Embedding matrices in real LLMs To build the embedding matrix you need two numbers: vocabulary size (number of tokens). E.g., GPT-2 uses 50,257 tokens. embedding dimension (size of each token vector): e.g., 768 for GPT-2 small, 1,024 or larger for bigger models. The embedding matrix shape is (vocab_size, embed_dim). It is initialized randomly and learned during pretraining. In practice, embeddings are trained as part of the same optimization that trains the whole model (predict next token / masked token / contrastive objectives, depending on algorithm). Summary Instead of assigning arbitrary IDs or using one-hot vectors, embeddings provide dense representations where geometric closeness reflects semantic similarity.\nThey can be static (pretrained and frozen) or dynamically updated during training, as in LLM pretraining. 
For scale, GPT-2’s embedding matrix with a vocabulary of 50,257 and dimension of 768 already holds about 38.6 million parameters (50,257 × 768), showing how significant embeddings are in both size and importance.\nToken embeddings are surprisingly powerful: a compact vector can capture syntactic and semantic relationships, support arithmetic-style analogies, and form the foundation of language model inputs.\nReferences: https://github.com/rasbt/LLMs-from-scratch, https://www.youtube.com/watch?v=ghCSGRgVB_o\u0026amp;list=PLPTV0NXA_ZSgsLAr8YCgCwhPIJNNtexWu\u0026amp;index=10\n","permalink":"https://learncodecamp.net/token-embeddings/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eToken embeddings (aka vector embeddings) turn tokens — words, subwords, or characters — into numeric vectors that \u003cem\u003eencode meaning\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eThey’re the essential bridge between raw text and a neural network.\u003c/p\u003e\n\u003cp\u003eIn this post we will run a few small demos (Word2Vec-style analogies, similarity checks) and walk through concrete PyTorch code that shows how an embedding layer works. I also include a tiny toy training loop so you can see embeddings updated by backprop.\u003c/p\u003e","title":"Token Embeddings — what they are, why they matter, and how to build them (with working code)"},{"content":"Introduction Byte Pair Encoding (BPE) is a subword tokenization scheme that gives us the best of both worlds: compact vocabulary sizes (not the full wordlist), the ability to represent any unknown word (by falling back to subwords/characters), and meaningful shared pieces (roots, suffixes) that help models generalize.\nGPT-2 used a BPE tokenizer with a vocabulary of 50,257 tokens, and OpenAI’s tiktoken is a fast Rust-backed implementation you can use today. 
Below I explain the why, the how (intuition + algorithm), and a short hands-on demo using tiktoken.\nWhy we even care about tokenizers When we feed text to an LLM we must map text → integer IDs. How we split text into tokens has big consequences:\nWord-level tokenizers: each distinct word is a token. Pros: short token sequences. Cons: huge vocabularies and out-of-vocabulary (OOV) problems — any unseen word breaks things. Character-level tokenizers: each character is a token. Pros: tiny vocab, no OOV. Cons: very long token sequences and loss of meaningful multi-character chunks (roots/suffixes). Subword tokenizers (BPE): the hybrid. Keep frequent words intact, break rare words into subwords or characters. This lets the tokenizer: avoid OOVs, keep useful common words as single tokens, and represent morphological similarity (e.g., token, tokenize, tokenization share pieces). That combination is why GPT-2/GPT-3 adopted BPE.\nBPE at a glance — intuition BPE began as a simple compression algorithm (1994): repeatedly find the most frequent adjacent pair of bytes and merge them into a new symbol. Applied to language, we:\nStart with characters (plus a special end-of-word marker). Count adjacent-symbol pairs across the corpus. Find the most frequent pair → merge it into a new subword token. Repeat until you’ve created the desired number of tokens (or no pair repeats enough). This produces tokens that are not full words nor single characters, but subwords — e.g., old, -est, ing, s, etc. 
The algorithm automatically discovers frequent roots/suffixes (like est in finest, lowest) and preserves frequent full words (like the, and) as single tokens.\nStep-by-step example Imagine a tiny corpus: old, older, finest, lowest.\nAdd a word-end marker (I’ll use \u0026lt;/w\u0026gt; conceptually).\nStart as characters with \u0026lt;/w\u0026gt; appended.\nFrequent adjacent pairs like e+s, es+t, then est\u0026lt;/w\u0026gt; emerge and get merged.\nThe algorithm progressively constructs tokens like old, est\u0026lt;/w\u0026gt;, etc.\nThe final vocabulary includes characters, useful subwords, and a manageable number of tokens. That’s how BPE captures morphology and reduces vocabulary size versus full word vocabularies.\nThe practical implementation — tiktoken (OpenAI’s Rust-backed BPE) Tiktoken is a performant implementation of the exact class of BPE tokenizers used for OpenAI models. You can install and use it quickly. Below is a short example with its sample output.\n# pip install tiktoken import importlib.metadata import tiktoken print(\u0026#34;tiktoken version:\u0026#34;, importlib.metadata.version(\u0026#34;tiktoken\u0026#34;)) # tiktoken version: 0.7.0 tokenizer = tiktoken.get_encoding(\u0026#34;gpt2\u0026#34;) text = ( \u0026#34;Hello, do you like tea? \u0026lt;|endoftext|\u0026gt; In the sunlit terraces\u0026#34; \u0026#34;of someunknownPlace.\u0026#34; ) integers = tokenizer.encode(text, allowed_special={\u0026#34;\u0026lt;|endoftext|\u0026gt;\u0026#34;}) print(integers) # [15496, 11, 466, 345, 588, 8887, 30, 220, 50256, 554, 262, 4252, 18250, 8812, 2114, 1659, 617, 34680, 27271, 13] strings = tokenizer.decode(integers) print(strings) # Hello, do you like tea? \u0026lt;|endoftext|\u0026gt; In the sunlit terracesof someunknownPlace. Notes on that snippet:\n50256 is the token id used for the special \u0026lt;|endoftext|\u0026gt; token in the GPT-2 encoding. GPT-2’s BPE vocab has 50,257 tokens (IDs 0..50256 inclusive). 
The unknown-looking word someunknownPlace gets broken into subword pieces rather than causing an error — that’s BPE in action. tiktoken is typically much faster than a pure-Python BPE implementation because the core is implemented in Rust. Key takeaways BPE = subword tokenizer: preserves frequent whole words, splits rare words into subwords/characters. Solves OOV: any input can be tokenized because it can be represented as characters/subwords. Captures morphology: roots and suffixes are discovered automatically (e.g., est in finest/lowest). Reasonable vocab size: GPT-2 uses ~50k tokens, much smaller than the full word list of English. Fast, production-ready tooling: use tiktoken (Rust-backed) for encoding/decoding; it’s notably faster than a Python-only implementation. Practical tips \u0026amp; gotchas Stop criteria for learning merges: when building a vocabulary via BPE, you stop after reaching a preset vocabulary size (e.g., 30k, 50k) or when no pair occurs often enough. GPT-style models often pick ~50k tokens for a balance of sequence length vs vocab expressiveness. Special tokens: preserve and register “end-of-text” or other special tokens so they’re treated atomically by the tokenizer (as shown in the example with allowed_special). Token counts ≠ word counts: a single English word might be multiple tokens (e.g., antidisestablishmentarianism) or many words may be a single token (the, a, punctuation combos depending on tokenizer). Consistency matters: use the exact encoding used for the model — mixing tokenizers will break embedding alignment and model inputs. Resources tiktoken — fast BPE tokenizer (Rust implementation used by OpenAI). Install with pip install tiktoken. 
OpenAI GPT-2 encoder reference (original Python implementation): https://github.com/openai/gpt-2/blob/master/src/encoder.py https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb https://www.youtube.com/watch?v=fKd8s29e-l4 https://learncodecamp.net/thinking-in-llm-models-2/ Recap BPE is simple in idea but extremely powerful in practice. It’s the reason models like GPT-2 could be trained with manageable vocabularies while still gracefully handling arbitrary text from the wild.\n","permalink":"https://learncodecamp.net/bpe/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eByte Pair Encoding (BPE) is a subword tokenization scheme that gives us the best of both worlds: compact vocabulary sizes (not the full wordlist), the ability to represent \u003cem\u003eany\u003c/em\u003e unknown word (by falling back to subwords/characters), and meaningful shared pieces (roots, suffixes) that help models generalize.\u003c/p\u003e\n\u003cp\u003eGPT-2 used a BPE tokenizer with a vocabulary of \u003cstrong\u003e≈50,257 tokens\u003c/strong\u003e, and OpenAI’s \u003ccode\u003etiktoken\u003c/code\u003e is a fast Rust-backed implementation you can use today. Below I explain the why, the how (intuition + algorithm), and a short hands-on demo using \u003ccode\u003etiktoken\u003c/code\u003e.\u003c/p\u003e","title":"Byte Pair Encoding (BPE): the tokenizer that made GPTs practical"},{"content":" Introduction In this blog post, we dive deep into tokenization, the very first step in preparing data for training large language models (LLMs).\nTokenization is more than just splitting sentences into words—it’s about transforming raw text into a structured format that neural networks can process. 
We’ll build a tokenizer, encoder, and decoder from scratch in Python, and walk through handling unknown tokens and special context markers.\nBy the end, you’ll not only understand how tokenization works but also have working Python code you can adapt for your own projects.\nThe Three Steps of Tokenization At its core, tokenization involves three main steps:\nSplitting text into tokens (words, subwords, punctuation). Mapping tokens to token IDs (unique integers). Encoding token IDs into embeddings (vector representations). In this post, we’ll focus on steps 1 and 2.\nPreparing the Dataset For demonstration, we’ll use The Verdict by Edith Wharton (1908), a short story available for free.\nhttps://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/the-verdict.txt\nimport re # Load dataset with open(\u0026#34;verdict.txt\u0026#34;, \u0026#34;r\u0026#34;) as f: raw_text = f.read() print(\u0026#34;Total characters:\u0026#34;, len(raw_text)) print(\u0026#34;First 100 chars:\u0026#34;, raw_text[:100]) Output:\nTotal characters: 20479 First 100 chars: I had always thought Jack Gisburn... Step 1: Tokenizing Text We’ll use Python’s regular expressions (re) to split text into words and punctuation.\ntext = \u0026#34;Hello, world. Is this-- a test?\u0026#34; result = re.split(r\u0026#39;([,.:;?_!\u0026#34;()\\\u0026#39;]|--|\\s)\u0026#39;, text) result = [item.strip() for item in result if item.strip()] print(result) Output:\n[\u0026#39;Hello\u0026#39;, \u0026#39;,\u0026#39;, \u0026#39;world\u0026#39;, \u0026#39;.\u0026#39;, \u0026#39;Is\u0026#39;, \u0026#39;this\u0026#39;, \u0026#39;--\u0026#39;, \u0026#39;a\u0026#39;, \u0026#39;test\u0026#39;, \u0026#39;?\u0026#39;] 👉 Notice how punctuation marks (. , !) 
are separate tokens.\nStep 2: Building a Vocabulary and Assigning Token IDs Now we build a vocabulary: a sorted list of unique tokens mapped to integers.\n# Example tokens tokens = [\u0026#34;the\u0026#34;, \u0026#34;quick\u0026#34;, \u0026#34;brown\u0026#34;, \u0026#34;fox\u0026#34;, \u0026#34;jumps\u0026#34;, \u0026#34;the\u0026#34;] # Vocabulary (unique + sorted) vocab = sorted(set(tokens)) print(\u0026#34;Vocabulary:\u0026#34;, vocab) # Map tokens to IDs token2id = {tok: idx for idx, tok in enumerate(vocab)} id2token = {idx: tok for tok, idx in token2id.items()} print(\u0026#34;Token → ID:\u0026#34;, token2id) print(\u0026#34;ID → Token:\u0026#34;, id2token) Output:\nVocabulary: [\u0026#39;brown\u0026#39;, \u0026#39;fox\u0026#39;, \u0026#39;jumps\u0026#39;, \u0026#39;quick\u0026#39;, \u0026#39;the\u0026#39;] Token → ID: {\u0026#39;brown\u0026#39;: 0, \u0026#39;fox\u0026#39;: 1, \u0026#39;jumps\u0026#39;: 2, \u0026#39;quick\u0026#39;: 3, \u0026#39;the\u0026#39;: 4} Implementing a Tokenizer Class To make this reusable, let’s implement a Tokenizer class with encode and decode methods.\nclass SimpleTokenizerV1: def __init__(self, vocab): self.str_to_int = vocab self.int_to_str = {i:s for s,i in vocab.items()} def encode(self, text): preprocessed = re.split(r\u0026#39;([,.:;?_!\u0026#34;()\\\u0026#39;]|--|\\s)\u0026#39;, text) preprocessed = [ item.strip() for item in preprocessed if item.strip() ] ids = [self.str_to_int[s] for s in preprocessed] return ids def decode(self, ids): text = \u0026#34; \u0026#34;.join([self.int_to_str[i] for i in ids]) # Replace spaces before the specified punctuations text = re.sub(r\u0026#39;\\s+([,.?!\u0026#34;()\\\u0026#39;])\u0026#39;, r\u0026#39;\\1\u0026#39;, text) return text Example: tokenizer = SimpleTokenizerV1(vocab) text = \u0026#34;\u0026#34;\u0026#34;\u0026#34;It\u0026#39;s the last he painted, you know,\u0026#34; Mrs. 
Gisburn said with pardonable pride.\u0026#34;\u0026#34;\u0026#34; ids = tokenizer.encode(text) print(ids) Handling Unknown Tokens and End-of-Text Real-world text contains words not in the vocabulary. To handle this, we add special tokens:\n\u0026lt;UNK\u0026gt; → for unknown words \u0026lt;EOT\u0026gt; → for end of text all_tokens = sorted(list(set(preprocessed))) all_tokens.extend([\u0026#34;\u0026lt;UNK\u0026gt;\u0026#34;, \u0026#34;\u0026lt;EOT\u0026gt;\u0026#34;]) vocab = {token:integer for integer,token in enumerate(all_tokens)} Updated Tokenizer (V2): class SimpleTokenizerV2: def __init__(self, vocab): self.str_to_int = vocab self.int_to_str = { i:s for s,i in vocab.items()} def encode(self, text): preprocessed = re.split(r\u0026#39;([,.:;?_!\u0026#34;()\\\u0026#39;]|--|\\s)\u0026#39;, text) preprocessed = [item.strip() for item in preprocessed if item.strip()] # Map any token missing from the vocabulary to the unknown-token marker preprocessed = [ item if item in self.str_to_int else \u0026#34;\u0026lt;UNK\u0026gt;\u0026#34; for item in preprocessed ] ids = [self.str_to_int[s] for s in preprocessed] return ids def decode(self, ids): text = \u0026#34; \u0026#34;.join([self.int_to_str[i] for i in ids]) # Remove spaces before the specified punctuation marks text = re.sub(r\u0026#39;\\s+([,.:;?!\u0026#34;()\\\u0026#39;])\u0026#39;, r\u0026#39;\\1\u0026#39;, text) return text Now, unseen words are mapped to \u0026lt;UNK\u0026gt;.\nSpecial Context Tokens in LLMs Beyond \u0026lt;UNK\u0026gt; and \u0026lt;EOT\u0026gt;, LLMs often use:\n\u0026lt;BOS\u0026gt; (Beginning of Sequence) – Marks the start of text. \u0026lt;EOS\u0026gt; (End of Sequence) – Marks the end. \u0026lt;PAD\u0026gt; (Padding) – Ensures equal sequence length in batches. 💡 GPT models (like GPT-3/4) mostly rely only on \u0026lt;EOT\u0026gt; to separate documents.\nWhy GPT Uses Byte Pair Encoding (BPE) Our tokenizer treats entire words as tokens. But what if a rare word appears? 
GPT solves this by breaking words into subwords using Byte Pair Encoding (BPE).\nExample:\n\u0026quot;chased\u0026quot; → \u0026quot;ch\u0026quot;, \u0026quot;ase\u0026quot;, \u0026quot;d\u0026quot; This way, even unknown words can be represented through known subword pieces. We’ll explore BPE in the next post.\nByte Pair Encoding (BPE): the tokenizer that made GPTs practical Recap In this post, we learned:\n✅ Tokenization = breaking text → tokens → IDs\n✅ How to build a vocabulary and assign token IDs\n✅ Implementing an encoder/decoder tokenizer class in Python\n✅ Handling unknown tokens and using \u0026lt;EOT\u0026gt; for document separation\n✅ Why large-scale models prefer subword tokenization (BPE)\nReferences: https://github.com/rasbt/LLMs-from-scratch/tree/main\nFor a high-level view of tokenization, read this: https://learncodecamp.net/tokenization/\n","permalink":"https://learncodecamp.net/thinking-in-llm-models-2/","summary":"\u003chr /\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn this blog post, we dive deep into \u003cstrong\u003etokenization\u003c/strong\u003e, the very first step in preparing data for training large language models (LLMs).\u003c/p\u003e\n\u003cp\u003eTokenization is more than just splitting sentences into words—it’s about transforming raw text into a structured format that neural networks can process. 
We’ll build a \u003cstrong\u003etokenizer, encoder, and decoder\u003c/strong\u003e from scratch in Python, and walk through handling \u003cstrong\u003eunknown tokens\u003c/strong\u003e and \u003cstrong\u003especial context markers\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eBy the end, you’ll not only understand how tokenization works but also have working Python code you can adapt for your own projects.\u003c/p\u003e","title":"Tokenization in Large Language Models: A Hands-On Guide"},{"content":"Ever wondered how advanced AI models can tackle truly complex problems with a depth of analysis that seems to mimic human thought? The secret lies in a groundbreaking capability known as “thinking.” This fascinating development is designed to unblock key bottlenecks on the path to greater intelligence in AI.\nMoving Beyond Fixed Compute Historically, powerful large language models (LLMs) were designed to respond immediately to requests. This meant they applied a constant amount of computing power at “test time”—the moment you ask a question or give a command—to generate a response. This fixed compute budget restricted how deeply the model could “think” about a problem, limiting its ability to handle extremely hard or challenging tasks. Imagine if your brain only spent a fixed millisecond on every problem, no matter its complexity!\nUsers, quite naturally, desire a more dynamic application of compute. Simple requests should be quick and cost-effective, while very complex ones should allow the model to deliberate for far longer, perhaps even a thousand or a million times more. This is precisely what motivates this “thinking” capability in advanced AI models. And you can control the thinking budget / tokens in the latest models.\n\u0026#34;reasoning\u0026#34;: {\u0026#34;effort\u0026#34;: \u0026#34;medium\u0026#34;}, Specify low, medium, or high for this parameter, where low favors speed and economical token usage, and high favors more complete reasoning. 
How Thinking in LLM models Works: An Iterative Internal Dialogue Mechanically, “thinking” introduces an additional “thinking stage.” Before the model commits to its final answer, it can generate additional text internally, creating an iterative loop of computation. This loop allows the model to perform additional test-time compute during this thinking stage. Crucially, this internal loop can potentially run for thousands or even tens of thousands of iterations, providing a proportional increase in computing power before it decides on its final response. Because it’s a loop, the process is dynamic, meaning the model can learn how many iterations to apply based on the problem’s complexity. It’s like the model is having an internal monologue to work through a problem.\nHow Thinking Models Reason: Learning to Strategise and Self-Correct The ability for these models to “think” is achieved through reinforcement learning (RL). After initial pre-training, the model undergoes an RL stage where it’s trained on many different tasks, receiving positive or negative rewards based on whether it solves the task correctly. This remarkably general training recipe allows the model to interpret a vague signal of correctness and backpropagate this through its thinking loop to shape how it uses its internal computation and tokens.\nResearchers observed some truly remarkable emergent behaviour during this training. For example, in an integer prediction problem, the model was seen using its thinking tokens to:\n• Pose a hypothesis.\n• Test out the hypothesis.\n• Reject its own idea when it found things weren’t working (e.g., stating “this formula doesn’t hold”).\n• Try an alternative approach.\nThis capacity for self-correction and iterative refinement was astonishing to the researchers. 
Beyond just self-correction, the model learns various sophisticated reasoning strategies, including:\n• Breaking down problems into various components.\n• Exploring multiple solutions.\n• Drafting fragments of code and building them modularly.\n• Performing intermediate calculations.\n• Using tools.\nAll these strategies fall under the umbrella of using more test-time compute to deliver a smarter response.\nThe Impact and Future of Thinking This “thinking” capability isn’t just a fascinating research concept; it’s actively driving more capable models and accelerating overall AI progress. It synergistically combines with existing paradigms like pre-training (scaling data and model size) and post-training (scaling human feedback quality). This combined investment leads to a multiplicative effect and overall faster model improvement. Empirical evidence clearly shows a trend of increasing reasoning performance tracking very well with increasing test-time compute.\nBeyond raw capability, thinking offers developers and users granular control over quality versus cost. While previous models offered a discrete choice of model sizes to balance quality and cost, thinking introduces a continuous “budget,” providing a much more granular slider for how much capability is desired for a given class of tasks. Thinking budgets are now available in certain advanced models, allowing users to fine-tune cost-to-performance ratios and push performance higher for demanding applications.\nLooking ahead, the focus is on: • Improving Reasoning: Generally making models even smarter.\n• Efficiency: Ensuring the thinking process is as efficient as possible, reducing instances of “overthinking” and making it more cost-effective.\n• Deeper Thinking (Deep Think): This is a very high-budget ‘thinking’ mode built on top of advanced models, designed for extremely hard problems. It leverages much deeper and parallel chains of thought that can integrate to produce stronger solutions. 
For instance, on the USA Math Olympiad, this Deep Think approach significantly boosts performance, allowing for asynchronous processing, letting the model run for extended periods to arrive at robust solutions.\n• Open-ended Coding Tasks: The ability for models to spend longer thinking on complex coding problems could enable tasks that previously took months to be completed in minutes.\n• Pushing Human Understanding: Inspired by figures like mathematician Ramanujan, the ultimate goal is for models to contemplate deeply from a small knowledge base, building up vast knowledge and artefacts to push the frontier of human understanding.\nIn essence, this “thinking” capability marks a significant stride in AI development, moving beyond immediate, fixed-compute responses to models that can internally reason, self-correct, and strategically explore solutions, much like the human mind. The future of AI is looking increasingly thoughtful!\nHere are some links, to learn more on this topic\nhttps://platform.openai.com/docs/guides/reasoning?api-mode=responses\nhttps://openai.com/index/introducing-openai-o1-preview\n","permalink":"https://learncodecamp.net/thinking-in-llm-models/","summary":"\u003cp\u003eEver wondered how advanced AI models can tackle truly complex problems with a depth of analysis that seems to mimic human thought? The secret lies in a groundbreaking capability known as \u003cstrong\u003e“thinking.”\u003c/strong\u003e This fascinating development is designed to unblock key bottlenecks on the path to greater intelligence in AI.\u003c/p\u003e\n\u003ch3 id=\"moving-beyond-fixed-compute\"\u003eMoving Beyond Fixed Compute\u003c/h3\u003e\n\u003cp\u003eHistorically, powerful large language models (LLMs) were designed to respond \u003cstrong\u003eimmediately to requests\u003c/strong\u003e. 
This meant they applied a \u003cstrong\u003econstant amount of computing power\u003c/strong\u003e at “test time”—the moment you ask a question or give a command—to generate a response. This fixed compute budget restricted how deeply the model could “think” about a problem, limiting its ability to handle extremely hard or challenging tasks. Imagine if your brain only spent a fixed millisecond on every problem, no matter its complexity!\u003c/p\u003e","title":"Unlocking Deeper AI: The Power of Thinking in LLM Models"},{"content":"If you’re displaying images stored in an AWS S3 private bucket using signed URLs, you might have encountered a confusing scenario: images display perfectly in your web pages but throw CORS errors when you try to download them using JavaScript’s fetch API.\nLet’s break down the issue and explore a practical solution.\nUnderstanding the Problem Imagine you have your images securely stored in a private S3 bucket, and you’re using pre-signed URLs to grant temporary access:\n\u0026lt;img src=\u0026#34;https://s3.eu-central-1.amazonaws.com/my.bucket/image.png?...signed_url_params\u0026#34; /\u0026gt; The above works flawlessly; your images render perfectly. You can even open these URLs directly in another browser tab.\nHowever, when attempting to fetch these images with JavaScript:\nfetch(\u0026#39;https://s3.eu-central-1.amazonaws.com/my.bucket/image.png?...signed_url_params\u0026#39;) .then(response =\u0026gt; response.blob()) .then(blob =\u0026gt; { // handle download }); You suddenly encounter the dreaded error:\nAccess to fetch at \u0026#39;...\u0026#39; from origin \u0026#39;https://yourdomain.com\u0026#39; has been blocked by CORS policy: No \u0026#39;Access-Control-Allow-Origin\u0026#39; header is present on the requested resource. Why does this happen?\nWhat’s Causing the CORS Error? 
This issue typically occurs due to how browsers (particularly Chrome) handle caching with AWS S3 and signed URLs:\nWhen an image URL is initially loaded in a non-CORS context (such as via an \u0026lt;img\u0026gt; tag), Chrome caches the response without any CORS headers. Later, when fetching the same URL via JavaScript (fetch API), Chrome tries to reuse this cached response. Because the cached version didn’t originally include Access-Control-Allow-Origin headers (since they weren’t necessary at the time), Chrome now fails the CORS check. Ideally, AWS S3 should respond with a Vary: Origin header to indicate different responses might be needed based on the requesting origin. However, it currently doesn’t provide this header consistently when the initial request doesn’t include an Origin header.\nPractical Solution: Avoid Browser Caching To fix this issue, the simplest method is to prevent the browser from caching the initial response. This ensures Chrome fetches a fresh copy of the image every time, correctly including the necessary CORS headers.\nHere are three effective ways:\nMethod 1: Add Cache-Control: no-cache to S3 objects Set the Cache-Control metadata of your images in S3 to no-cache. This forces browsers to always fetch a fresh copy of the object.\nMethod 2: Use Query Parameter response-cache-control=no-cache When generating your signed URLs, include the special AWS S3 query parameter:\nconst signedUrl = s3.getSignedUrl(\u0026#39;getObject\u0026#39;, { Bucket: \u0026#39;my.bucket\u0026#39;, Key: \u0026#39;image.png\u0026#39;, Expires: 900, ResponseCacheControl: \u0026#39;no-cache\u0026#39; // Add this! 
}); This way, AWS automatically includes the Cache-Control: no-cache header in its response, bypassing the cache issue.\nMethod 3: Generate Different Signed URLs Create two slightly different signed URLs for the same object, such as by varying their expiration times. Because the URLs differ, Chrome treats them as separate objects, thus avoiding the problematic cached response.\n// First URL const url1 = s3.getSignedUrl(\u0026#39;getObject\u0026#39;, { Bucket, Key, Expires: 900 }); // Second URL, slightly different expiration const url2 = s3.getSignedUrl(\u0026#39;getObject\u0026#39;, { Bucket, Key, Expires: 960 }); If you see CORS errors in other situations too, for example when the image is never loaded through an \u0026lt;img\u0026gt; tag first, check the bucket’s CORS configuration. It should look something like this:\n[ { \u0026#34;AllowedHeaders\u0026#34;: [\u0026#34;*\u0026#34;], \u0026#34;AllowedMethods\u0026#34;: [\u0026#34;GET\u0026#34;, \u0026#34;HEAD\u0026#34;], \u0026#34;AllowedOrigins\u0026#34;: [ \u0026#34;https://app.example.com\u0026#34;, \u0026#34;https://qa.example.com\u0026#34;, \u0026#34;https://staging.example.net\u0026#34;, \u0026#34;https://main.example.com\u0026#34;, \u0026#34;https://clientapp.example.org\u0026#34; ], \u0026#34;ExposeHeaders\u0026#34;: [], \u0026#34;MaxAgeSeconds\u0026#34;: 3000 } ] References: Chromium bug – https://issues.chromium.org/issues/40381978\nhttps://serverfault.com/questions/856904/chrome-s3-cloudfront-no-access-control-allow-origin-header-on-initial-xhr-req/856948#856948\nConclusion By using these simple yet effective solutions, you can avoid frustrating CORS errors caused by Chrome’s caching behavior with AWS S3 signed URLs.\nChoose the method that best aligns with your workflow, and you’ll have your fetch-based downloads working smoothly again.\n","permalink":"https://learncodecamp.net/resolving-cors-errors-when-fetching-images-from-aws-s3-with-signed-urls/","summary":"\u003cp\u003eIf you’re displaying images stored in an AWS S3 private bucket using signed 
URLs, you might have encountered a confusing scenario: images display perfectly in your web pages but throw CORS errors when you try to download them using JavaScript’s \u003ccode\u003efetch\u003c/code\u003e API.\u003c/p\u003e\n\u003cp\u003eLet’s break down the issue and explore a practical solution.\u003c/p\u003e\n\u003ch3 id=\"understanding-the-problem\"\u003eUnderstanding the Problem\u003c/h3\u003e\n\u003cp\u003eImagine you have your images securely stored in a private S3 bucket, and you’re using pre-signed URLs to grant temporary access:\u003c/p\u003e","title":"Resolving CORS Errors When Fetching Images from AWS S3 with Signed URLs"},{"content":"Introduction ComfyUI is a powerful, open-source, node-based interface for generative AI workflows, mainly for image and video workflows.\nWhile it’s primarily known for its visual interface, ComfyUI also offers robust API capabilities, enabling developers to integrate and automate workflows programmatically. This guide will walk you through using ComfyUI in API mode.\nComfyUI offers a suite of RESTful and WebSocket API endpoints that enable developers to programmatically interact with its workflow engine. These endpoints facilitate tasks such as queuing prompts, retrieving results, uploading images, and monitoring system status.\n🧰 Setting Up ComfyUI for API Access 1. Installation Begin by cloning the ComfyUI repository and installing the necessary dependencies:\ngit clone https://github.com/comfyanonymous/ComfyUI.git cd ComfyUI pip install -r requirements.txt 2. Running ComfyUI Start the ComfyUI server:\npython main.py Before you start, you will need the workflow_api.json file. You can download it from ComfyUI by following these steps:\nEnable dev mode options in the ComfyUI settings Export your API JSON using the “Save (API format)” button. 🧩 Core ComfyUI API Endpoints 1. 
WebSocket Endpoint Endpoint: /ws Method: WebSocket Description: Establishes a WebSocket connection for real-time updates, including status changes, progress, and execution events. 2. Queue Prompt Endpoint: /prompt Method: POST Description: Queues a workflow for execution. The request body should include the prompt (workflow definition) and client_id. 3. Retrieve Prompt History Endpoint: /history/{prompt_id} Method: GET Description: Fetches the execution history and results for a specific prompt ID. 4. View Generated Images Endpoint: /view Method: GET Description: Retrieves images based on filename, subfolder, and type (input, output, or temp). 5. Upload Images or Masks Endpoint: /upload/{image_type} Method: POST Description: Uploads images or masks to ComfyUI. The {image_type} can be image or mask. The request should include the image data and specify the target folder. 📋 Queue and History Management 6. Get Current Queue Endpoint: /queue Method: GET Description: Retrieves the current state of the queue, including running and pending prompts. 7. Interrupt Execution Endpoint: /interrupt Method: POST Description: Interrupts the execution of the currently running prompt. 8. Delete Items from Queue or History Endpoint: /queue or /history Method: POST Description: Deletes specific items from the queue or history. The request body should specify the IDs of the items to delete. 9. Clear Queue or History Endpoint: /queue or /history Method: POST Description: Clears all items from the queue or history. The request body should include a flag indicating the clear action. ⚙️ System and Configuration Endpoints 10. System Statistics Endpoint: /system_stats Method: GET Description: Provides system and device statistics, such as Python version, operating system, and device information. 11. User Configuration Endpoint: /users Method: GET or POST Description: Retrieves or creates user configuration data. 12. 
Settings Management Endpoints: Get All Settings: /settings (GET) Get Specific Setting: /settings/{id} (GET) Update Settings: /settings (POST) Update Specific Setting: /settings/{id} (POST) Description: Manages user-specific settings, allowing retrieval and updates of configuration parameters. 🧠 Workflow and Node Information 13. Node Definitions Endpoint: /object_info Method: GET Description: Retrieves definitions of available nodes, including their inputs, outputs, and parameters. 14. Extensions Endpoint: /extensions Method: GET Description: Lists available extensions that can be integrated into workflows. 15. Embeddings Endpoint: /embeddings Method: GET Description: Retrieves a list of available embeddings for use in workflows. 📁 User Data Management 16. User Data Files Endpoints: Get User Data: /userdata/{file} (GET) Store User Data: /userdata/{file} (POST) Description: Manages user-specific data files, allowing retrieval and storage of custom data.\nThese endpoints provide a robust interface for integrating ComfyUI into various applications and workflows.\nYou can install a custom node such as ComfyUI-load-image-from-url to send the URL of a publicly available image as input, and you can send a Base64 image back in the API response.\nhttps://github.com/tsogzark/ComfyUI-load-image-from-url\nTo see how the ComfyUI frontend talks to the ComfyUI backend, check this code:\nhttps://github.com/Comfy-Org/ComfyUI_frontend/blob/3d4ac079572c39434d3f54213f9f4039d73485b5/src/scripts/api.ts#L709\n","permalink":"https://learncodecamp.net/comfyui-api-endpoints-complete-guide/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eComfyUI is a powerful, open-source, node-based interface for generative AI workflows, mainly for image and video workflows.\u003c/p\u003e\n\u003cp\u003eWhile it’s primarily known for its visual interface, ComfyUI also offers robust API capabilities, enabling developers to integrate and automate workflows programmatically. 
This guide will walk you through using ComfyUI in API mode.\u003c/p\u003e\n\u003cp\u003eComfyUI offers a suite of RESTful and WebSocket API endpoints that enable developers to programmatically interact with its workflow engine. These endpoints facilitate tasks such as queuing prompts, retrieving results, uploading images, and monitoring system status.\u003c/p\u003e","title":"ComfyUI API Endpoints Guide: Complete Reference for Image Generation Workflows"},{"content":"Introduction If you’re involved in building software today, chances are you’re dealing with data. Lots of it. Maybe it’s user activity, sensor readings, financial transactions, or something else entirely. Martin Kleppmann’s phenomenal book, “Designing Data-Intensive Applications” (often called DDIA), is practically required reading for navigating this landscape.\nLet’s dive into some key takeaways from this essential chapter.\nThe Shift: It’s About the Data, Not Just the CPU The chapter opens by highlighting a crucial distinction: many modern applications are data-intensive, not compute-intensive. While CPU power is abundant, the real challenges often lie in:\nThe sheer amount of data. The inherent complexity of that data. The speed at which it changes. We rely on standard building blocks to manage this:\nDatabases: To store data persistently. Caches: To speed up reads by remembering expensive results. Search Indexes: To allow keyword searching and filtering. Stream Processing: For asynchronous message handling. Batch Processing: To periodically crunch large datasets. The catch? No single tool does it all perfectly for demanding applications. We often end up combining these components, effectively becoming data system designers ourselves, even if we just think of ourselves as application developers. This composite system needs careful thought, which leads us to the three pillars…\nPillar 1: Reliability – Building Systems That Actually Work What does “reliable” mean? 
A system continuing to work correctly, even in the face of adversity (faults).\nKey Ideas:\nFaults vs. Failures: A fault is one component deviating from spec (e.g., a disk dying). A failure is the system as a whole failing to provide its service to the user. The goal is fault tolerance: designing systems where faults don’t cause failures. Types of Faults: Hardware Faults: Disks crash, RAM fails, networks drop. These happen all the time at scale. Redundancy (RAID, dual power supplies, multi-machine setups) is the classic mitigation, but software fault tolerance (designing the system to handle node loss) is increasingly important, especially in the cloud. Software Errors: These are often systematic bugs triggered by unusual conditions (like the infamous leap second bug). They are harder to anticipate and can cause correlated failures across many nodes. Careful design, rigorous testing, process isolation, monitoring, and allowing quick restarts help. Human Errors: Configuration mistakes, deployment errors, etc., are a leading cause of outages. Mitigation involves designing safer systems (good APIs, admin UIs), thorough testing (including in non-prod environments), easy rollback mechanisms, clear monitoring, and good operational practices. Deliberate Chaos: Techniques like Netflix’s Chaos Monkey deliberately introduce faults to test and ensure fault-tolerance mechanisms actually work. Reliability isn’t just for life-critical systems; it’s crucial for user trust, business continuity, and avoiding data loss, even in “mundane” applications.\nPillar 2: Scalability – Coping Gracefully with Growth A system reliable today might crumble under tomorrow’s load. Scalability is about having strategies to handle growth.\nKey Ideas:\nDescribing Load: You can’t improve what you can’t measure. Define your load clearly using load parameters. 
This isn’t just “requests per second”; it might be read/write ratios, connections, cache hit rates, or something complex like the fan-out in Twitter’s timeline delivery (how many followers does a tweet need to reach?). Describing Performance: How does the system perform under load? Response Time is Key: For online systems, this is crucial. Averages Lie (or hide the truth): The mean response time doesn’t tell you about user experience. Use Percentiles: The median (p50) shows the typical experience (half users faster, half slower). Higher percentiles (p95, p99, p999) reveal the tail latency – how bad is it for the slowest users? These outliers often matter a lot (e.g., Amazon found slowest users are often high-value customers). Tail Latency Amplification: If a user request requires multiple backend calls, even a small chance of one being slow significantly increases the chance the overall user request is slow. Approaches to Coping: Scaling Up (Vertical): More powerful machine. Simple initially, but hits limits and cost barriers. Scaling Out (Horizontal): Distributing load across multiple machines (shared-nothing). More complex, especially for stateful systems, but necessary for large scale. Elasticity: Automatically adding resources based on load. Useful but can add operational complexity. No Magic Sauce: Scalable architecture is specific to the application’s load patterns and bottlenecks. Assumptions about load are critical. Pillar 3: Maintainability – Designing for the Future (and Your Sanity!) Software’s biggest cost isn’t initial development; it’s ongoing maintenance: bug fixes, operations, adaptations, new features. Designing for maintainability saves pain later.\nKey Ideas:\nOperability: Make life easy for the operations team. This means good monitoring/visibility , automation support, predictability, clear documentation, sensible defaults, and avoiding single points of failure that require downtime for maintenance. Simplicity: Manage complexity. 
This isn’t about dumbing down functionality but removing accidental complexity (complexity arising from the implementation, not the inherent problem). Abstraction is our most powerful tool here – hiding implementation details behind clean interfaces (think SQL hiding storage details, or high-level languages hiding machine code). Finding good abstractions is hard but vital. Evolvability (or Modifiability, Plasticity): Make it easy to change the system later. Requirements will change. Agile processes help, but on a system level, evolvability links back to simplicity and good abstractions. Well-designed systems are easier to adapt and refactor. Putting It All Together These three pillars – Reliability, Scalability, and Maintainability – aren’t independent silos. Often, achieving one might involve trade-offs with another. Adding redundancy for reliability might increase complexity (impacting maintainability). Scaling out might require complex coordination logic (impacting simplicity).\nThe core message is that thoughtful engineering requires considering these non-functional requirements from the start. They aren’t afterthoughts; they are fundamental properties that determine the long-term success and viability of our data-intensive applications.\nWhat are your biggest challenges related to reliability, scalability, or maintainability in your projects? Share your thoughts in the comments below!\n","permalink":"https://learncodecamp.net/reliability-scalability-maintainability/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIf you’re involved in building software today, chances are you’re dealing with \u003cem\u003edata\u003c/em\u003e. Lots of it. Maybe it’s user activity, sensor readings, financial transactions, or something else entirely. 
Martin Kleppmann’s phenomenal book, “Designing Data-Intensive Applications” (often called DDIA), is practically required reading for navigating this landscape.\u003c/p\u003e\n\u003cp\u003eLet’s dive into some key takeaways from this essential chapter.\u003c/p\u003e\n\u003ch3 id=\"the-shift-its-about-the-data-not-just-the-cpu\"\u003eThe Shift: It’s About the Data, Not Just the CPU\u003c/h3\u003e\n\u003cp\u003eThe chapter opens by highlighting a crucial distinction: many modern applications are \u003cstrong\u003edata-intensive\u003c/strong\u003e, not \u003cem\u003ecompute-intensive\u003c/em\u003e. While CPU power is abundant, the real challenges often lie in:\u003c/p\u003e","title":"Building Robust Systems: Key Lessons from Designing Data-Intensive Applications Chapter 1"},{"content":"Natural Language Processing (NLP) has revolutionized the way machines understand human language. But before models can learn from text, they need a way to break it down into smaller, understandable units. This is where tokenization comes in — a critical preprocessing step that transforms raw text into a sequence of meaningful components, or tokens.\n🧠 What is Tokenization?\nTokenization is the process of splitting text into smaller units called tokens. 
These tokens can be as large as words, or as small as characters or subwords.\nExample:\nInput: \u0026#34;ChatGPT is powerful!\u0026#34; Word-level tokens: [\u0026#34;ChatGPT\u0026#34;, \u0026#34;is\u0026#34;, \u0026#34;powerful\u0026#34;, \u0026#34;!\u0026#34;] Character-level tokens: [\u0026#34;C\u0026#34;, \u0026#34;h\u0026#34;, \u0026#34;a\u0026#34;, \u0026#34;t\u0026#34;, \u0026#34;G\u0026#34;, \u0026#34;P\u0026#34;, \u0026#34;T\u0026#34;, \u0026#34; \u0026#34;, \u0026#34;i\u0026#34;, \u0026#34;s\u0026#34;, \u0026#34; \u0026#34;, \u0026#34;p\u0026#34;, \u0026#34;o\u0026#34;, \u0026#34;w\u0026#34;, \u0026#34;e\u0026#34;, \u0026#34;r\u0026#34;, \u0026#34;f\u0026#34;, \u0026#34;u\u0026#34;, \u0026#34;l\u0026#34;, \u0026#34;!\u0026#34;] 🏛 Classical vs Modern Tokenization 🔹 Classical Tokenization (Rule-based) Older NLP systems relied on simple rule-based tokenizers, often splitting on whitespace and punctuation.\nPros:\nEasy to implement Human-readable tokens Cons:\nDoesn’t handle compound words, typos, or unknown words well Language-dependent Popular classical tools:\nNLTK’s word_tokenize spaCy tokenizer from nltk.tokenize import word_tokenize word_tokenize(\u0026#34;Let\u0026#39;s tokenize this!\u0026#34;) # Output: [\u0026#34;Let\u0026#34;, \u0026#34;\u0026#39;s\u0026#34;, \u0026#34;tokenize\u0026#34;, \u0026#34;this\u0026#34;, \u0026#34;!\u0026#34;] 🔹 Modern Tokenization (Subword-based) In modern Transformer-based models, tokenization is more sophisticated:\nByte-Pair Encoding (BPE) – GPT-2/3 WordPiece – BERT Unigram Language Model – SentencePiece / T5 These methods break words into subword units to address Out-Of-Vocabulary (OOV) issues and reduce the vocabulary size.\nExample (BPE):\nInput: \u0026#34;unhappiness\u0026#34; Tokens: [\u0026#34;un\u0026#34;, \u0026#34;happiness\u0026#34;] or [\u0026#34;un\u0026#34;, \u0026#34;happi\u0026#34;, \u0026#34;ness\u0026#34;] Tokenization using the Hugging Face Transformers.js library You need these two files to load the 
tokenizer using the Hugging Face tokenizer library.\ntokenizer_config.json tokenizer.json With the following code you can load these files, count the tokens in a sentence, and get the token IDs:\nawait AutoTokenizer.from_pretrained(pretrained_model_name_or_path) The tokenizer can be initialized using either:\nA model ID string from the Hugging Face Hub:\nThese are publicly hosted models like: \u0026quot;bert-base-uncased\u0026quot; \u0026quot;gpt2\u0026quot; \u0026quot;facebook/bart-large-cnn\u0026quot; \u0026quot;dbmdz/bert-base-german-cased\u0026quot; (note the namespaced format: user_or_org/model_name) A local directory path:\nIf you’ve downloaded or fine-tuned a tokenizer and saved it locally, you can point to that directory: tokenizer = AutoTokenizer.from_pretrained(\u0026#34;./my_model_directory/\u0026#34;) This flexibility allows you to either use official models from Hugging Face or load your custom/pretrained tokenizers from disk.\nTo get the tokens, you can call encode on the tokenizer instance:\nconst tokens = tokenizer.encode(text); If the model uses BPE internally, the encode function invokes it.\nInternally, the BPE implementation caches its results, and that cache is now an LRU cache. Earlier it was a plain hash map with no eviction, so calling encode with many different inputs cached every result and behaved like a memory leak.\nThis is why memory kept growing when I used it. The issue is fixed now.\nhttps://github.com/huggingface/transformers.js/issues/1282\nTo count tokens or to see how different models tokenize text, you can check this app\n","permalink":"https://learncodecamp.net/tokenization/","summary":"\u003cp\u003eNatural Language Processing (NLP) has revolutionized the way machines understand human language. But before models can learn from text, they need a way to break it down into smaller, understandable units. 
This is where \u003cstrong\u003etokenization\u003c/strong\u003e comes in — a critical preprocessing step that transforms raw text into a sequence of meaningful components, or \u003cem\u003etokens\u003c/em\u003e.## 🧠 What is Tokenization?\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTokenization\u003c/strong\u003e is the process of splitting text into smaller units called \u003cstrong\u003etokens\u003c/strong\u003e. These tokens can be as large as words, or as small as characters or subwords.\u003c/p\u003e","title":"Tokenization"},{"content":"Introduction to MCP The Model Context Protocol (MCP) is an open standard for connecting AI assistants (like large language models) to the systems where data and tools live​\nIn essence, MCP aims to bridge the gap between isolated AI models and real-world data sources – think of it as a “USB-C for AI applications”, providing a universal way to plug an AI model into various databases, file systems, APIs, and other tools​.\nBy standardizing these connections, MCP replaces a tangle of custom integrations with a single protocol, making it simpler and more reliable to give AI systems access to the data they need​. This leads to more relevant and context-aware responses from models, since they can securely fetch documents, query databases, or perform actions as needed.\nWhy was MCP created? As AI assistants became more capable, developers faced a proliferation of one-off “plugins” or connectors for each data source or service. 
Every new integration (be it your company’s knowledge base, a cloud app, or a local tool) traditionally required custom code and unique APIs, leading to fragmented solutions that are hard to scale​\nMCP addresses this by providing a universal protocol: developers can expose data or functionality through a standard interface, and AI applications can consume that interface without bespoke adapters.\nThe goal is a more sustainable ecosystem where AI systems maintain context across different tools seamlessly​.\nAnthropic open-sourced MCP in late 2024, and it has quickly gained industry support with early adopters like Block and Apollo integrating it, and developer tools companies (Zed, Replit, Codeium, Sourcegraph, etc.) working to enhance their platforms with MCP.\nKey Components and Use Cases of MCP MCP follows a client–server architecture with three main components\nMCP Host: This is the AI-powered application or environment that needs access to external data. Examples of hosts include chat interfaces (like Claude’s chat app or Microsoft’s Copilot Studio), AI-enhanced IDEs (e.g. Zed editor or Sourcegraph Cody), or other AI agents​. The host is where the user interacts with the AI model. It initiates connections to one or more MCP servers to extend the model’s capabilities.\nMCP Client: The client is a connector library running inside the host application that manages the 1:1 connection with an MCP server​.\nIt speaks the MCP protocol, sending requests to servers and receiving responses. In practice, the MCP client is usually provided by an SDK (for example, Anthropic’s Claude Desktop includes an MCP client, and Microsoft’s Copilot Studio uses a connector) that knows how to communicate with any MCP-compliant server.\nMCP Server: The server is a lightweight program or service that exposes specific data or capabilities through the standardized MCP interface​. 
An MCP server acts as a bridge between the AI agent and a data source or tool – it could interface with local files, a database, an external API, a code repository, etc., and present those as “resources” or “actions” to the AI​. Developers either run existing MCP servers or build new ones to connect their particular data systems to AI models.\nHow these components interact A host (with an AI model) can connect to multiple MCP servers at once, giving the model access to a variety of tools and context. For example, a coding assistant might connect to a GitHub server (to retrieve code or file history), a database server (to fetch query results), and a filesystem server (to read local project files).\nEach server communicates with the host via the MCP client protocol, using a common messaging format. The host can then allow the AI model to retrieve information or invoke operations on those servers in a controlled manner (often requiring user approval before executing potentially sensitive actions).\nCommon use cases MCP’s flexibility means it can be applied anywhere an AI needs external knowledge or the ability to act. Some scenarios enabled by MCP include:\nCoding assistants and IDEs: An AI coding assistant can use MCP to access your project’s codebase, version control, and documentation. For instance, an MCP Git server can let the AI list repositories, read code files, or perform Git operations; a GitHub MCP server can interface with GitHub’s API for issues and pull requests​. This is already being used in developer tools like Claude’s IDE integration, Replit, and Sourcegraph to provide context-specific code suggestions and even make code changes with user oversight\nData analysis and databases: With an MCP database server (e.g. a PostgreSQL or SQLite connector), an AI agent could run read-only queries on a company database and retrieve results​. 
This allows the model to answer analytical questions or generate reports based on live data, without exposing the entire database directly. The MCP server enforces access controls and schema limitations​.\nDocument retrieval and knowledge bases: MCP servers exist for tools like Google Drive, Notion, or local filesystems, enabling an AI assistant to search documents and read their contents when answering a question​\nThe server might expose documents as resources (read-only content) that the AI can request. Similarly, a “Memory” MCP server can provide a long-term knowledge store or vector search (e.g. using a tool like Qdrant or Weaviate for semantic memory)​.\nDevOps and cloud automation: Some community-built MCP servers connect to infrastructure tools – for example, a Docker MCP server to manage container tasks, or a Kubernetes server to control clusters.\nAn AI agent with access to these could assist in deployment or monitoring tasks (again, only with explicit user permission on each action). Official integrations by companies like Cloudflare and Stripe indicate MCP servers can also manage cloud resources or payment workflows securely​\nProductivity and communication: There are MCP connectors for Slack (to read channel messages or post updates)​, Todoist (to manage tasks), and even calendar or maps services​.\nOverall, MCP fosters an ecosystem of interchangeable integrations. A tool exposed via an MCP server can be used by any AI application that supports MCP, and vice versa – an AI app can tap into a growing library of pre-built MCP servers. This means less reinventing the wheel: instead of writing a custom integration for each new AI agent or platform, a developer can build one MCP server and have it work across many AI systems.\nTechnical Specifications and Architecture of MCP At its core, MCP is built on JSON-RPC 2.0 as the message format for communication​\nTransport layer\nThe transport layer handles the actual communication between clients and servers. 
MCP supports multiple transport mechanisms:\nStdio transport Uses standard input/output for communication Ideal for local processes HTTP with SSE transport Uses Server-Sent Events for server-to-client messages HTTP POST for client-to-server messages SSE transport is useful when the server can’t easily run as a child process of the host – for example, if it’s a remote web service or needs to serve multiple clients. SSE provides unidirectional streaming from server to client (so the server can stream results or real-time updates), while requests from the client come via standard HTTP posts.\nCapabilities – resources and tools An MCP server can provide two broad kinds of things to an AI: contextual data (called Resources) and actions (called Tools). These are collectively referred to as the server’s capabilities in the MCP specification. When the server starts, it usually identifies which capabilities it supports so the client knows what it can do (for example, a pure read-only data server might expose resources but no tools, or vice versa).\nResources are pieces of data or content that the server exposes for reading.\nThink of resources as the knowledge or context the server makes available. This could be the contents of a file, a row from a database, the text of an email, an image file, etc. Each resource is identified by a URI (e.g. file:///path/to/file.txt or postgres://server/db/schema/table) and has some data associated with it​\nClients can request resources from the server – typically via a method like resources/read or by referencing the URI – and the server will return the content (text or binary).\nResources are meant to be determined by the client or user: for example, a user might choose which documents an AI assistant should load as context. Some MCP clients require explicit user selection of resources (to maintain control)​. 
Because resources are passive (just data), if you want the AI to automatically leverage some context without user selection, you might instead package it as a tool that the AI can actively call.\nTools are operations or functions that the server can perform on behalf of the AI – essentially _executable actions exposed via the protocol_​\nTools let the AI do things in the external system: run a query, send a message, create a file, etc. Each tool has a name, a human-readable description, and an input schema defining what parameters it accepts​.\nThe input schema is expressed in JSON Schema, allowing the host (and even the AI model) to understand what arguments are required. For example, a database query tool might have a schema with a \u0026quot;query\u0026quot; string parameter, or a calendar-scheduling tool might require a \u0026quot;date\u0026quot; and \u0026quot;title\u0026quot;. Tools are discovered by the client via a standardized tools/list request – the server responds with the list of available tool names, descriptions, and input schemas.\nTo invoke a tool, the client sends a tools/call request with the tool name and a JSON object of arguments, then the server executes the action and returns the result (or an error)​\nResults can be simple confirmations, or data produced by the action (for instance, a tool that reads a file might return the file content as a result, which blurs the line with “resource” – indeed the concepts overlap; tools can return resource data, but the key is tools are initiated by the AI). 
Because tools can have side effects or reach sensitive data, MCP is designed for human oversight: the AI model may suggest using a tool, but typically the client (or the user in the loop) must approve it before it’s executed.\nPrompts and others: In addition to raw data and actions, MCP also defines other capabilities like Prompts (reusable prompt templates or instructions that servers can provide) and advanced flows like Sampling (which allows a server to ask the client’s AI model to generate a completion, effectively letting the server “consult” the AI). For example, a server could have a built-in prompt for summarizing a document: the host can request that prompt and fill it with data.\nSecurity and scope MCP is designed with security in mind, acknowledging that giving an AI access to external systems can be risky. The protocol encourages a few best practices to mitigate issues:\nRoots (Context Boundaries): When a host connects to a server, it can specify “roots” – essentially the scope or boundaries within which the server should operate. For example, a filesystem server might be given a root of file:///home/user/project to indicate it should only expose files within that project directory.\nOr an API-based server might be rooted to a particular endpoint or tenant. Roots help the server know what is “in bounds” and keep context focused. They are not an absolute security sandbox (the server could ignore them, but well-behaved servers honor the roots) – they serve as guidance from the client about relevant resources and as an extra layer of clarity on allowed operations.\nHuman-in-the-loop approvals: As mentioned, tool invocation usually requires user approval in practice. 
MCP’s role is to present the action and its parameters in a structured way (via the tool schema), so the host UI can ask the user for confirmation. Only once approved does the server actually execute the action. This ensures the AI can’t, say, delete files or send emails unless the user explicitly okays it.\nAccess controls in servers: The MCP servers themselves often enforce their own security. For example, an MCP server for GitHub will require an access token to be provided (via environment variable or config) to authenticate to the GitHub API​\nA filesystem server might run with the permissions of the local user and have an allow-list of directories it will serve. By centralizing these rules in the server implementation, organizations can tightly control what an AI agent can do.\nTools, Frameworks, and Libraries for Building MCP Servers One of the advantages of MCP being an open standard is that there are already official SDKs and open-source libraries to help you implement it. You don’t have to write the JSON-RPC handling or protocol logic from scratch – you can use a provided framework in the language of your choice.\nAvailable SDKs and languages: The MCP project (led by Anthropic and community contributors) provides SDKs in multiple languages, including TypeScript/JavaScript, Python, Java, Kotlin, and C#.\nThese SDKs offer base classes for an MCP Server and Client, utilities for defining tools/resources, and support for the standard transports. In practice, most developers use TypeScript (Node.js) or Python for MCP server development, as these were the first and most fully featured SDKs (many of the reference servers are implemented in one of these).\nTypeScript/Node: Using the TypeScript SDK, you can build an MCP server as a Node.js application. 
The SDK integrates well with frameworks like Express if you’re using the SSE transport (for example, setting up an Express app with an /sse endpoint for the event stream and a POST endpoint for incoming messages).\nAdditional tools for development: During development of an MCP server, you might find the following useful:\nMCP Inspector: an interactive debugging UI for MCP, which can connect to your server and let you test requests and see messages in real time. Step-by-Step: Creating an MCP Server Initialize the MCP server instance. In your code, import the SDK and create a server object. Typically, this involves specifying the server’s name, version, and what capabilities you plan to use. For example, in TypeScript:\nimport { Server } from \u0026#34;@modelcontextprotocol/sdk/server/index.js\u0026#34;; import { ListToolsRequestSchema, CallToolRequestSchema } from \u0026#34;@modelcontextprotocol/sdk/types.js\u0026#34;; const server = new Server( { name: \u0026#34;example-server\u0026#34;, version: \u0026#34;1.0.0\u0026#34; }, { capabilities: { tools: {} } } /* declares that this server will provide tools */ ); Define the server’s functionality – resources or tools. Now, use the SDK to register what your server can do. For resources, you might register handlers for resource queries or prepare a list of resource URIs. For tools, you register tool definitions and the functions to execute them. 
In our add-two-numbers example, we will create a tool called \u0026quot;calculate_sum\u0026quot;.\n/* Define the tool list handler */ server.setRequestHandler(ListToolsRequestSchema, async () =\u0026gt; { return { tools: [{ name: \u0026#34;calculate_sum\u0026#34;, description: \u0026#34;Add two numbers together\u0026#34;, inputSchema: { type: \u0026#34;object\u0026#34;, properties: { a: { type: \u0026#34;number\u0026#34; }, b: { type: \u0026#34;number\u0026#34; } }, required: [\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;] } }] }; }); /* Define the tool execution handler */ server.setRequestHandler(CallToolRequestSchema, async (request) =\u0026gt; { if (request.params.name === \u0026#34;calculate_sum\u0026#34;) { const { a, b } = request.params.arguments; return { content: [ { type: \u0026#34;text\u0026#34;, text: String(a + b) } ] }; } throw new Error(\u0026#34;Tool not found\u0026#34;); }); Choose and set up the transport (server runtime). Decide how your server will be run and communicate:\nFor local usage (stdio): If you expect the host to launch your server as a subprocess, you don’t need much additional code – typically, you just call something like server.connect(new StdioServerTransport()) to start listening on stdio. For network usage (SSE): You need to host an HTTP endpoint. In a Node/Express setup, you would create an /sse GET endpoint that, when hit by a client, creates an SSEServerTransport and calls server.connect(transport) to tie the server to that client connection. Also set up a POST endpoint (e.g. /messages) that the client will use to send JSON-RPC requests – this endpoint just passes those requests to the transport handler. The key point is that with SSE, you are effectively implementing a tiny protocol bridge: one route to establish the stream (and internally call server.connect when a client connects) and one route to accept incoming JSON messages. Using an AI host (e.g. Claude Desktop or Copilot): Configure the host to use your server. 
For Claude Desktop, add your server to the mcpServers section of Claude’s config. For example, if your server is a standalone executable my-server accessible in PATH, you might add:\n\u0026#34;mcpServers\u0026#34;: { \u0026#34;example\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;my-server\u0026#34; } } You could then ask the AI, “Use the calculate_sum tool to add 5 and 7,” and if all goes well, it will invoke the tool and return the answer. (In practice, with such a simple tool, the AI might just do it in its head, but for testing you can force it by phrasing the request to ensure tool use.)\nDeploy or integrate the server as needed Once you are satisfied with your MCP server, consider how it will run in your target scenario. If it’s for personal use (e.g. connecting a local IDE to your stuff), you might just run it on demand. If it’s for a production agent or a team, you might deploy the server as a service (maybe a Docker container running your MCP server code on some internal host or cloud function). Because MCP is a standard, multiple hosts could use your server if they know its address or launch command. For example, a company could deploy a database MCP server on an internal network; both a Claude instance and a Copilot instance could connect to it to allow their AI agents to safely query the database. Always ensure any required credentials (API keys, etc.) are securely provided to the server (via environment variables or a vault) and that you’ve restricted its access (using the roots mechanism or internal checks) to only the intended scope.\nConclusion and Resources The Model Context Protocol is a powerful development: by standardizing how AI models interact with external data and tools, it enables a plug-and-play approach to augmenting AI capabilities. We’ve covered what MCP is, how it works, and how to build an MCP server. From required libraries (e.g. 
the official SDKs in TypeScript/Python and others) to the architecture of client–server communication (JSON-RPC over stdio or SSE), MCP provides a clear framework for integration. Building an MCP server involves defining what data or actions you want to expose, implementing a few handler functions, and letting the protocol layer handle the rest. With the growing library of open-source MCP servers for everything from databases to DevOps, you can also study and reuse existing implementations\nResources:\nhttps://modelcontextprotocol.io/introduction\n","permalink":"https://learncodecamp.net/model-context-protocol/","summary":"\u003ch3 id=\"introduction-to-mcp\"\u003eIntroduction to MCP\u003c/h3\u003e\n\u003cp\u003eThe \u003cstrong\u003eModel Context Protocol (MCP)\u003c/strong\u003e is an open standard for connecting AI assistants (like large language models) to the systems where data and tools live​\u003c/p\u003e\n\u003cp\u003eIn essence, MCP aims to bridge the gap between isolated AI models and real-world data sources – think of it as a \u003cstrong\u003e“USB-C for AI applications”\u003c/strong\u003e, providing a universal way to plug an AI model into various databases, file systems, APIs, and other tools​.\u003c/p\u003e","title":"Model Context Protocol (MCP) – A Technical Guide to Understanding and Building MCP Servers"},{"content":"What is Signaling in WebRTC, and Why is it Needed? WebRTC allows direct peer-to-peer (P2P) communication, but before two peers can connect, they need to exchange network and media information. This process is called signaling.\nSignaling is needed for:\nExchanging session descriptions (SDP – Session Description Protocol) between peers. Sharing ICE candidates to determine the best network path. Handling NAT traversal by identifying public and private network addresses. Establishing data channels for text, files, and other non-media communication. WebRTC does not define a signaling protocol. 
Developers must implement their own using available technologies like WebSockets, WebRTC Data Channels, or existing protocols like SIP. Different Ways to Implement Signaling Since WebRTC does not provide a built-in signaling mechanism, developers must choose how to implement it.\n1. WebSockets (Most Common) WebSockets provide a bidirectional communication channel over TCP, making them ideal for real-time signaling.\nPros: Persistent connection with low latency. Works in browsers and mobile apps. Lightweight and easy to integrate. Cons: Requires a WebSocket server. Less efficient for large-scale deployments. Example using WebSockets for signaling:\nconst socket = new WebSocket(\u0026#34;wss://your-signaling-server.com\u0026#34;); socket.onmessage = (message) =\u0026gt; { const data = JSON.parse(message.data); if (data.sdp) { peerConnection.setRemoteDescription(new RTCSessionDescription(data.sdp)); } }; function sendSignal(data) { socket.send(JSON.stringify(data)); } 2. Socket.io (WebSockets + Extra Features) Socket.io is a JavaScript library that enhances WebSockets by adding:\nAutomatic reconnection Broadcasting capabilities Support for older browsers (fallback to HTTP long polling) Example using Socket.io for signaling:\nconst socket = io(\u0026#34;https://your-signaling-server.com\u0026#34;); socket.on(\u0026#34;offer\u0026#34;, (data) =\u0026gt; { peerConnection.setRemoteDescription(new RTCSessionDescription(data.sdp)); }); function sendSignal(event, data) { socket.emit(event, data); } 3. SIP (Session Initiation Protocol) SIP is a telephony-based protocol used in VoIP applications. It handles session initiation and management.\nPros: Standardized for VoIP. Works well with enterprise telephony systems. Cons: More complex to implement than WebSockets. Requires additional infrastructure. WebRTC Connection Flow WebRTC follows a step-by-step Offer/Answer model to establish a peer-to-peer connection.\n1. 
Offer/Answer Model (SDP – Session Description Protocol) One peer creates an offer containing supported media formats (SDP). The other peer responds with an answer accepting or modifying the offer. SDP includes details like codec support, resolution, and encryption. Example SDP Offer:\n{ \u0026#34;type\u0026#34;: \u0026#34;offer\u0026#34;, \u0026#34;sdp\u0026#34;: \u0026#34;v=0\\r\\no=- 46117397 2 IN IP4 127.0.0.1...\u0026#34; } Code Example:\npeerConnection.createOffer() .then(offer =\u0026gt; peerConnection.setLocalDescription(offer)) .then(() =\u0026gt; { sendSignal({ sdp: peerConnection.localDescription }); }); 2. ICE Candidate Gathering WebRTC uses ICE (Interactive Connectivity Establishment) to find the best network path. The browser collects ICE candidates (possible IP address and port pairs) and shares them via the signaling server. The receiving peer adds these candidates to its connection attempt. Example:\npeerConnection.onicecandidate = event =\u0026gt; { if (event.candidate) { sendSignal({ candidate: event.candidate }); } }; 3. NAT Traversal (STUN/TURN Servers) Most devices are behind NAT (Network Address Translation), which can prevent direct connections. STUN (Session Traversal Utilities for NAT) helps discover public IP addresses. If a direct path cannot be established, TURN (Traversal Using Relays around NAT) relays traffic through a server. Configuring STUN/TURN in WebRTC:\nconst config = { iceServers: [ { urls: \u0026#34;stun:stun.l.google.com:19302\u0026#34; }, { urls: \u0026#34;turn:your-turn-server.com\u0026#34;, username: \u0026#34;user\u0026#34;, credential: \u0026#34;pass\u0026#34; } ] }; const peerConnection = new RTCPeerConnection(config); 4. Establishing a Secure Peer-to-Peer Connection WebRTC uses DTLS (Datagram Transport Layer Security) to secure data channels and to negotiate the keys used for media encryption; the signaling channel itself is outside WebRTC and must be secured separately (for example, with WSS). SRTP (Secure Real-time Transport Protocol) encrypts media streams. Once ICE candidates are exchanged and a secure path is found, the peers connect directly. 
Finalizing the connection:\npeerConnection.oniceconnectionstatechange = () =\u0026gt; { console.log(\u0026#34;ICE Connection State:\u0026#34;, peerConnection.iceConnectionState); }; WebRTC Connection Flow Summary Step Description Signaling Uses WebSockets, Socket.io, or SIP to exchange SDP and ICE candidates. Offer/Answer Model One peer creates an SDP offer, the other responds with an SDP answer. ICE Candidate Gathering Peers exchange possible network addresses (candidates). NAT Traversal STUN/TURN servers help peers connect if behind firewalls. Secure Connection DTLS and SRTP encrypt the data and media streams. Conclusion Signaling is an essential step in WebRTC, enabling peers to exchange connection details. Various methods like WebSockets, Socket.io, and SIP can be used for signaling. Once signaling completes, the WebRTC connection flow follows SDP negotiation, ICE candidate exchange, NAT traversal, and secure data transmission.\n","permalink":"https://learncodecamp.net/webrtc-signalling/","summary":"\u003ch3 id=\"what-is-signaling-in-webrtc-and-why-is-it-needed\"\u003e\u003cstrong\u003eWhat is Signaling in WebRTC, and Why is it Needed?\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eWebRTC allows direct peer-to-peer (P2P) communication, but before two peers can connect, they need to exchange network and media information. 
This process is called \u003cstrong\u003esignaling\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSignaling is needed for:\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eExchanging session descriptions\u003c/strong\u003e (SDP – Session Description Protocol) between peers.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSharing ICE candidates\u003c/strong\u003e to determine the best network path.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eHandling NAT traversal\u003c/strong\u003e by identifying public and private network addresses.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eEstablishing data channels\u003c/strong\u003e for text, files, and other non-media communication.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cblockquote\u003e\n  \u003cp\u003e\n    WebRTC does \u003cstrong\u003enot define\u003c/strong\u003e a signaling protocol. Developers must implement their own using available technologies like WebSockets, WebRTC Data Channels, or existing protocols like SIP.\n  \u003c/p\u003e","title":"WebRTC Signaling \u0026 Connection Establishment"},{"content":"WebRTC (Web Real-Time Communication) is an open-source project that enables real-time communication between browsers and devices using peer-to-peer connections. It allows audio, video, and data sharing without requiring additional plugins or external software.\nHow Does WebRTC Work? WebRTC works by establishing a direct connection between two peers (browsers or applications) to transmit media (audio/video) and data. The connection process involves several steps:\nMedia Capture – A user’s camera and microphone are accessed. Signaling – Exchanging connection details between peers via a signaling server (e.g., using WebSockets). ICE Candidate Discovery – Finding the best network path between peers. Connection Establishment – Securely connecting the peers. Data Transmission – Streaming audio/video or sending arbitrary data. 2. 
Main Components of WebRTC WebRTC consists of three primary components:\na) MediaStream (getUserMedia) – Capturing Audio/Video The MediaStream API allows access to the user’s camera and microphone. getUserMedia() prompts the user for permission and returns a Promise that resolves to a media stream. Example Code:\nnavigator.mediaDevices.getUserMedia({ video: true, audio: true }) .then(stream =\u0026gt; { document.getElementById(\u0026#34;videoElement\u0026#34;).srcObject = stream; }) .catch(error =\u0026gt; { console.error(\u0026#34;Error accessing media devices.\u0026#34;, error); }); b) RTCPeerConnection – Handling Peer-to-Peer Connections This API establishes and maintains a direct peer-to-peer connection. It manages network traversal, encoding/decoding, and bandwidth control. Example Code:\nconst peerConnection = new RTCPeerConnection(); localStream.getTracks().forEach(track =\u0026gt; peerConnection.addTrack(track, localStream)); peerConnection.ontrack = event =\u0026gt; { remoteVideo.srcObject = event.streams[0]; }; c) RTCDataChannel – Sending Arbitrary Data Allows direct data exchange between peers (e.g., text messages, files). Runs over SCTP with an API similar to WebSockets; delivery is reliable and ordered by default and can be configured as unreliable for lower latency. Example Code:\nconst dataChannel = peerConnection.createDataChannel(\u0026#34;chat\u0026#34;); dataChannel.onopen = () =\u0026gt; console.log(\u0026#34;Data channel is open\u0026#34;); dataChannel.onmessage = event =\u0026gt; console.log(\u0026#34;Received:\u0026#34;, event.data); 3. WebRTC Protocols WebRTC relies on multiple protocols to enable secure and efficient communication:\na) ICE (Interactive Connectivity Establishment) A framework for discovering and establishing peer-to-peer network paths. Works by gathering multiple candidate network addresses and selecting the best one. b) STUN (Session Traversal Utilities for NAT) Helps devices behind NAT (Network Address Translation) discover their public IP. Enables direct communication between peers without using a relay server. 
c) TURN (Traversal Using Relays around NAT) If direct communication fails (due to strict NAT or firewalls), TURN relays media through a server. More resource-intensive compared to STUN. d) DTLS (Datagram Transport Layer Security) Encrypts all WebRTC communication for security. Provides authentication and prevents tampering. e) SRTP (Secure Real-time Transport Protocol) Encrypts and ensures the integrity of media streams (audio/video). Works alongside DTLS to provide end-to-end encryption. Summary Component Purpose MediaStream (getUserMedia) Captures media (audio/video) from the user. RTCPeerConnection Manages peer-to-peer connections. RTCDataChannel Sends arbitrary data between peers. ICE Finds the best connection path. STUN Discovers public IP addresses behind NAT. TURN Relays media if direct connection fails. DTLS Secures data transmission. SRTP Encrypts media streams. ","permalink":"https://learncodecamp.net/webrtc-fundamentals/","summary":"\u003cp\u003eWebRTC (Web Real-Time Communication) is an open-source project that enables real-time communication between browsers and devices using peer-to-peer connections. It allows audio, video, and data sharing without requiring additional plugins or external software.\u003c/p\u003e\n\u003ch4 id=\"how-does-webrtc-work\"\u003e\u003cstrong\u003eHow Does WebRTC Work?\u003c/strong\u003e\u003c/h4\u003e\n\u003cp\u003eWebRTC works by establishing a direct connection between two peers (browsers or applications) to transmit media (audio/video) and data. 
The connection process involves several steps:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eMedia Capture\u003c/strong\u003e – A user’s camera and microphone are accessed.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSignaling\u003c/strong\u003e – Exchanging connection details between peers via a signaling server (e.g., using WebSockets).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eICE Candidate Discovery\u003c/strong\u003e – Finding the best network path between peers.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eConnection Establishment\u003c/strong\u003e – Securely connecting the peers.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eData Transmission\u003c/strong\u003e – Streaming audio/video or sending arbitrary data.\u003c/li\u003e\n\u003c/ol\u003e\n\u003chr /\u003e\n\u003ch3 id=\"2-main-components-of-webrtc\"\u003e\u003cstrong\u003e2. Main Components of WebRTC\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eWebRTC consists of three primary components:\u003c/p\u003e","title":"WebRTC Fundamentals"},{"content":"The recent release of DeepSeek R1 has generated significant buzz in the AI community. While much of the discussion has centered on its performance relative to models like OpenAI’s GPT-4 and Anthropic’s Claude, the real breakthrough lies in the underlying algorithmic innovations that make DeepSeek R1 both highly efficient and cost-effective. This post explores the key technical advancements that power DeepSeek’s latest model.\nModel Architecture and Training DeepSeek R1 is part of a broader model ecosystem, and it’s essential to distinguish between two key models:\nDeepSeek V3: A general-purpose base model, released in December, comparable to GPT-4o and Gemini 1.5. DeepSeek R1: A reasoning-optimized model, built on top of V3, specifically designed for complex problem-solving. While R1 does not introduce a radically new architecture, it employs a strategic combination of advanced techniques to enhance reasoning efficiency. 
Many of these innovations were previously discussed in DeepSeek’s research papers and are now refined for production use.\nKey Algorithmic Innovations DeepSeek’s efficiency-first approach relies on several groundbreaking techniques:\n1. FP8 Training for Memory Efficiency Unlike most large-scale models that use 16-bit or 32-bit floating-point formats, DeepSeek V3 is trained natively in 8-bit floating-point (FP8). This significantly reduces memory requirements without compromising performance. Given the restrictions on high-end GPUs and export controls to China, this optimization is particularly crucial.\n2. FP8 Accumulation Fix A challenge with FP8 training is potential numerical instability. To counter this, DeepSeek implements a method that periodically merges calculations back into a higher-precision FP32 accumulator, ensuring stable and accurate computations while maintaining efficiency.\n3. Mixture of Experts (MoE) Architecture DeepSeek V3 adopts a Mixture of Experts (MoE) design:\nWhile the model has 671 billion parameters, only 37 billion are activated per token prediction. In contrast, models like Llama 3 (405 billion parameters) activate their entire parameter set per forward pass, leading to higher compute costs. This selective activation approach drastically reduces computational expense while retaining model effectiveness. 4. Multi-Head Latent Attention (MLA) A key efficiency bottleneck in large models is KV cache memory usage. DeepSeek V3 employs Multi-Head Latent Attention (MLA) to address this:\nKey and value matrices are compressed into a latent space and only reconstructed when needed. In their earlier V2 model, this led to a 93.3% reduction in KV cache size and a 5.76x improvement in throughput. The efficiency gains in V3 further improve inference speed, making deployment more scalable. 5. 
Multi-Token Prediction (MTP) Traditional language models predict one token at a time, but DeepSeek V3 employs Multi-Token Prediction (MTP):\nThe model predicts multiple future tokens per step, increasing training signal density. This leads to smoother, more coherent outputs and enables faster inference. MTP modules can also be adapted for speculative decoding, speeding up response times significantly. Reasoning Model Training: The R1 Advantage DeepSeek R1 is tailored for complex, step-by-step problem-solving, setting it apart from general-purpose LLMs. Key innovations include:\n1. Reinforcement Learning (RL) for Logical Thinking DeepSeek trains R1 using reinforcement learning (RL) on problems with verifiable answers (e.g., math and coding tasks). Unlike traditional fine-tuning, this method helps the model learn reasoning strategies autonomously, rather than just mimicking examples. 2. Simple Rule-Based Evaluation Instead of complex AI-based reward models, DeepSeek evaluates model responses with simple rule-based grading systems. This reduces training overhead while maintaining high-quality outputs. 3. Group Relative Policy Optimization (GRPO) GRPO is a novel RL technique that updates the model based on relative rankings of outputs, leading to the emergence of structured reasoning skills like extended Chain of Thought (CoT). Addressing Early Issues Initially, R1 suffered from poor readability and inconsistent language switching between English and Chinese. To resolve this, DeepSeek implemented a cold-start fine-tuning phase, using structured reasoning examples before RL training.\nPerformance and Accessibility DeepSeek R1 demonstrates strong performance, rivaling OpenAI’s models on math and coding benchmarks. Beyond accuracy, its cost-effectiveness and accessibility set it apart:\nOpen Source: R1 is freely available on DeepSeek’s website and app. Customizable: The model can be downloaded, run locally, and fine-tuned for specialized use cases. 
DeepSeek’s emphasis on algorithmic efficiency makes R1 a compelling option for researchers and developers with limited access to high-end compute resources.\nReplicability: Can Others Follow Suit? One of the most intriguing claims about DeepSeek V3 is its low training cost—just $5.5 million for the final training run. However, this figure excludes R\u0026amp;D and hardware costs, which likely amount to hundreds of millions.\nDespite this, the underlying techniques are replicable:\nA UC Berkeley lab successfully applied similar optimizations to a smaller-scale model, achieving advanced reasoning capabilities for just $30 in compute costs. This suggests that efficiency-focused AI development can democratize access to cutting-edge capabilities. Conclusion DeepSeek R1 is a testament to the untapped potential of efficiency-driven AI innovation. By leveraging:\nFP8 training for memory savings Mixture of Experts (MoE) for selective computation Multi-Head Latent Attention (MLA) for KV cache reduction Reinforcement learning (RL) and GRPO for structured reasoning DeepSeek has proven that state-of-the-art performance can be achieved at a fraction of the typical cost.\nAs AI development moves forward, models like R1 will likely shape the future of scalable and accessible reasoning-based AI. Whether through open-source contributions or enterprise applications, the DeepSeek approach is poised to drive the next wave of AI advancements.\nPaper: https://arxiv.org/abs/2501.12948\n","permalink":"https://learncodecamp.net/deepseek-r1/","summary":"\u003cp\u003eThe recent release of \u003cstrong\u003eDeepSeek R1\u003c/strong\u003e has generated significant buzz in the AI community. While much of the discussion has centered on its performance relative to models like OpenAI’s GPT-4 and Anthropic’s Claude, the real breakthrough lies in the underlying algorithmic innovations that make DeepSeek R1 both highly efficient and cost-effective. 
This post explores the key technical advancements that power DeepSeek’s latest model.\u003c/p\u003e\n\u003chr /\u003e\n\u003ch3 id=\"model-architecture-and-training\"\u003e\u003cstrong\u003eModel Architecture and Training\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eDeepSeek R1 is part of a broader model ecosystem, and it’s essential to distinguish between two key models:\u003c/p\u003e","title":"DeepSeek R1: A Deep Dive into Algorithmic Innovations"},{"content":"What is Supervised Fine-Tuning (SFT)? Supervised fine-tuning is a training strategy where a pre-trained language model is further refined on a carefully curated dataset of prompt-response pairs. The primary goal is to “teach” the model how to generate appropriate, contextually relevant, and human-aligned responses.\nKey points about SFT include:\nData Curation: The model is exposed to a dataset that contains high-quality examples, often created by human annotators, that demonstrate the desired behavior (e.g., step-by-step reasoning, correct coding outputs, or helpful dialogue responses). Instruction Following: By training on these examples, the model learns to interpret prompts as instructions and produce answers that mimic the reasoning and style of the training data. Limitations: While SFT works well to instill basic response quality, it is typically limited by the dataset’s scope and may not encourage the model to “think” beyond what is explicitly provided. Furthermore, excessive fine-tuning can lead to overfitting and reduce the model’s ability to generalize to unseen tasks. For many contemporary language models, SFT is the standard method used to bridge the gap between raw pre-training and interactive, user-facing performance.\nWe will use the TRL (Transformer Reinforcement Learning) library to try out SFT.\nLet’s try fine-tuning SmolLM2\nhttps://huggingface.co/HuggingFaceTB/SmolLM2-135M\nSmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. 
They are capable of solving a wide range of tasks while being lightweight enough to run on-device.\nLet’s start with a small base model. It is not an instruct model; it only predicts the next token.\nWe will use this dataset for SFT: https://huggingface.co/datasets/HuggingFaceTB/smoltalk\n# Import necessary libraries from transformers import AutoModelForCausalLM, AutoTokenizer from datasets import load_dataset from trl import SFTConfig, SFTTrainer, setup_chat_format import torch device = ( \u0026#34;cuda\u0026#34; if torch.cuda.is_available() else \u0026#34;mps\u0026#34; if torch.backends.mps.is_available() else \u0026#34;cpu\u0026#34; ) # Load the model and tokenizer model_name = \u0026#34;HuggingFaceTB/SmolLM2-135M\u0026#34; model = AutoModelForCausalLM.from_pretrained( pretrained_model_name_or_path=model_name ).to(device) tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) # Set up the chat format model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) # Set the name under which the fine-tuned model is saved and/or uploaded finetune_name = \u0026#34;SmolLM2-FT-MyDataset\u0026#34; finetune_tags = [\u0026#34;smol-course\u0026#34;, \u0026#34;module_1\u0026#34;] Model Loading:\nThe model SmolLM2-135M is loaded from the Hugging Face Hub and moved to the selected device (GPU/CPU).\nTokenizer Loading:\nThe tokenizer associated with the model is also loaded.\nChat Format Setup:\nThe setup_chat_format function modifies both the model and tokenizer so that they support chat-style interactions. 
This typically involves configuring special tokens (such as \u0026lt;|im_start|\u0026gt; and \u0026lt;|im_end|\u0026gt;) to mark the beginning and end of messages.\n# Let\u0026#39;s test the base model before training prompt = \u0026#34;Write a haiku about programming\u0026#34; # Format with template messages = [{\u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: prompt}] formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) # Generate response inputs = tokenizer(formatted_prompt, return_tensors=\u0026#34;pt\u0026#34;).to(device) outputs = model.generate(**inputs, max_new_tokens=100) print(\u0026#34;Before training:\u0026#34;) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) The output is garbage because the base model has not been trained to handle chat-style prompts. If the prompt looks like a text completion, the model does a decent job. Let’s try a completion-style prompt.\nprompt = \u0026#34;Write a haiku about programming. Code is \u0026#34; inputs = tokenizer(prompt, return_tensors=\u0026#34;pt\u0026#34;).to(device) outputs = model.generate(**inputs, max_new_tokens=50) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) Output: Write a haiku about programming. Code is 100% free. What is a haiku? A haiku is a Japanese poem that consists of only three lines. The first line is called the “zen” line, and the second and third lines are called the Dataset Preparation We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\nTRL formats input messages based on the model’s chat template. The messages need to be represented as a list of dictionaries with the keys role and content.\nhttps://huggingface.co/datasets/HuggingFaceTB/smoltalk\nThis dataset is already in the required form. 
For example\n[ { \u0026#34;content\u0026#34;: \u0026#34;How many positive integers with four digits have a thousands digit of 2?\u0026#34;, \u0026#34;role\u0026#34;: \u0026#34;user\u0026#34; }, { \u0026#34;content\u0026#34;: \u0026#34;Since the thousands digit must be 2, we have only one choice for that digit.\\nFor the hundreds digit, we have 10 choices (0-9).\\nFor the tens and units digits, we also have 10 choices each.\\nTherefore, there are $1 \\\\times 10 \\\\times 10 \\\\times 10 = \\\\boxed{1000}$ positive integers with four digits that have a thousands digit of 2.\\nThe answer is: 1000\u0026#34;, \u0026#34;role\u0026#34;: \u0026#34;assistant\u0026#34; } ] Configuring the SFTTrainer The SFTTrainer is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.\n# Load the dataset (the \u0026#34;everyday-conversations\u0026#34; config is an assumption; pick the subset you need) ds = load_dataset(path=\u0026#34;HuggingFaceTB/smoltalk\u0026#34;, name=\u0026#34;everyday-conversations\u0026#34;) # Configure the SFTTrainer sft_config = SFTConfig( output_dir=\u0026#34;./sft_output\u0026#34;, max_steps=1000, # Adjust based on dataset size and desired training duration per_device_train_batch_size=4, # Set according to your GPU memory capacity learning_rate=5e-5, # Common starting point for fine-tuning logging_steps=10, # Frequency of logging training metrics save_steps=100, # Frequency of saving model checkpoints evaluation_strategy=\u0026#34;steps\u0026#34;, # Evaluate the model at regular intervals eval_steps=50, # Frequency of evaluation use_mps_device=(device == \u0026#34;mps\u0026#34;), # Train on the Apple Silicon (MPS) backend when it is the selected device hub_model_id=finetune_name, # Set a unique name for your model ) # Initialize the SFTTrainer trainer = SFTTrainer( model=model, args=sft_config, train_dataset=ds[\u0026#34;train\u0026#34;], tokenizer=tokenizer, eval_dataset=ds[\u0026#34;test\u0026#34;], ) Training the Model With the trainer configured, we can now proceed to train the model. 
The training process iterates over the dataset, computes the loss, and updates the model’s parameters to minimize that loss.\n# Train the model trainer.train() # Save the model trainer.save_model(f\u0026#34;./{finetune_name}\u0026#34;) Test the fine-tuned model on the same prompt\n# Test the fine-tuned model on the same prompt prompt = \u0026#34;Write a haiku about programming\u0026#34; # Format with template messages = [{\u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: prompt}] formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) # Generate response inputs = tokenizer(formatted_prompt, return_tensors=\u0026#34;pt\u0026#34;).to(device) outputs = model.generate(**inputs, max_new_tokens=100) print(\u0026#34;After training:\u0026#34;) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) You can also push the trained model to the Hugging Face Hub\ntrainer.push_to_hub(tags=finetune_tags) Observations\nGrad norm stabilizes, indicating well-behaved gradients.\nThe loss trend suggests that the model is improving.\nFor the complete code, see: https://github.com/nkalra0123/sft/blob/main/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb\n","permalink":"https://learncodecamp.net/supervised-fine-tuning-sft/","summary":"\u003ch2 id=\"what-is-supervised-fine-tuning-sft\"\u003eWhat is Supervised Fine-Tuning (SFT)?\u003c/h2\u003e\n\u003cp\u003eSupervised fine-tuning is a training strategy where a pre-trained language model is further refined on a carefully curated dataset of prompt-response pairs. 
The primary goal is to “teach” the model how to generate appropriate, contextually relevant, and human-aligned responses.\u003c/p\u003e\n\u003cp\u003eKey points about SFT include:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eData Curation:\u003c/strong\u003e The model is exposed to a dataset that contains high-quality examples—often created by human annotators—that demonstrate the desired behavior (e.g., step-by-step reasoning, correct coding outputs, or helpful dialogue responses).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eInstruction Following:\u003c/strong\u003e By training on these examples, the model learns to interpret prompts as instructions and produce answers that mimic the reasoning and style of the training data.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eLimitations:\u003c/strong\u003e While SFT works well to instill basic response quality, it is typically limited by the dataset’s scope and may not encourage the model to “think” beyond what is explicitly provided. Furthermore, excessive fine-tuning can lead to overfitting and reduce the model’s ability to generalize to unseen tasks.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eFor many contemporary language models, SFT is the standard method used to bridge the gap between raw pre-training and interactive, user-facing performance.\u003c/p\u003e","title":"Supervised Fine-Tuning (SFT)"},{"content":"This sound is generated with Kokoro tts The world of text-to-speech (TTS) has seen incredible advancements, but often these powerful models require hefty hardware like GPUs. But what if you could run a top-tier TTS model locally on your CPU? Enter **Kokoro**, a game-changing TTS model that delivers impressive results even on resource-constrained devices. Kokoro: Small but Mighty Kokoro stands out for its remarkable efficiency. With just 82 million parameters, it outperforms models several times its size, including XTTS (467M parameters) and MetaVoice (1.2B parameters). 
This proves that cutting-edge TTS is achievable without relying on massive models and powerful GPUs.\nRunning Kokoro with ONNX on Your CPU The key to running Kokoro efficiently on your CPU is ONNX (Open Neural Network Exchange), an open format for representing machine learning models. ONNX allows you to run the model on various platforms and hardware, including CPUs, without sacrificing performance. Here’s how you can set up and run Kokoro on your CPU using ONNX:\nSteps Install Dependencies Ensure you have Python and essential libraries like gradio, kokoro-onnx, soundfile, and tempfile installed.\nObtain the Kokoro ONNX Model Download the kokoro-v0_19.onnx model file.\nDownload the Voices File Obtain the voices.json file, which contains information about the available voices.\nCreate a Python Script The provided Python code demonstrates how to set up a simple Gradio interface to interact with the Kokoro model. You can modify this code to suit your needs.\nimport gradio as gr from kokoro_onnx import Kokoro import soundfile as sf import tempfile import os class TextToSpeechApp: def __init__(self): # Initialize Kokoro self.kokoro = Kokoro(\u0026#34;kokoro-v0_19.onnx\u0026#34;, \u0026#34;voices.json\u0026#34;) # Available voices self.voices = [ \u0026#39;af\u0026#39;, \u0026#39;af_bella\u0026#39;, \u0026#39;af_nicole\u0026#39;, \u0026#39;af_sarah\u0026#39;, \u0026#39;af_sky\u0026#39;, \u0026#39;am_adam\u0026#39;, \u0026#39;am_michael\u0026#39;, \u0026#39;bf_emma\u0026#39;, \u0026#39;bf_isabella\u0026#39;, \u0026#39;bm_george\u0026#39;, \u0026#39;bm_lewis\u0026#39; ] def generate_speech(self, text, voice, speed): try: # Generate audio samples, sample_rate = self.kokoro.create( text, voice=voice, speed=float(speed) ) # Create temporary file temp_dir = tempfile.mkdtemp() temp_path = os.path.join(temp_dir, \u0026#34;output.wav\u0026#34;) # Save to temporary file sf.write(temp_path, samples, sample_rate) return temp_path except Exception as e: return f\u0026#34;Error: 
{str(e)}\u0026#34; def create_interface(self): interface = gr.Interface( fn=self.generate_speech, inputs=[ gr.Textbox(label=\u0026#34;Enter text to convert\u0026#34;, lines=5), gr.Dropdown(choices=self.voices, label=\u0026#34;Select Voice\u0026#34;, value=self.voices[0]), gr.Slider(minimum=0.5, maximum=2.0, value=1.0, step=0.1, label=\u0026#34;Speech Speed\u0026#34;) ], outputs=gr.Audio(label=\u0026#34;Generated Speech\u0026#34;), title=\u0026#34;Text to Speech Converter\u0026#34;, description=\u0026#34;Convert text to speech using different voices and speeds.\u0026#34; ) return interface def main(): app = TextToSpeechApp() interface = app.create_interface() # Launch with a public URL interface.launch(server_name=\u0026#34;0.0.0.0\u0026#34;, share=True) if __name__ == \u0026#34;__main__\u0026#34;: main() Run the Script: Execute your Python script, and the Gradio interface will allow you to input text, select a voice, adjust the speech speed, and generate speech output.\nYou can download the code from github https://github.com/nkalra0123/kokoro-tts\nHugging face repo : https://huggingface.co/hexgrad/Kokoro-82M\nBenefits of Running Kokoro Locally on Your CPU Accessibility: You don’t need a high-end GPU to experience high-quality TTS.\nOffline Use: Once set up, you can use Kokoro offline, making it ideal for scenarios with limited or no internet connectivity.\nPrivacy: Processing text locally ensures your data remains private.\nConclusion Kokoro is a testament to the fact that efficient and powerful TTS is possible even on modest hardware. By leveraging the ONNX format and running it on your CPU, you can enjoy impressive text-to-speech capabilities without the need for a dedicated GPU. 
This opens up new possibilities for integrating TTS into a wide range of applications, even on devices with limited processing power.\n","permalink":"https://learncodecamp.net/kokoro-tts/","summary":"\u003cfigure\u003e\u003caudio controls src=\"/wp-content/uploads/2025/01/output-3.wav\"\u003e\u003c/audio\u003e\u003cfigcaption class=\"wp-element-caption\"\u003eThis sound is generated with Kokoro tts\u003c/figcaption\u003e\u003c/figure\u003e The world of text-to-speech (TTS) has seen incredible advancements, but often these powerful models require hefty hardware like GPUs. But what if you could run a top-tier TTS model locally on your CPU? Enter **Kokoro**, a game-changing TTS model that delivers impressive results even on resource-constrained devices.\n\u003ch3 id=\"kokoro-small-but-mighty\"\u003e\u003cstrong\u003eKokoro: Small but Mighty\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eKokoro stands out for its remarkable efficiency. With just \u003cstrong\u003e82 million parameters\u003c/strong\u003e, it outperforms models several times its size, including XTTS (467M parameters) and MetaVoice (1.2B parameters). This proves that cutting-edge TTS is achievable without relying on massive models and powerful GPUs.\u003c/p\u003e","title":"Kokoro: High-Quality Text-to-Speech(tts) on Your CPU with ONNX"},{"content":"Introduction Understanding BM-25: A Powerful Algorithm for Information Retrieval\nBm25 is an enhancement of the TF-IDF model that incorporates term frequency saturation and document length normalization to improve retrieval performance.\nWhen it comes to search engines and information retrieval, a vital piece of the puzzle is ranking the relevance of documents to a given query. One of the most widely used algorithms to achieve this is the BM25, Best Matching 25. 
BM25 is a probabilistic retrieval function that evaluates the relevance of a document to a search query, balancing simplicity and effectiveness, making it a popular choice in modern search engines and applications.\nBM25 is essentially a scoring function that calculates a numerical score to estimate the relevance of a document for a given query. This score is based on the occurrences and importance of query terms within the document. The higher the score, the more relevant the document is considered to be.\n$$ \\text{BM25}(D, Q) = \\sum_{i=1}^{n} \\text{IDF}(q_i) \\cdot \\frac{f(q_i, D) \\cdot (k_1 + 1)}{f(q_i, D) + k_1 \\cdot \\left( 1 - b + b \\cdot \\frac{|D|}{\\text{avgDL}} \\right)} $$\nWhere:\n( q_i ) is a term in the query.\n( f(q_i, D) ) is the frequency of term ( q_i ) in document ( D ).\n( |D| ) is the length of document ( D ) (number of words).\n( {avgDL} ) is the average document length in the corpus.\n( k_1 ) and ( b ) are free parameters, where:\n( k_1 ) controls the term frequency saturation, with typical values around 1.2 to 2.\n( b ) controls the degree of document length normalization, typically set to 0.75.\n( {IDF}(q_i) ) is the inverse document frequency of term ( q_i ).\nThe IDF function adjusts term weight based on its distribution across documents, giving more importance to rarer terms.\nKey Concepts in BM25 BM25 improves upon traditional retrieval models by considering several essential aspects:\nTerm Frequency (TF): The frequency of a term in the document directly impacts relevance. BM25, however, includes a saturation factor, meaning that additional occurrences of a term add less weight past a certain point, avoiding overemphasis on highly frequent terms. Inverse Document Frequency (IDF): BM25 uses IDF to balance term frequency by considering how rare or common a term is across documents in the corpus. This way, rare terms are weighted more heavily than common terms. 
Document Length Normalization: BM25 incorporates a normalization factor to control the impact of document length on term frequency, which ensures that long documents are not unfairly penalized or favored. Adjustable Parameters (k1​ and b): BM25 allows flexibility with its two main parameters: k1​ adjusts term frequency scaling, with higher values meaning more emphasis on term frequency. b is a document length normalization parameter that allows BM25 to adapt to different types of datasets. Advantages of BM25 Term Saturation: Prevents excessively high scores for very frequent terms by introducing saturation. Length Normalization: Adjusts scores based on document length, reducing bias towards longer documents. Performance: Generally outperforms TF-IDF in retrieval tasks due to its more sophisticated modeling. Practical Applications of BM25 BM25 is widely applied in fields where accurate and relevant retrieval is essential, including:\nSearch Engines: BM25 is fundamental in search engine technology, powering both traditional search engines and modern engines like Elasticsearch and Solr, which are used in e-commerce, content management, and enterprise applications. Document Retrieval Systems: Many document-heavy systems, such as research databases, use BM25 to efficiently rank academic papers, reports, and other documents according to user queries. Social Media and News Retrieval: BM25 can help prioritize content that matches user interests by surfacing posts, articles, or news stories that best match user searches. 
Code for implementing BM-25 import math # BM25 parameters k1 = 1.5 b = 0.75 # Example document collection and query corpus = [ \u0026#34;the brown fox jumped over the brown dog\u0026#34;, \u0026#34;the lazy dog sat in the sun\u0026#34;, \u0026#34;the quick brown fox leaped over the lazy dog\u0026#34; ] query = [\u0026#34;brown\u0026#34;, \u0026#34;fox\u0026#34;] # Pre-compute average document length avg_doc_length = sum(len(doc.split()) for doc in corpus) / len(corpus) # Function to calculate term frequency in a document def term_frequency(term, document): return document.split().count(term) # Function to calculate document frequency for a term (matches whole tokens, not substrings) def document_frequency(term, corpus): return sum(1 for doc in corpus if term in doc.split()) # BM25 function def bm25_score(query, document, corpus): score = 0 doc_length = len(document.split()) for term in query: tf = term_frequency(term, document) df = document_frequency(term, corpus) idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1) score += idf * ((tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (doc_length / avg_doc_length)))) return score # Calculate BM25 scores for each document for doc in corpus: print(f\u0026#34;Document: {doc}\u0026#34;) print(f\u0026#34;BM25 Score: {bm25_score(query, doc, corpus)}\u0026#34;) Results for Query: [“brown”, “fox”]\nDocument: the brown fox jumped over the brown dog BM25 Score: 1.1414373853110722 Document: the lazy dog sat in the sun BM25 Score: 0.0 Document: the quick brown fox leaped over the lazy dog BM25 Score: 0.889947700346955 Instead of implementing the BM25 calculation manually, we can use the rank_bm25 Python package\nfrom rank_bm25 import BM25Okapi # Sample corpus corpus = [ \u0026#34;the brown fox jumped over the brown dog\u0026#34;, \u0026#34;the lazy dog sat in the sun\u0026#34;, \u0026#34;the quick brown fox leaped over the lazy dog\u0026#34; ] # Tokenize the corpus tokenized_corpus = [doc.split() for doc in corpus] # Initialize BM25 bm25 = BM25Okapi(tokenized_corpus) # Query 
example query = \u0026#34;brown fox\u0026#34; tokenized_query = query.split() # Get BM25 scores for the query scores = bm25.get_scores(tokenized_query) print(\u0026#34;BM25 Scores:\\n\u0026#34;, scores) But with this we get results like\nBM25 Scores: [-0.14521689 0. -0.11322166] If we use bm25 = BM25L(tokenized_corpus) (BM25L is also imported from rank_bm25), we get scores\nBM25 Scores: [2.05626588 0. 1.14044998] We can read more about this from this paper: https://www.cs.otago.ac.nz/homepages/andrew/papers/2014-2.pdf\nNegative scores look odd, but they follow from the IDF that BM25Okapi uses: log((N - df + 0.5) / (df + 0.5)), which drops below zero for any term that appears in more than half of the documents. In this three-document corpus, “brown” and “fox” each appear in two documents, so their raw IDF is negative.\nrank_bm25 replaces negative IDF values with epsilon times the average IDF of the corpus (epsilon defaults to 0.25). In a corpus this small the average IDF is itself negative, so the matching documents end up with slightly negative scores. The manual implementation above avoids this by adding 1 inside the logarithm.\nhttps://pypi.org/project/rank-bm25 This library has many implementations of BM25.\nThere are also more advanced techniques, such as BM42: https://qdrant.tech/articles/bm42/\nConclusion BM25 remains one of the most efficient and effective ranking functions for information retrieval, especially in text-based search engines and large datasets. Its balance of term frequency, document length normalization, and adjustable parameters makes it flexible and adaptable across domains. 
By combining the fundamentals of probabilistic retrieval with an intuitive approach to term weighting, BM25 continues to be the go-to choice for modern search systems and beyond.\n","permalink":"https://learncodecamp.net/bm-25/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eUnderstanding BM-25: A Powerful Algorithm for Information Retrieval\u003c/p\u003e\n\u003cp\u003eBm25 is an enhancement of the \u003ca href=\"https://learncodecamp.net/tf-idf/\" data-type=\"link\" data-id=\"https://learncodecamp.net/tf-idf/\"\u003eTF-IDF\u003c/a\u003e model that incorporates term frequency saturation and document length normalization to improve retrieval performance.\u003c/p\u003e\n\u003cp\u003eWhen it comes to search engines and information retrieval, a vital piece of the puzzle is ranking the relevance of documents to a given query. One of the most widely used algorithms to achieve this is the BM25, \u003cem\u003eBest Matching 25\u003c/em\u003e. BM25 is a probabilistic retrieval function that evaluates the relevance of a document to a search query, balancing simplicity and effectiveness, making it a popular choice in modern search engines and applications.\u003c/p\u003e","title":"BM-25 Best Matching 25"},{"content":"Introduction TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It combines two metrics: Term Frequency (TF) and Inverse Document Frequency (IDF). The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the frequency of the word in the corpus.\nComponents of TF-IDF Term Frequency (TF): Measures how frequently a term appears in a document. 
It’s calculated as:\n$TF(t,d) = \\frac{\\text{Number of times term } t \\text{ appears in document } d}{\\text{Total number of terms in document } d}$\nInverse Document Frequency (IDF): Measures how important a term is. While computing TF, all terms are considered equally important. IDF reduces the weight of terms that appear very frequently in the document set and increases the weight of terms that appear rarely. It’s calculated as:\n$$ IDF(t,D) = \\log \\left( \\frac{\\text{Total number of documents}}{\\text{Number of documents with term } t} \\right) $$\nTF-IDF: The product of TF and IDF for a term. It’s calculated as:\n$$ \\text{TF-IDF}(t,d,D) = \\text{TF}(t,d) \\times \\text{IDF}(t,D) $$\nExample Calculation Consider a small corpus of three documents:\nDocument 1 (D1): “the cat sat on the mat” Document 2 (D2): “the cat sat” Document 3 (D3): “the cat” Let’s calculate the TF-IDF for the term “cat” in each document.\nCalculate Term Frequency (TF) D1: TF(cat, D1) = 1/6 (since “cat” appears once and there are 6 words in D1) D2: TF(cat, D2) = 1/3 D3: TF(cat, D3) = 1/2 Calculate Inverse Document Frequency (IDF) : The term “cat” appears in all three documents, so: $$ \\text{IDF}(\\text{cat}, D) = \\log \\left( \\frac{3}{3} \\right) = \\log(1) = 0 $$\nSince the IDF is zero, it means the term “cat” is not useful in distinguishing between documents in this corpus.\nWhen using sklearn‘s TfidfVectorizer, the IDF calculation includes smoothing to prevent division by zero. 
The formula used by sklearn is:\n$$ \\text{IDF}(t, D) = \\log \\left( \\frac{1 + \\text{Total number of documents}}{1 + \\text{Number of documents with term } t} \\right) + 1 $$\nCode to calculate TF-IDF from sklearn.feature_extraction.text import TfidfVectorizer import pandas as pd # Sample corpus corpus = [ \u0026#34;the cat sat on the mat\u0026#34;, \u0026#34;the cat sat\u0026#34;, \u0026#34;the cat\u0026#34; ] # Initialize the vectorizer vectorizer = TfidfVectorizer() # Fit and transform the corpus X = vectorizer.fit_transform(corpus) # Convert the result to a dense matrix and print it df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out()) print(df) sklearn TfidfVectorizer normalizes the vectors by default. This normalization affects the final TF-IDF scores, ensuring that each document row is a unit vector.\nThe sklearn TfidfVectorizer uses L2 normalization by default. The L2 norm of a vector\n$$ v = \\left[ v_1, v_2, \\dots, v_n \\right] $$\nis defined as:\n$$ \\| v \\|_2 = \\sqrt{v_1^2 + v_2^2 + \\dots + v_n^2} $$\nWith the code above, we get the following output\ncat mat on sat the 0 0.284077 0.480984 0.480984 0.365801 0.568154 1 0.522842 0.000000 0.000000 0.673255 0.522842 2 0.707107 0.000000 0.000000 0.000000 0.707107 Checking the first row: 0.284077^2 + 2(0.480984^2) + 0.365801^2 + 0.568154^2 = 1.0000\nCode to visualize TF-IDF To visualize the TF-IDF score we can use this code\nimport pandas as pd import matplotlib.pyplot as plt import seaborn as sns from sklearn.feature_extraction.text import TfidfVectorizer # Sample corpus corpus = [ \u0026#34;the cat sat on the mat\u0026#34;, \u0026#34;the cat sat\u0026#34;, \u0026#34;the cat\u0026#34; ] # Initialize the vectorizer vectorizer = TfidfVectorizer() # Fit and transform the corpus X = vectorizer.fit_transform(corpus) # Convert the result to a dense matrix and create a DataFrame df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out()) # Add document identifiers df[\u0026#39;document\u0026#39;] = [\u0026#39;D1\u0026#39;, 
\u0026#39;D2\u0026#39;, \u0026#39;D3\u0026#39;] # Melt the DataFrame for easier plotting df_melted = df.melt(id_vars=\u0026#39;document\u0026#39;, var_name=\u0026#39;term\u0026#39;, value_name=\u0026#39;tfidf\u0026#39;) # Plot the TF-IDF scores plt.figure(figsize=(12, 8)) sns.barplot(data=df_melted, x=\u0026#39;term\u0026#39;, y=\u0026#39;tfidf\u0026#39;, hue=\u0026#39;document\u0026#39;) plt.title(\u0026#39;TF-IDF Scores for Terms in Each Document\u0026#39;) plt.xlabel(\u0026#39;Term\u0026#39;) plt.ylabel(\u0026#39;TF-IDF Score\u0026#39;) plt.legend(title=\u0026#39;Document\u0026#39;) plt.show() Visualization results TF-IDF\nUse Cases TF-IDF (Term Frequency-Inverse Document Frequency) is widely used in various text mining and natural language processing (NLP) applications due to its simplicity and effectiveness in identifying important terms within documents. Here are some common use cases:\n1. Information Retrieval TF-IDF is used to rank documents based on their relevance to a query. Documents with terms that have high TF-IDF scores for the query terms are considered more relevant.\nExample: Search engines use TF-IDF to index web pages and rank search results. 2. Text Classification TF-IDF is often used as a feature extraction technique for text classification tasks. It helps in transforming text into numerical vectors that can be fed into machine learning models.\nExample: Spam detection in emails, sentiment analysis, and topic categorization. 3. Document Similarity TF-IDF is used to measure the similarity between documents by comparing their TF-IDF vectors.\nExample: Finding duplicate documents, clustering similar documents, and recommending similar articles. 4. Keyword Extraction TF-IDF can be used to extract keywords or key phrases from a document, as terms with high TF-IDF scores are considered important.\nExample: Summarizing articles, generating tags for documents, and content analysis. 5. 
Content Recommendation TF-IDF vectors can be used to recommend content based on user preferences and document similarities.\nExample: Recommending news articles, research papers, or products based on textual descriptions. 6. Document Clustering TF-IDF is used to convert documents into numerical vectors for clustering algorithms like K-means, enabling the grouping of similar documents.\nExample: Grouping customer reviews, organizing a large corpus of text into coherent clusters. In the coming blogs, we will read about BM-25 (Best Matching 25), and then about ReRanker, which are important components to improve the RAG performance\nhttps://www.anthropic.com/news/contextual-retrieval\n","permalink":"https://learncodecamp.net/tf-idf/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eTF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It combines two metrics: Term Frequency (TF) and Inverse Document Frequency (IDF). The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the frequency of the word in the corpus.\u003c/p\u003e\n\u003ch3 id=\"components-of-tf-idf\"\u003eComponents of TF-IDF\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003eTerm Frequency (TF)\u003c/strong\u003e: Measures how frequently a term appears in a document. It’s calculated as:\u003c/p\u003e","title":"TF-IDF"},{"content":"Introduction The latest Ollama update makes it easier than ever to run quantized GGUF models directly from Hugging Face on your local machine. With a single command, you can bypass previous limitations, no longer needing a separate model on the Ollama Model Hub.\nStep-by-Step Guide 1. Install Ollama\nDownload and install Ollama on your computer. Once installed, the ollama command will be accessible from your command line interface (CLI). 2. 
Select a Model from Hugging Face\nGo to the Hugging Face Model Hub and choose a model. For best performance, especially on local setups, consider selecting a smaller model. 3. Copy the Model Link\nFind the model’s URL, which includes the username and model name. 4. Run the Model with Ollama\nOpen your CLI and use this command to run the model directly: ollama run hf.co/\u0026lt;username\u0026gt;/\u0026lt;model_name\u0026gt;:latest #Example ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF \u0026gt;\u0026gt;\u0026gt; hi How are you today? Is there something I can help you with or would you like to chat? Replace \u0026lt;username\u0026gt; and \u0026lt;model_name\u0026gt; with the respective values from the link.\n5. Specify a Different Version (Optional)\nIf you prefer a different quantized version, like Q8, add its tag after the model name in your command, separated by a colon. 6. Run and Download\nOnce you run the command, Ollama will download the specified model from Hugging Face, making it available for local use. Why This Update Matters The ability to seamlessly run Hugging Face models locally with Ollama expands the flexibility for model experimentation and deployment. No additional steps or setups on the Ollama Model Hub are required, which saves time and simplifies the process for developers.\nConnecting to the Ollama API from Other Devices To connect other devices to your Ollama instance and use the API, you’ll need to configure two environment variables that control the network interface Ollama listens on and the allowed origins for incoming requests. These steps apply to both Windows and Linux.\n1. Setting Environment Variables On Linux: Use the export command to set environment variables: OLLAMA_HOST: Set this variable to \u0026quot;0.0.0.0\u0026quot; to make Ollama listen on all network interfaces, allowing local network connections. For restricted access, specify a particular IP address instead. 
OLLAMA_ORIGINS: Set to \u0026quot;*\u0026quot; for unrestricted access from any origin, ideal for development. In production, restrict this to specific origins to enhance security. 2. Restart Ollama Restart the Ollama service to apply these changes. This can usually be done through the Ollama interface or with the relevant command for your OS. After restarting, other devices on your network can connect to the API using the IP address or hostname of the machine running Ollama, along with the port number (default is 11434).\nExample of a Chat Completion Endpoint accessing GGUF model With Ollama configured to accept connections from other devices, you can interact with your models using the chat completion endpoint.\nEndpoint: POST /api/chat Base URL: The default URL is http://localhost:11434 Example request using curl:\ncurl http://localhost:11434/api/chat \\ -d \u0026#39;{ \u0026#34;model\u0026#34;: \u0026#34;hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF\u0026#34;, \u0026#34;messages\u0026#34;: [ { \u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: \u0026#34;What is the weather like today?\u0026#34; } ],\u0026#34;stream\u0026#34;: false }\u0026#39; {\u0026#34;model\u0026#34;:\u0026#34;hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF\u0026#34;,\u0026#34;created_at\u0026#34;:\u0026#34;2024-11-01T15:48:46.168058Z\u0026#34;,\u0026#34;message\u0026#34;:{\u0026#34;role\u0026#34;:\u0026#34;assistant\u0026#34;,\u0026#34;content\u0026#34;:\u0026#34;However, I\u0026#39;m a large language model, I don\u0026#39;t have real-time access to current weather conditions. But I can suggest some options for you to find out the current weather:\\n\\n1. **Check online weather websites**: You can check websites like AccuWeather, Weather.com, or the National Weather Service (NWS) for current and forecasted weather conditions.\\n2. 
**Use a search engine**: Type \\\u0026#34;weather today\\\u0026#34; or \\\u0026#34;current weather in [your city/state]\\\u0026#34; to find the latest information on the web.\\n3. **Check your smartphone\u0026#39;s weather app**: Many smartphones come with built-in weather apps that provide real-time weather updates.\\n\\nIf you\u0026#39;d like, I can give you some general information about typical weather conditions in different parts of the world. Just let me know!\u0026#34;},\u0026#34;done_reason\u0026#34;:\u0026#34;stop\u0026#34;,\u0026#34;done\u0026#34;:true,\u0026#34;total_duration\u0026#34;:5455977456,\u0026#34;load_duration\u0026#34;:23511786,\u0026#34;prompt_eval_count\u0026#34;:17,\u0026#34;prompt_eval_duration\u0026#34;:37940000,\u0026#34;eval_count\u0026#34;:159,\u0026#34;eval_duration\u0026#34;:5393094000}⏎ In this example, a POST request is sent to the /api/chat endpoint, with JSON data specifying the model (\u0026quot;hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF\u0026quot;) and a user message. The response will contain the model’s generated output.\n","permalink":"https://learncodecamp.net/gguf-model-with-ollama/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eThe latest Ollama update makes it easier than ever to run quantized GGUF models directly from Hugging Face on your local machine. With a single command, you can bypass previous limitations, no longer needing a separate model on the Ollama Model Hub.\u003c/p\u003e\n\u003chr /\u003e\n\u003ch3 id=\"step-by-step-guide\"\u003e\u003cstrong\u003eStep-by-Step Guide\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003e1. Install Ollama\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDownload and install Ollama on your computer. Once installed, the \u003ccode\u003eollama\u003c/code\u003e command will be accessible from your command line interface (CLI).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003e2. 
Select a Model from Hugging Face\u003c/strong\u003e\u003c/p\u003e","title":"Running Any GGUF Model from Hugging Face with Ollama"},{"content":"Introduction OpenAI has launched a groundbreaking new feature for ChatGPT: SearchGPT. This innovative tool blends the conversational nature of a chatbot with the vast resources of the internet, potentially changing the way we search for information forever.\nWith SearchGPT, users can ask questions in natural language and receive concise answers, complete with links to relevant web sources. No more wading through pages of search results or deciphering complex search syntax – SearchGPT aims to streamline the process, making it easier and faster to find what you need.\nWhat makes SearchGPT different? Conversational search: Ask questions in a natural, conversational style, and SearchGPT will interpret your intent and provide relevant results. The ability to ask follow-up questions within the same chat session allows for a more nuanced and in-depth exploration of a topic.\nUp-to-date information: SearchGPT utilises real-time data from various sources, including news providers, financial markets, and weather services, ensuring the information you receive is current.\nTransparency and Verification: SearchGPT provides links to the sources used to generate its responses, enabling users to verify the accuracy of the information and explore the original context. This emphasis on transparency fosters trust and accountability.\nAd-Free Experience: At present, SearchGPT does not display ads, offering a cleaner and more focused search experience compared to ad-laden results pages of traditional search engines.\nSource Citations: Unlike traditional search engines, SearchGPT provides clear source citations for its answers, enabling users to verify the information and explore further. 
These sources are generally of high quality, with good domain authority.\nWith this launch, OpenAI is competing directly with Perplexity and Google.\nLet’s try I tried SearchGPT on a few queries. It responds with accurate, well-sourced information most of the time, but sometimes it gives wrong information.\nweather update from SearchGPT\nSometimes the results are wrong: in this localized search, only the first pizza place is in Gurugram, while the rest are in Mumbai and other places.\nLocalized search results from SearchGPT\nCould SearchGPT be the “AltaVista moment” for Google? Many believe SearchGPT has the potential to disrupt the search landscape significantly. Its focus on a seamless, conversational search experience, combined with its commitment to providing accurate and up-to-date information, presents a compelling alternative to traditional search engines. Only time will tell how the market will react, but SearchGPT is undoubtedly a major step forward in the evolution of search.\nIf you have a Plus subscription, do try it. You can also install the Chrome extension to set ChatGPT search as your default search engine:\nhttps://chromewebstore.google.com/detail/chatgpt-search/ejcfepkfckglbgocfkanmcdngdijcgld\nYou can read more about this on the official blog.\n","permalink":"https://learncodecamp.net/searchgpt/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003e\u003cstrong\u003eOpenAI has launched a groundbreaking new feature for ChatGPT: SearchGPT.\u003c/strong\u003e This innovative tool blends the conversational nature of a chatbot with the vast resources of the internet, potentially changing the way we search for information forever.\u003c/p\u003e\n\u003cp\u003eWith SearchGPT, users can ask questions in natural language and receive concise answers, complete with links to relevant web sources. 
No more wading through pages of search results or deciphering complex search syntax – SearchGPT aims to streamline the process, making it easier and faster to find what you need.\u003c/p\u003e","title":"SearchGPT: The Future of Search?"},{"content":" NotebookLM: An AI-Powered Research Assistant NotebookLM is a research assistant powered by Google’s Gemini 1.5 Pro model. It’s centred around the idea of using sources and then leveraging the power of Gemini to interact with and learn from them. Here are some of the key features that make NotebookLM such a powerful tool:\n1. Versatile Source Integration NotebookLM supports a variety of source formats, including:\nAudio files Markdown documents PDFs Google Docs and Slides Websites YouTube videos Text notes Users can upload up to 50 sources per notebook, offering great flexibility in consolidating and analyzing diverse information.\n2. NotebookLM Interactive Chat and Note-Taking NotebookLM allows users to engage in a chat-style interaction with the Gemini model based on the uploaded sources. The chat feature supports follow-up questions, enabling users to further explore specific aspects of the source material. Users can create written notes to summarise their research or capture important insights. NotebookLM also automatically generates notes from model responses, including citations for easy verification. Both user-created and model-generated notes can be further analyzed and used for generating summaries and other content. 3. Automated Content Generation Audio Overviews: NotebookLM can generate engaging audio overviews based on the uploaded sources, featuring a conversational style with two AI hosts. This feature facilitates quick and accessible consumption of information. 
Pre-formatted Guides: NotebookLM offers various pre-formatted guides that can be generated based on the sources, such as: FAQs Study guides Tables of contents Timelines Briefing docs Summaries: NotebookLM can provide concise summaries of the key points discussed in the uploaded sources. Suggested Questions: The tool suggests relevant questions based on the source material, helping users to focus their research and initiate further exploration. 4. Format Conversion NotebookLM demonstrates proficiency in converting content from one format to another. For example, it can transform a website article into a markdown format suitable for a GitHub repository. This functionality saves time and effort in adapting content for different platforms.\n5. Citation Support A significant advantage of NotebookLM is its consistent use of citations for statements generated by the model. This feature ensures the accuracy and trustworthiness of the information provided, allowing users to easily verify the claims by referring back to the original source material.\n6. Collaboration NotebookLM allows users to share their notebooks with others, granting either viewer or editor access. This functionality facilitates collaboration and knowledge sharing among teams and colleagues.\n7. Continuous Improvement As an experimental product, NotebookLM is constantly being developed and enhanced with new features and functionalities. The creators are actively seeking user feedback to identify limitations and areas for improvement.\nOverall, NotebookLM offers an impressive suite of features that empower users to efficiently research, analyse, and synthesise information from various sources. 
Its capabilities as an AI-powered research assistant make it a valuable tool for students, professionals, and researchers across different disciplines.\nIf you are still fixated on the audio podcast feature of NotebookLM, you can listen to this.\nTo try NotebookLM, head over to: https://notebooklm.google.com/\n","permalink":"https://learncodecamp.net/notebooklm/","summary":"\u003cfigure\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"661\" src=\"/wp-content/uploads/2024/10/image-1024x661.png\" alt=\"\" /\u003e \u003c/figure\u003e\u003c/p\u003e\n\u003ch1 id=\"notebooklm-an-ai-powered-research-assistant\"\u003eNotebookLM: An AI-Powered Research Assistant\u003c/h1\u003e\n\u003cp\u003eNotebookLM is a research assistant powered by Google’s Gemini 1.5 Pro model. It’s centred around the idea of using sources and then leveraging the power of Gemini to interact with and learn from them. Here are some of the key features that make NotebookLM such a powerful tool:\u003c/p\u003e\n\u003ch2 id=\"1-versatile-source-integration\"\u003e1. Versatile Source Integration\u003c/h2\u003e\n\u003cp\u003eNotebookLM supports a variety of source formats, including:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eAudio files\u003c/li\u003e\n\u003cli\u003eMarkdown documents\u003c/li\u003e\n\u003cli\u003ePDFs\u003c/li\u003e\n\u003cli\u003eGoogle Docs and Slides\u003c/li\u003e\n\u003cli\u003eWebsites\u003c/li\u003e\n\u003cli\u003eYouTube videos\u003c/li\u003e\n\u003cli\u003eText notes\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eUsers can upload up to 50 sources per notebook, offering great flexibility in consolidating and analyzing diverse information.\u003c/p\u003e","title":"Unleashing the Full Potential of NotebookLM: Beyond Audio Generation to Comprehensive Research Assistance"},{"content":"Tokenization is a fundamental yet often misunderstood process in the realm of large language models (LLMs). 
Despite its crucial role, it is a part of working with LLMs that many find daunting due to its complexity and the numerous challenges it introduces. In this blog post, we will explore the concept of tokenization, its importance in language models like GPT-2, and the various issues associated with it.\nIntroduction to Tokenization Tokenization is the process of converting raw text into smaller units called tokens. These tokens can be as small as individual characters or as large as entire words or subwords, depending on the specific tokenizer being used. Tokenization is the first step in feeding text data into a neural network, making it a critical component in the performance of LLMs.\nThe GPT-2 Paper and Tokenization The GPT-2 paper introduced byte-level encoding as a tokenization mechanism. This approach allowed the model to handle a wide variety of characters, including those in non-English languages, while maintaining a manageable vocabulary size. The paper discussed the creation of a tokenizer with a vocabulary of 50,257 tokens and a context size of 1,024 tokens. These tokens are the fundamental units that the model processes, and understanding how they are created and used is key to understanding the behavior of the model.\nTokenization in Practice In practice, tokenization is not just a matter of breaking down text into words or characters. Modern tokenizers often use more sophisticated methods, such as Byte Pair Encoding (BPE), to create subword units that are more meaningful and efficient for the model to process. 
BPE tokenizers break down words into smaller parts that frequently appear together in the text, allowing for a more compact representation of the data.\nFor example, in the GPT-2 tokenizer, the word “tokenization” might be split into several subword tokens like “token,” “iza,” and “tion.” This approach allows the model to better understand and generate text, especially when dealing with rare or novel words.\nThe Complexities of Tokenization While tokenization might seem straightforward at first glance, it introduces several complexities that can significantly impact the performance of LLMs.\nIssues Stemming from Tokenization Inconsistent Tokenization Across Languages: One of the most prominent issues with tokenization is its inconsistency across different languages. For instance, English text might be tokenized into fewer tokens compared to text in languages like Korean or Japanese. This discrepancy arises because the tokenizer was likely trained on a dataset with more English text, resulting in larger, more efficient tokens for English. Non-English text often ends up with more tokens, which bloats the sequence length and limits the context window of the Transformer model. Handling Special Characters and Punctuation: Special characters, punctuation, and spaces can also lead to inefficient tokenization. For example, a sequence of spaces in Python code might be tokenized into multiple separate tokens, leading to wasted space and reduced efficiency in processing the code. This inefficiency can be particularly problematic in models like GPT-2, which were not specifically optimized for handling programming languages. Case Sensitivity and Arbitrary Token Splits: Another challenge is the arbitrary splitting of tokens based on case sensitivity or position within a sentence. For example, the word “egg” might be tokenized differently depending on whether it is at the beginning of a sentence, capitalized, or preceded by a space. 
These inconsistencies force the model to learn that different tokens might represent the same concept, adding unnecessary complexity to the training process. Impact on Performance: These tokenization issues can lead to poor performance on specific tasks. For instance, large language models often struggle with simple arithmetic or spelling tasks because the tokenization process does not align well with the nature of these tasks. Similarly, LLMs might perform poorly on non-English languages or when processing code due to inefficient tokenization. Tokenization by Example: Using the TikTokenizer Web App To better understand tokenization in action, let’s take a look at the TikTokenizer web app. This tool allows you to input text and see how it is tokenized by different tokenizers, such as the GPT-2 tokenizer or the GPT-4 tokenizer.\nFor example, typing “Hello, world!” into the app using the GPT-2 tokenizer might result in several tokens, each representing different parts of the string. The word “tokenization” might be split into two tokens, and spaces and punctuation are treated as separate tokens. By switching to the GPT-4 tokenizer, the same string might be tokenized into fewer tokens, demonstrating the improvements in efficiency made in the newer model.\nThe app also highlights how tokenization handles numbers, special characters, and non-English text. For instance, a four-digit number might be split into multiple tokens in an arbitrary manner, while non-English text might be tokenized into many small tokens, reflecting the inefficiencies discussed earlier.\nThe Evolution of Tokenizers: From GPT-2 to GPT-4 The transition from the GPT-2 tokenizer to the GPT-4 tokenizer showcases the evolution in tokenization techniques. One major improvement in GPT-4 is the increased vocabulary size, which allows for denser representations of text. 
This means that the same string of text can be represented with fewer tokens, leading to more efficient processing and the ability to attend to more context within the Transformer model.\nAnother significant improvement in GPT-4 is the handling of white space and indentation in code. In GPT-2, each space in a Python code snippet might be tokenized separately, leading to inefficiencies. GPT-4, on the other hand, groups spaces into single tokens, making the tokenization of code more compact and efficient.\nTokenization and Unicode: Challenges and Solutions Tokenization also intersects with the challenges of encoding text in different languages and scripts. Python strings are sequences of Unicode code points, which represent characters as integers. However, directly using Unicode code points as tokens would result in an excessively large and unstable vocabulary, as the Unicode standard is constantly evolving.\nInstead, tokenizers often use byte encodings, such as UTF-8, to represent text. UTF-8 is a variable-length encoding that can represent each Unicode code point with one to four bytes. This encoding is widely used because it is backward-compatible with ASCII and efficiently handles a wide range of characters.\nHowever, simply using UTF-8 encoded bytes as tokens would create a vocabulary of only 256 tokens, which is too small and would lead to long sequences of tokens for even simple text. Therefore, more sophisticated tokenization techniques, such as Byte Pair Encoding, are used to balance the need for a manageable vocabulary size with the need for efficient representation of text.\nConclusion Tokenization is a crucial yet complex aspect of working with large language models. While it might seem like a simple preprocessing step, it has far-reaching implications for the performance and behavior of models like GPT-2 and GPT-4. 
Issues related to tokenization can manifest in various ways, from inefficiencies in processing code to poor performance on non-English languages.\nUnderstanding tokenization and its impact on language models is essential for anyone working with LLMs. As tokenization techniques continue to evolve, we can expect future models to become even more efficient and capable of handling a wider range of languages and tasks. However, the challenges of tokenization will likely persist, requiring ongoing research and innovation in this critical area of natural language processing.\nThis article is created from learnings of video: https://www.youtube.com/watch?v=zduSFxRajkE\nFew Books Recommendations: https://amzn.to/4dukwT8\n","permalink":"https://learncodecamp.net/tokenization-llm-p1/","summary":"\u003cp\u003eTokenization is a fundamental yet often misunderstood process in the realm of large language models (LLMs). Despite its crucial role, it is a part of working with LLMs that many find daunting due to its complexity and the numerous challenges it introduces. In this blog post, we will explore the concept of tokenization, its importance in language models like GPT-2, and the various issues associated with it.\u003c/p\u003e\n\u003ch2 id=\"introduction-to-tokenization\"\u003eIntroduction to Tokenization\u003c/h2\u003e\n\u003cp\u003eTokenization is the process of converting raw text into smaller units called tokens. These tokens can be as small as individual characters or as large as entire words or subwords, depending on the specific tokenizer being used. 
Tokenization is the first step in feeding text data into a neural network, making it a critical component in the performance of LLMs.\u003c/p\u003e","title":"Understanding Tokenization in Large Language Models: A Deep Dive – Part 1"},{"content":"For part 1 refer to this: Unveiling the Secrets Behind ChatGPT – Part 1 (learncodecamp.net)\nImplementing a Bigram Language Model When diving into the world of natural language processing (NLP) and language modeling, starting with a simple baseline model is essential. It helps establish a foundation to build upon. One of the simplest and most intuitive models for language generation is the bigram language model. This blog post will walk you through the implementation of a bigram language model using PyTorch, explaining the key concepts, steps, and code snippets along the way.\nIntroduction to the Bigram Language Model A bigram language model predicts the next word in a sequence based solely on the previous word. It’s a straightforward approach that captures some of the local dependencies in the text. While it’s not as powerful as more complex models, it’s a great starting point to understand the basics of language modeling.\nAs we are working with characters in this example, our bigram model will look at just the previous character.\nImplementing the Bigram Language Model in PyTorch This model will include an embedding layer that maps input tokens to vectors and a forward method to compute the logits for the next token prediction.\nIn the constructor, we create a token embedding table of size vocab_size x vocab_size using nn.Embedding. The forward method processes input indices to produce logits, which are the scores for the next character in the sequence. 
If targets are provided, it also computes the cross-entropy loss.\nvocab_size = 65, xb and yb are input and output tensors.\nimport torch import torch.nn as nn from torch.nn import functional as F torch.manual_seed(1337) class BigramLanguageModel(nn.Module): def __init__(self, vocab_size): super().__init__() # each token directly reads off the logits for the next token from a lookup table self.token_embedding_table = nn.Embedding(vocab_size, vocab_size) def forward(self, idx, targets=None): # idx and targets are both (B,T) tensor of integers logits = self.token_embedding_table(idx) # (B,T,C) if targets is None: loss = None else: B, T, C = logits.shape logits = logits.view(B*T, C) targets = targets.view(B*T) loss = F.cross_entropy(logits, targets) return logits, loss def generate(self, idx, max_new_tokens): # idx is (B, T) array of indices in the current context for _ in range(max_new_tokens): # get the predictions logits, loss = self(idx) # focus only on the last time step logits = logits[:, -1, :] # becomes (B, C) # apply softmax to get probabilities probs = F.softmax(logits, dim=-1) # (B, C) # sample from the distribution idx_next = torch.multinomial(probs, num_samples=1) # (B, 1) # append sampled index to the running sequence idx = torch.cat((idx, idx_next), dim=1) # (B, T+1) return idx m = BigramLanguageModel(vocab_size) logits, loss = m(xb, yb) print(logits.shape) print(loss) print(decode(m.generate(idx = torch.zeros((1, 1), dtype=torch.long), max_new_tokens=100)[0].tolist())) Evaluating the Loss The loss function used here is the negative log likelihood loss, implemented in PyTorch as cross-entropy loss. This function measures the quality of the logits concerning the targets, essentially evaluating how well the model predicts the next character.\nGenerating Text from the Model Once we have our model trained, we want to generate text. 
The generate function extends the input sequence by predicting the next token iteratively.\nTraining the Model To make the model useful, we need to train it on a corpus of text. Here, we use the AdamW optimizer for training.\n# create a PyTorch optimizer optimizer = torch.optim.AdamW(m.parameters(), lr=1e-3) batch_size = 32 for steps in range(100): # increase number of steps for good results... # sample a batch of data xb, yb = get_batch(\u0026#39;train\u0026#39;) # evaluate the loss logits, loss = m(xb, yb) optimizer.zero_grad(set_to_none=True) loss.backward() optimizer.step() print(loss.item()) Evaluating the Model After training, we evaluate the model’s performance. Although a bigram model is quite limited, the loss should decrease as training progresses, indicating the model’s improving ability to predict the next token.\nGenerating Improved Text With the trained model, we can now generate text that should be more coherent than the initial random outputs.\nprint(decode(m.generate(idx = torch.zeros((1, 1), dtype=torch.long), max_new_tokens=500)[0].tolist())) Let’s learn more about B, T, and C\nB, T, and C refer to the dimensions of the tensors that represent batches of sequences of data. Let’s break down what each of these dimensions stands for:\nB (Batch Size): This dimension represents the number of sequences or samples processed together in one forward/backward pass of the neural network. Using batches allows for more efficient computation and training on modern hardware. For instance, if B=32, it means that 32 sequences are being processed simultaneously. T (Time Steps or Sequence Length): This dimension corresponds to the length of each sequence in the batch. In the context of language models, this usually represents the number of tokens (e.g., characters, words) in each sequence. If T=8, it means each sequence contains 8 tokens. 
C (Channels or Vocabulary Size): In language models, this dimension often represents the size of the vocabulary, i.e., the number of unique tokens (characters, words, etc.) that the model can recognize. For instance, if C=65, it means the vocabulary contains 65 unique tokens. Tensor Shapes in the Bigram Language Model Embedding Table The embedding table is a matrix of size vocab_size x vocab_size. Here, the vocabulary size is C. Each row of this matrix is a vector representation of a token from the vocabulary.\nInputs and Outputs When we input a batch of sequences into the model, it has the shape [B, T], where B is the batch size and T is the sequence length. Each element in this tensor is an integer index corresponding to a token in the vocabulary.\nLogits After processing the input through the embedding layer, we get a tensor of shape [B, T, C]:\nB (Batch Size): Number of sequences being processed simultaneously. T (Time Steps): Number of tokens in each sequence. C (Channels or Vocabulary Size): For each token in the sequence, the model outputs a score (logit) for each possible token in the vocabulary. Reshaping for Loss Calculation When calculating the loss, the shape of the logits and targets needs to be compatible with the requirements of the cross-entropy loss function in PyTorch.\nConclusion The bigram language model serves as a fundamental stepping stone in language modeling. While it’s a simple approach, it forms the basis for more advanced models that consider longer contexts and dependencies. 
By understanding and implementing the bigram model, we lay the groundwork for exploring more sophisticated architectures like the Transformer, which can handle much more complex language tasks.\nFor part 1 refer to this: Unveiling the Secrets Behind ChatGPT – Part 1 (learncodecamp.net)\nWe will continue learning this, in the next part of the blog.\nFor complete code, you can check this notebook: gpt-dev.ipynb – Colab (google.com)\nFor a complete video you can check this: https://www.youtube.com/watch?v=kCc8FmEb1nY\n","permalink":"https://learncodecamp.net/build-gpt-from-scratch-p2/","summary":"\u003cp\u003eFor part 1 refer to this: \u003ca href=\"https://learncodecamp.net/build-gpt-from-scratch-p1/\"\u003eUnveiling the Secrets Behind ChatGPT – Part 1 (learncodecamp.net)\u003c/a\u003e\u003c/p\u003e\n\u003ch2 id=\"implementing-a-bigram-language-model\"\u003eImplementing a Bigram Language Model\u003c/h2\u003e\n\u003cp\u003eWhen diving into the world of natural language processing (NLP) and language modeling, starting with a simple baseline model is essential. It helps establish a foundation to build upon. One of the simplest and most intuitive models for language generation is the bigram language model. This blog post will walk you through the implementation of a bigram language model using PyTorch, explaining the key concepts, steps, and code snippets along the way.\u003c/p\u003e","title":"Unveiling the Secrets Behind ChatGPT – Part 2"},{"content":"Introduction Hello everyone! By now, you’ve likely heard of ChatGPT, the revolutionary AI system that has taken the world and the AI community by storm. 
This remarkable technology allows you to interact with an AI through text-based tasks.\nThe Technology Behind ChatGPT: Transformers The neural network that powers ChatGPT is based on the Transformer architecture, introduced in the 2017 paper “Attention is All You Need.” GPT stands for “Generative Pre-trained Transformer.” The Transformer architecture is a landmark development in AI that revolutionized the field, primarily in natural language processing (NLP). The Transformer architecture, initially designed for machine translation, became the backbone for numerous AI applications, including ChatGPT.\nBuilding a Transformer-Based Language Model While replicating ChatGPT’s capabilities is a daunting task, we can gain valuable insights by building a smaller Transformer-based language model. We’ll focus on a character-level language model using the “tiny Shakespeare” dataset, which contains the complete works of Shakespeare in a single file. This dataset is approximately one megabyte in size, making it manageable for educational purposes.\nInput : raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\nTokenization and Training First, we need to tokenize the input text. Tokenization converts raw text into a sequence of integers based on a predefined vocabulary. In our case, we use a character-level tokenizer, meaning each character is assigned an integer.
Here’s how you can implement it in Python:\n# read it in to inspect it with open(\u0026#39;input.txt\u0026#39;, \u0026#39;r\u0026#39;, encoding=\u0026#39;utf-8\u0026#39;) as f: text = f.read() # here are all the unique characters that occur in this text chars = sorted(list(set(text))) vocab_size = len(chars) print(\u0026#39;\u0026#39;.join(chars)) print(vocab_size) #Output: # !$\u0026amp;\u0026#39;,-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz #65 There are just 65 characters in our vocabulary, characters from A-Z, a-z, space and some special symbols.\nLet’s write a simple tokenizer function for the text\n# create a mapping from characters to integers stoi = { ch:i for i,ch in enumerate(chars) } itos = { i:ch for i,ch in enumerate(chars) } encode = lambda s: [stoi[c] for c in s] # encoder: take a string, output a list of integers decode = lambda l: \u0026#39;\u0026#39;.join([itos[i] for i in l]) # decoder: take a list of integers, output a string print(encode(\u0026#34;hii there\u0026#34;)) print(decode(encode(\u0026#34;hii there\u0026#34;))) # Output: # [46, 47, 47, 1, 58, 46, 43, 56, 43] # hii there Let’s now encode the entire text dataset and store it into a torch.Tensor\nimport torch # we use PyTorch: https://pytorch.org data = torch.tensor(encode(text), dtype=torch.long) print(data.shape, data.dtype) # Output: # torch.Size([1115394]) torch.int64 Next, we split the dataset into training and validation sets to monitor overfitting. The training set comprises the first 90% of the data, while the remaining 10% serves as the validation set.\nn = int(0.9*len(data)) # first 90% will be train, rest val train_data = data[:n] val_data = data[n:] Now, let’s define the block size\nWhat is Block Size? Block size refers to the maximum length of the sequence of text that the model processes at one time. In the context of training a Transformer-based language model, it defines the length of the text chunks (or blocks) that the model will use to make predictions. 
For example, if the block size is set to 8, the model will look at sequences of 8 characters at a time to predict the next character.\nblock_size = 8 train_data[:block_size+1] Why the “+1”? The “+1” comes into play because, during training, we want to create examples where the model predicts the next character in a sequence. To do this, we need two sets of data:\nInputs (X): The sequence of characters up to the current position. Targets (Y): The next character that follows each position in the sequence. x = train_data[:block_size] y = train_data[1:block_size+1] for t in range(block_size): context = x[:t+1] target = y[t] print(f\u0026#34;when input is {context} the target: {target}\u0026#34;) Output of above code\nwhen input is tensor([18]) the target: 47 when input is tensor([18, 47]) the target: 56 when input is tensor([18, 47, 56]) the target: 57 when input is tensor([18, 47, 56, 57]) the target: 58 when input is tensor([18, 47, 56, 57, 58]) the target: 1 when input is tensor([18, 47, 56, 57, 58, 1]) the target: 15 when input is tensor([18, 47, 56, 57, 58, 1, 15]) the target: 47 when input is tensor([18, 47, 56, 57, 58, 1, 15, 47]) the target: 58 What is Batch Size? Batch size is the number of examples that are processed together in one iteration during the training of a model. Instead of updating the model’s weights after each individual sequence, we update them after processing a batch of sequences, which makes the training process more efficient and stable.\ntorch.manual_seed(1337) batch_size = 4 # how many independent sequences will we process in parallel? block_size = 8 # what is the maximum context length for predictions? 
def get_batch(split): # generate a small batch of data of inputs x and targets y data = train_data if split == \u0026#39;train\u0026#39; else val_data ix = torch.randint(len(data) - block_size, (batch_size,)) x = torch.stack([data[i:i+block_size] for i in ix]) y = torch.stack([data[i+1:i+block_size+1] for i in ix]) return x, y xb, yb = get_batch(\u0026#39;train\u0026#39;) print(\u0026#39;inputs:\u0026#39;) print(xb.shape) print(xb) print(\u0026#39;targets:\u0026#39;) print(yb.shape) print(yb) print(\u0026#39;----\u0026#39;) The output of above code is\ninputs: torch.Size([4, 8]) tensor([[24, 43, 58, 5, 57, 1, 46, 43], [44, 53, 56, 1, 58, 46, 39, 58], [52, 58, 1, 58, 46, 39, 58, 1], [25, 17, 27, 10, 0, 21, 1, 54]]) targets: torch.Size([4, 8]) tensor([[43, 58, 5, 57, 1, 46, 43, 39], [53, 56, 1, 58, 46, 39, 58, 1], [58, 1, 58, 46, 39, 58, 1, 46], [17, 27, 10, 0, 21, 1, 54, 39]]) Here we select 4 rows of 8 elements each (4 = batch size, 8 = block size). The targets are the inputs shifted forward by one position; for example, when the input is [24] the target is 43, and when the input is [24, 43] the target is 58.\nWe will continue learning this, in the next part of the blog.\nFor complete code, you can check this notebook: gpt-dev.ipynb – Colab (google.com)\nFor a complete video you can check this: https://www.youtube.com/watch?v=kCc8FmEb1nY\n","permalink":"https://learncodecamp.net/build-gpt-from-scratch-p1/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eHello everyone! By now, you’ve likely heard of ChatGPT, the revolutionary AI system that has taken the world and the AI community by storm.
This remarkable technology allows you to interact with an AI through text-based tasks.\u003c/p\u003e\n\u003ch4 id=\"the-technology-behind-chatgpt-transformers\"\u003eThe Technology Behind ChatGPT: Transformers\u003c/h4\u003e\n\u003cp\u003eThe neural network that powers ChatGPT is based on the Transformer architecture, introduced in the 2017 paper “Attention is All You Need.” GPT stands for “Generative Pre-trained Transformer.” The Transformer architecture is a landmark development in AI that revolutionized the field, primarily in natural language processing (NLP). The Transformer architecture, initially designed for machine translation, became the backbone for numerous AI applications, including ChatGPT.\u003c/p\u003e","title":"Unveiling the Secrets Behind ChatGPT – Part 1"},{"content":"Introduction This blog post dives into the fascinating world of computer vision, exploring how we can teach machines to “see” using convolutional neural networks (CNNs). This post is based on a lecture from MIT’s 6.S191: Introduction to Deep Learning course.\nWhat Does it Mean to “See”? Before diving into the technical details, let’s define “vision”. It’s not simply about identifying objects in an image. True vision goes beyond object recognition to understand the relationships between objects, their movements, and their future trajectories. Think about how you intuitively anticipate a pedestrian crossing the street or a car changing lanes. Building machines with this level of visual understanding is the ultimate goal.\nComputer vision example\nHow Computers “See” Images Unlike humans, computers perceive images as arrays of numbers, with each number representing the brightness level of a pixel. Grayscale images use a single number per pixel, while color images use three numbers (RGB) per pixel.
Understanding this numerical representation is crucial for understanding how CNNs process visual information.\nLet’s break down how an RGB image is represented as a three-dimensional array.\nGrayscale Image Representation First, let’s recap how a grayscale image is represented:\nA grayscale image consists of pixels, and each pixel represents a single value corresponding to its brightness or luminosity. This value can range from 0 (black) to 255 (white) in an 8-bit image. The image can be thought of as a two-dimensional array (or matrix), where each element in the array corresponds to the brightness value of a pixel. For example, a 100×100 pixel grayscale image can be represented as a 100×100 matrix. RGB Image Representation An RGB image is a bit more complex:\nAn RGB image consists of pixels, and each pixel represents three color values: Red, Green, and Blue. Each of these color values can also range from 0 to 255 in an 8-bit image. Instead of a single brightness value per pixel (as in grayscale), an RGB pixel is represented by three values. To represent this, we use a three-dimensional array:\nThe first two dimensions correspond to the height and width of the image, just like in a grayscale image. The third dimension corresponds to the color channels (Red, Green, and Blue). 5×5 RGB Image Matrix Let’s define some RGB values for illustration. 
Each inner list represents the RGB values of a pixel.\nimage = [ # Row 0 [[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 0], [255, 0, 255]], # Row 1 [[0, 255, 255], [128, 128, 128], [255, 128, 0], [0, 128, 255], [128, 0, 128]], # Row 2 [[255, 255, 255], [0, 0, 0], [128, 255, 0], [255, 0, 128], [0, 255, 128]], # Row 3 [[128, 0, 0], [0, 128, 0], [0, 0, 128], [128, 128, 0], [0, 128, 128]], # Row 4 [[64, 64, 64], [192, 192, 192], [64, 192, 64], [192, 64, 192], [64, 64, 192]], ] In the realm of computer vision, it’s crucial to understand two fundamental types of tasks: regression and classification.\nTasks in Computer Vision Regression: Outputs continuous values (e.g., predicting a person’s age from a photograph, where the output is a continuous value such as 25.7 years). Classification: Outputs discrete labels (e.g., identifying whether an image contains a cat or a dog). Convolutional Neural Networks A fully connected neural network (FCNN) consists of multiple layers, including hidden layers, where each neuron in one layer is connected to every neuron in the previous and subsequent layers. When using an FCNN for image classification, we must flatten the 2D image into a 1D array, feeding each pixel into the network as an input. This approach has significant drawbacks:\nLoss of Geometric Structure: Flattening the image destroys the spatial relationships and local patterns inherent in the 2D structure. Important features related to pixel proximity are lost. Large Number of Parameters: Flattening a small 100×100 pixel image results in 10,000 input neurons. Connecting these to another layer with 10,000 neurons introduces 100 million parameters (10,000 x 10,000). This massive number of parameters is computationally inefficient and impractical for training and processing. Instead of connecting every neuron to all pixels in the input image, we connect neurons in a hidden layer to small patches of the input image.
This method maintains the spatial structure and reduces the number of parameters.\nPatch-Based Connections: Each neuron in the hidden layer is connected to a specific patch of pixels in the input image. For example, a red patch in the input image connects to a single neuron in the next layer. This neuron only sees a small part of the image, not the entire image. Correlation of Pixels: Pixels within a small patch are likely to be correlated, as they are close to each other. This property leverages the natural structure of images. Sliding Patches: To cover the entire image, the patch is slid across the input image, connecting patches to corresponding neurons in the hidden layer. This creates a new layer of neurons while preserving the spatial relationships from the input image. Feature Detection: The ultimate goal is to detect visual patterns or features in the image. By connecting patches to hidden neurons and cleverly weighting these connections, neurons can specialize in detecting particular features within their patches. This approach retains the spatial structure of the image and allows neurons to detect relevant features, making it more efficient and effective for image processing tasks compared to a fully connected network.\nThe Power of Convolution: Extracting Meaningful Features The heart of a CNN lies in its convolutional layers. These layers use filters (small matrices of weights) that slide across the image, performing element-wise multiplications and summations. 
This process creates feature maps, highlighting areas where the filter detects specific patterns.\nLet’s see a simple example of classifying “X” shapes to illustrate how different filters can detect different features within an image, such as diagonal lines and intersections, ultimately enabling the network to recognize the overall “X” shape.\nBuilding Blocks of a CNN: Convolution, Non-linearity, and Pooling Convolution in the context of convolutional neural networks (CNNs) is a mathematical operation used to extract features from input data, such as images, by applying a filter or kernel across the input. Here’s a detailed breakdown:\nKernel/Filter: A small matrix (e.g., 3×3 or 5×5) with learnable parameters. The kernel slides over the input image to perform the convolution operation. Sliding/Striding: The kernel moves across the input image, typically from left to right and top to bottom. The step size of this movement is called the stride. For example, a stride of 1 means the kernel moves one pixel at a time, while a stride of 2 means it moves two pixels at a time. Convolution Operation: At each position, the kernel multiplies its values element-wise with the input pixels it overlaps, then sums these products to produce a single output value. This operation is repeated across the entire input image. Feature Map: The result of applying the convolution operation across the image is a feature map (or activation map), which highlights the presence of certain features detected by the kernel, such as edges, textures, or more complex patterns in deeper layers. Padding: Sometimes, the input image is padded with zeros around the border to control the spatial dimensions of the output feature map. Padding ensures that the kernel can process edge pixels and helps preserve the input dimensions. 
Non-linearity (ReLU): After convolution, a non-linear activation function, typically the Rectified Linear Unit (ReLU), is applied to introduce non-linearity into the model, enabling it to learn more complex patterns. Beyond convolution, a typical CNN architecture incorporates two additional key elements:\nNon-linearity: After each convolution, a non-linear function (like ReLU) is applied to the feature map. This introduces non-linearity into the model, enabling it to learn more complex patterns. Pooling: This down-sampling operation reduces the dimensionality of the feature maps, making the network more computationally efficient and increasing the receptive field of subsequent layers. In convolutional neural networks (CNNs), non-linearity and pooling operations are essential for enhancing feature extraction and reducing dimensionality.\nNon-linearity: Purpose: Introduced because real-world data is highly nonlinear. Common Activation Function: Rectified Linear Unit (ReLU). Function: Deactivates negative values in the feature map by setting them to zero, while passing positive values unchanged. This acts like a threshold function, enhancing model capacity to learn complex patterns. Pooling: Purpose: Reduces the dimensionality of the feature maps to manage computational complexity and prevent overfitting. Common Technique: Max pooling. Function: Divides the input feature map into smaller regions (e.g., 2×2 grids) and outputs the maximum value from each region, effectively reducing the size of the feature map by a factor of two. This retains the most significant features while reducing data volume. Alternative Methods: Average pooling, which outputs the average value of each region instead of the maximum, among other methods. Full Convolutional Layer:\nComposition: Convolution followed by non-linearity (e.g., ReLU) and pooling. 
Purpose: Each layer extracts progressively more complex features, building hierarchical representations from low-level edges to high-level object parts. CNN Structure for Image Classification:\nFeature Extraction Head: Composed of multiple convolutional layers to extract relevant features from the input image. Classification Head: After feature extraction, the features are fed into fully connected layers to perform classification. This step often uses a softmax function to produce a probability distribution over classes. Overall Workflow:\nInput image is processed through convolutional layers to extract features. Features are pooled to reduce dimensionality while retaining important information. Extracted features are fed into fully connected layers for classification. The softmax function ensures the output probabilities sum to one, suitable for categorization tasks. This process enables CNNs to efficiently learn and classify visual patterns by leveraging hierarchical feature extraction and dimensionality reduction.\nConvolutional Neural Networks Architecture: A Two-Part Symphony A CNN for image classification can be viewed as a two-part system:\nFeature Extraction: This part consists of stacked convolutional layers interspersed with non-linearity and pooling layers. It learns hierarchical representations of the image, going from simple edges and textures in early layers to complex object parts in later layers. Classification: The final layers of the network are typically fully connected, taking the learned features as input and outputting a probability distribution over the possible classes. Beyond Classification: The Versatility of CNN The real beauty of CNNs lies in their modularity. 
By swapping out the classification head with different types of layers, we can adapt CNNs for a wide range of computer vision tasks:\nObject Detection: Instead of just classifying the entire image, object detection involves identifying the locations of multiple objects within an image and drawing bounding boxes around them. Techniques like Faster R-CNN use region proposal networks to efficiently identify potential object locations. Semantic Segmentation: This task takes object detection a step further by classifying every single pixel in the image. The output is a segmented image where each pixel is labeled with its corresponding class (e.g., road, sky, car). Self-Driving Cars: CNNs are crucial for enabling autonomous driving. They can process camera images to detect lanes, traffic signs, pedestrians, and other vehicles, providing essential information for navigation. The lecture showcases an impressive example of a CNN-powered car navigating a new environment without any prior knowledge of the route. Conclusion In this blog we learned about CNNs, demystifying their inner workings and showcasing their impressive capabilities in solving various computer vision tasks. From medical diagnosis to self-driving cars, CNNs are revolutionizing numerous fields. As research progresses, we can expect even more innovative applications of these powerful deep learning models in the future.\nFor a complete video check this video or visit the website\n","permalink":"https://learncodecamp.net/convolutional-neural-networks/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eThis blog post dives into the fascinating world of computer vision, exploring how we can teach machines to “see” using convolutional neural networks (CNNs).
This post is based on a lecture from MIT’s 6.S191: Introduction to Deep Learning course.\u003c/p\u003e\n\u003ch3 id=\"what-does-it-mean-to-see\"\u003eWhat Does it Mean to “See”?\u003c/h3\u003e\n\u003cp\u003eBefore diving into the technical details, let’s define “vision”. It’s not simply about identifying objects in an image. True vision goes beyond object recognition to understand the relationships between objects, their movements, and their future trajectories. Think about how you intuitively anticipate a pedestrian crossing the street or a car changing lanes. Building machines with this level of visual understanding is the ultimate goal.\u003cfigure\u003e\u003c/p\u003e","title":"Convolutional Neural Networks"},{"content":"Introduction In the world of software development, the ability to build and deploy applications across different architectures is invaluable. This capability becomes particularly essential when dealing with ARM-based applications such as those for embedded systems or newer ARM-based servers. In this blog post, we will explore how to build ARM64 Docker container images on an x86_64 machine using QEMU emulation and Docker’s buildx tool.\nUnderstanding the Challenge The main challenge in building Docker images for a different architecture than your host machine lies in the architecture-specific binaries and dependencies. Directly running ARM binaries on an x86_64 machine is not possible without emulation due to differences in architecture instruction sets.\nIntroducing QEMU and binfmt_misc To address this challenge, we use QEMU, a generic and open source machine emulator and virtualizer. When used as an emulator, QEMU can run operating systems and applications made for one machine (e.g., an ARM64 server) on a different machine (e.g., your x86_64 PC). By leveraging QEMU, developers can emulate ARM64 architectures on their x86 machines, facilitating cross-compilation and testing of ARM-specific applications.\nWhat is binfmt_misc? 
binfmt_misc is a capability of the Linux kernel that allows the kernel to recognize and parse various binary formats transparently to the user. It’s used to tell the kernel which program (in our case, QEMU) to invoke when executing binaries from foreign architectures. Essentially, binfmt_misc allows users to run executables for any architecture, as long as the appropriate emulator is installed on the system.\nStep-by-Step Guide to Building ARM64 Docker Images Step 1: Install Docker and Docker Buildx Ensure Docker is installed on your machine along with the buildx plugin, which supports building multi-architecture images:\ndocker buildx version If not present, set up a new buildx builder:\ndocker buildx create --name mybuilder docker buildx use mybuilder docker buildx inspect --bootstrap Step 2: Set Up QEMU Emulation Install the QEMU packages to add emulation capabilities for different architectures:\ndocker run --rm --privileged multiarch/qemu-user-static --reset -p yes This command prepares Docker to emulate different architectures including ARM64.\nStep 3: Prepare Your Dockerfile Create a Dockerfile that will form the basis of your ARM64 container. Here’s a sample Dockerfile:\n# Use an official ARM64 Python base image FROM arm64v8/python:3.9-slim # Install make, g++ and other dependencies RUN apt-get update \u0026amp;\u0026amp; apt-get install -y \\ make \\ g++ \\ \u0026amp;\u0026amp; rm -rf /var/lib/apt/lists/* WORKDIR /app COPY . /app CMD [\u0026#34;python3\u0026#34;, \u0026#34;your_script.py\u0026#34;] Step 4: Build the Docker Image with Buildx With everything set up, build your Docker image for ARM64:\ndocker buildx build --platform linux/arm64 -t yourname/yourimage:tag . To check whether binfmt_misc support is enabled on your Linux system and to verify the associated entries in the /proc filesystem, you can follow these steps. 
The /proc filesystem is a special filesystem in Unix-like operating systems that presents information about processes and other system information in a hierarchical file-like structure.\nCheck the /proc filesystem After ensuring that the binfmt_misc module is loaded, proceed to check the /proc/sys/fs/binfmt_misc/ directory. This is where the binfmt_misc entries are located:\nls /proc/sys/fs/binfmt_misc/ You should see several files including status, and possibly other entries if any specific binary formats have been registered. The status file indicates whether binfmt_misc is enabled. You can check its content with:\ncat /proc/sys/fs/binfmt_misc/status Step 3: Check registered binary formats To see detailed information about each registered binary format, you can cat the specific files in the binfmt_misc directory. Each registered binary format has its own file, named after the identifier of the binary format. For example:\ncat /proc/sys/fs/binfmt_misc/qemu-arm This command would show the details for the ARM binary format if QEMU has registered it under that name. The output will include information like the type of binaries it supports, the interpreter (QEMU in this case), and any flags set.\nStep 4: Test a binary execution (optional) If you want to test whether the binary execution through binfmt_misc is working, you can try running a simple binary from the foreign architecture that you have set up through QEMU. For instance, running an ARM binary on an x86_64 machine where QEMU and binfmt_misc are configured to handle ARM binaries.\nChallenges Building under QEMU emulation can be slow: it can take a long time to build an image for one platform on a host machine with a different architecture.
So this might not always be a practical solution, but if you use caching with Docker images, several steps can be cached, which can reduce the build time significantly.\nTo read more about docker buildx, check the official documentation: docker buildx | Docker Docs\nConclusion Using QEMU and Docker buildx to build ARM64 images on an x86_64 host is a powerful solution for developers looking to ensure their applications are cross-platform compatible. This approach not only facilitates development and testing but also ensures that your deployment process is streamlined and efficient. By understanding and utilizing technologies like QEMU and binfmt_misc, developers can seamlessly develop for and deploy to a variety of architectures from a single machine.\n","permalink":"https://learncodecamp.net/docker-buildx-platform/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn the world of software development, the ability to build and deploy applications across different architectures is invaluable. This capability becomes particularly essential when dealing with ARM-based applications such as those for embedded systems or newer ARM-based servers. In this blog post, we will explore how to build ARM64 Docker container images on an x86_64 machine using QEMU emulation and Docker’s buildx tool.\u003c/p\u003e\n\u003ch3 id=\"understanding-the-challenge\"\u003eUnderstanding the Challenge\u003c/h3\u003e\n\u003cp\u003eThe main challenge in building Docker images for a different architecture than your host machine lies in the architecture-specific binaries and dependencies.
Directly running ARM binaries on an x86_64 machine is not possible without emulation due to differences in architecture instruction sets.\u003c/p\u003e","title":"Building ARM64 Docker Images on x86_64 Machines Using QEMU and Docker Buildx"},{"content":"Introduction Intro to deep learning\nIntelligence: The ability to process information and use it for future decision-making.\nArtificial Intelligence (AI): Empowering computers with the ability to process information and make decisions.\nMachine Learning (ML): A subset of AI focused on teaching computers to learn from data.\nDeep Learning (DL): A subset of ML utilizing neural networks to process raw data and inform decisions.\nWhy Deep Learning Now? The recent surge in deep learning’s capabilities can be attributed to three key factors:\nData Explosion: Deep learning models thrive on data, and we’re currently experiencing an unprecedented level of data generation. Hardware Advancements: GPUs offer the parallel processing power necessary for deep learning algorithms. Open-Source Software: Tools like TensorFlow and PyTorch have streamlined the development and deployment of deep learning models. The Perceptron: A Neural Network Building Block The fundamental building block of a neural network is the perceptron, a single neuron that processes information through three steps:\nDot Product: Multiplying inputs with corresponding weights. Bias Addition: Adding a bias term to shift the activation function. Nonlinear Activation: Passing the result through a nonlinear function like sigmoid or ReLU. Common Activation Functions Sigmoid: This function squashes any input value to a range between 0 and 1, making it suitable for modeling probabilities. However, it suffers from the vanishing gradient problem, where gradients become very small during backpropagation, hindering learning in early layers of deep networks.
ReLU (Rectified Linear Unit): This simple function outputs the input directly if it’s positive; otherwise, it outputs zero. ReLU is computationally efficient and doesn’t suffer from the vanishing gradient problem as much as sigmoid. However, it can suffer from the “dying ReLU” problem, where neurons get stuck in a state where they always output zero. Tanh (Hyperbolic Tangent): Tanh is similar to sigmoid but outputs values between -1 and 1. It’s often preferred over sigmoid as it centers the data around zero, which can aid in learning. However, it still suffers from the vanishing gradient problem. Why Nonlinearity is Essential Real-World Data is Nonlinear: Real-world phenomena are rarely linear. Imagine trying to classify images of cats and dogs – a linear function wouldn’t be able to capture the complex patterns and features that distinguish the two animals. Nonlinear activation functions allow neural networks to learn these intricate relationships within the data. Expressive Power: Nonlinearity enables neural networks to learn more complex functions. Stacking multiple linear layers would still result in a linear model, limiting the network’s ability to handle intricate patterns. Nonlinear activations provide the necessary expressiveness for tackling challenging tasks. Decision Boundaries: Nonlinear activations enable the creation of non-linear decision boundaries, which are crucial for classification tasks. For example, a linear function could only separate data points with a straight line, whereas a nonlinear function can learn curves and complex shapes to accurately separate different classes. In essence, nonlinear activation functions are the key to unlocking the power of deep learning. They allow neural networks to learn the complex representations and relationships present in real-world data, enabling them to perform a wide range of tasks with remarkable accuracy.\nBuilding Neural Networks Multiple perceptrons connected in layers form a neural network. 
Deep learning involves stacking these layers, creating a hierarchy that allows for complex data processing.\nChallenges in Training Neural Networks Training neural networks comes with challenges:\nComputational Cost: Backpropagation, the algorithm for computing gradients, can be computationally expensive. Local Minima: Gradient descent can get stuck in local minima instead of reaching the global minimum. Overfitting: Models may learn the training data too well and fail to generalize to new data. Techniques like stochastic gradient descent (SGD), adaptive learning rates, dropout, and early stopping help address these challenges.\nGradient Descent and Stochastic Gradient Descent: Navigating the Loss Landscape The journey of training a neural network revolves around finding the optimal set of weights that minimize the error between its predictions and the actual target values. This is where gradient descent comes into play, acting as a guide through the complex terrain of the loss landscape.\nUnderstanding the Loss Landscape:\nImagine a mountainous landscape where each point represents a specific configuration of weights for the neural network, and the height at each point signifies the corresponding loss (error) value. Our goal is to reach the lowest point in this landscape, representing the set of weights that yield the minimum loss.\nGradient Descent: Taking Small Steps Downward:\nGradient descent is an iterative optimization algorithm that helps us navigate this loss landscape. It works by taking small steps in the direction of the steepest descent, gradually approaching the minimum loss. Here’s how it operates:\nInitialization: We start at a random point in the loss landscape (a random set of weights). Gradient Calculation: We compute the gradient at our current location, which indicates the direction of the steepest ascent. 
Step in the Opposite Direction: We take a small step in the opposite direction of the gradient, effectively moving towards a lower point in the landscape. Repeat: We iterate steps 2 and 3 until we reach a point where further steps no longer significantly decrease the loss. The Learning Rate: Controlling the Step Size: The size of the steps taken in gradient descent is determined by the learning rate. Setting the learning rate is crucial:\nToo small: The model may take too long to converge or get stuck in local minima. Too large: The model may overshoot the minimum and diverge. Stochastic Gradient Descent: Efficiency through Approximation:\nTraditional gradient descent calculates the gradient using the entire training dataset, which can be computationally expensive for large datasets. Stochastic Gradient Descent (SGD) offers a more efficient approach by approximating the gradient using a single data point or a small batch of data points (mini-batch).\nThis introduces some noise into the process but allows for faster iterations and often leads to quicker convergence. The trade-off between accuracy and efficiency is a key consideration when choosing between gradient descent and SGD.\nKey Takeaways:\nGradient descent and SGD are optimization algorithms that guide the training process of neural networks by iteratively adjusting weights to minimize the loss. The learning rate controls the step size in gradient descent and needs to be carefully tuned for optimal performance. SGD offers computational efficiency by approximating the gradient using smaller batches of data, but at the cost of introducing noise. In essence, gradient descent and its variants are the driving force behind training neural networks, allowing them to learn from data and achieve remarkable results in various AI applications.\nUnderstanding Overfitting Overfitting results in models that perform exceptionally well on the training data but fail to generalize to new, unseen examples. 
Imagine a student who memorizes every answer in a textbook but struggles to apply that knowledge to real-world problems. That’s essentially what happens with an overfitted model.\nRegularization to the Rescue To combat overfitting, we turn to regularization techniques. These techniques aim to discourage the model from learning the idiosyncrasies of the training data and instead encourage it to learn more generalizable patterns.\nTwo Key Regularization Techniques:\nDropout: During training, dropout randomly sets a portion of neuron activations to zero. This forces the network to learn redundant representations, as it cannot rely on the presence of specific neurons. By preventing reliance on any single feature, dropout improves the model’s ability to generalize. Early Stopping: This technique involves monitoring the model’s performance on both the training data and a separate validation set during training. When the validation loss starts to increase, it indicates that the model is beginning to overfit. Early stopping simply halts the training process at this point, preventing further memorization of the training data and preserving the model’s ability to generalize. Why Regularization Matters\nRegularization plays a crucial role in ensuring that deep learning models are not just good at memorizing training data but can effectively handle real-world scenarios and new data points. 
By preventing overfitting, regularization techniques help us build robust and reliable AI systems.\nReview Video For complete course check this link\n","permalink":"https://learncodecamp.net/learning-from-introduction-to-deep-learning/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003cfigure\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"535\" src=\"/wp-content/uploads/2024/05/image-1024x535.png\" alt=\"\" /\u003e \u003cfigcaption class=\"wp-element-caption\"\u003eIntro to deep learning\u003c/figcaption\u003e\u003c/figure\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIntelligence:\u003c/strong\u003e The ability to process information and use it for future decision-making.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eArtificial Intelligence (AI):\u003c/strong\u003e Empowering computers with the ability to process information and make decisions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMachine Learning (ML):\u003c/strong\u003e A subset of AI focused on teaching computers to learn from data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeep Learning (DL):\u003c/strong\u003e A subset of ML utilizing neural networks to process raw data and inform decisions.\u003c/p\u003e\n\u003ch3 id=\"why-deep-learning-now\"\u003eWhy Deep Learning Now?\u003c/h3\u003e\n\u003cp\u003eThe recent surge in deep learning’s capabilities can be attributed to three key factors:\u003c/p\u003e","title":"Learning from Introduction to Deep Learning"},{"content":"The Busy Person’s Guide to Large Language Models: From Inner Workings to Future Possibilities (and Security Concerns)\nThis post explores the fascinating world of large language models (LLMs) like ChatGPT and llama2, diving into their inner workings, potential future developments, and even the security challenges they present. 
It’s a summary of a talk by Andrej Karpathy, offering a comprehensive overview for anyone curious about this rapidly evolving technology.\nWhat are LLMs and How Do They Work? Imagine a massive file containing compressed knowledge from the internet – that’s essentially what an LLM is. It’s a complex neural network trained on vast amounts of text data, enabling it to predict and generate human-like text. The process involves two key stages:\nPre-training: This expensive and computationally intensive stage involves feeding the model enormous amounts of text data (think terabytes!) to build its knowledge base. Fine-tuning: This stage focuses on shaping the model’s behavior to perform specific tasks, like answering questions or generating different creative text formats. This is achieved by training it on curated datasets of question-answer pairs or other relevant examples. Understanding the “Black Box” While we understand the architecture of LLMs, the exact inner workings of their billions of parameters remain largely mysterious. We know they excel at predicting the next word in a sequence, which surprisingly translates to a broad understanding of the world and various topics. However, their knowledge can be “one-dimensional” and prone to biases present in the training data.\nEvolution of LLMs: Beyond Text Generation Modern LLMs are evolving beyond simple text generation to become more versatile and powerful:\nTool Use: LLMs can now interact with external tools like calculators, code interpreters, and even image generators like DALL-E, allowing them to perform complex tasks that require information retrieval, computation, and creative generation. Multimodality: Recent advancements enable LLMs to process and generate not just text but also images and audio. This opens doors to exciting applications like image captioning, voice assistants, and even generating music. 
Future Directions: Towards More Sophisticated Thinking The future of LLMs is full of possibilities:\nSystem 2 Thinking: Current LLMs operate in a “System 1” mode, relying on quick, instinctive responses. Researchers are exploring ways to enable “System 2” thinking, allowing for more deliberate, rational, and complex problem-solving. Self-improvement: Inspired by AlphaGo’s success, the field is investigating how LLMs could self-improve beyond human-provided training data, potentially unlocking even greater capabilities. Customization: Imagine a future with personalized LLMs tailored to specific tasks and domains. Platforms like the GPT App Store are paving the way for such customization options, allowing users to personalize their AI assistants. The LLM Operating System: A New Computing Paradigm Thinking of LLMs as the kernel of a new operating system is helpful. This “LLM OS” could coordinate various tools and resources, enabling users to interact with computers using natural language, making technology more accessible and user-friendly.\nSecurity Concerns: A Growing Challenge With great power comes great responsibility. Just as traditional operating systems face security threats, so do LLMs:\nJailbreaks: Malicious actors can exploit vulnerabilities to bypass safety measures and make LLMs generate harmful content. Prompt Injection: Attackers can inject hidden instructions into prompts, manipulating the model’s output for malicious purposes. Data Poisoning: By poisoning training data, attackers can create “backdoors” or trigger phrases that can later be used to manipulate the model’s behavior. These are just a few examples of the cat-and-mouse game between attackers and defenders in the LLM security landscape. 
As the technology matures, ensuring the safety and security of these powerful tools will be crucial.\nFor more info, check this : Universal and Transferable Attacks on Aligned Language Models (llm-attacks.org)\nThe Road Ahead LLMs represent a significant leap forward in AI and have the potential to revolutionize how we interact with computers and information. However, addressing the ethical and security concerns is essential to ensure responsible development and a positive impact on society. The future of LLMs is bright, but it’s important to proceed with caution and careful consideration.\nYoutube video :\n","permalink":"https://learncodecamp.net/intro-to-large-language-model/","summary":"\u003cp\u003e\u003cstrong\u003eThe Busy Person’s Guide to Large Language Models: From Inner Workings to Future Possibilities (and Security Concerns)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis post explores the fascinating world of large language models (LLMs) like ChatGPT and llama2, diving into their inner workings, potential future developments, and even the security challenges they present. It’s a summary of a talk by Andrej Karpathy, offering a comprehensive overview for anyone curious about this rapidly evolving technology.\u003c/p\u003e\n\u003ch3 id=\"what-are-llms-and-how-do-they-work\"\u003eWhat are LLMs and How Do They Work?\u003c/h3\u003e\n\u003cp\u003eImagine a massive file containing compressed knowledge from the internet – that’s essentially what an LLM is. It’s a complex neural network trained on vast amounts of text data, enabling it to predict and generate human-like text. The process involves two key stages:\u003c/p\u003e","title":"Intro to Large Language Models"},{"content":"Introduction Handling asynchronous operations efficiently is crucial in modern web development, especially when dealing with APIs, data fetching, and timed operations. 
TypeScript, a superset of JavaScript, enhances JavaScript’s capabilities, including its handling of async operations. In this article, we explore various asynchronous patterns in TypeScript, including a custom sleep function, fetching data from APIs, and handling multiple async operations simultaneously.\n1. Implementing a Sleep Function JavaScript and TypeScript do not have a built-in sleep function. However, you can simulate this behavior using Promises combined with the setTimeout function. Here’s how you can implement a simple sleep function in TypeScript:\nfunction sleep(ms: number): Promise\u0026lt;void\u0026gt; { return new Promise(resolve =\u0026gt; setTimeout(resolve, ms)); } async function demoSleep() { console.log(\u0026#39;Wait for 3 seconds...\u0026#39;); await sleep(3000); // Sleep for 3 seconds console.log(\u0026#39;Done waiting!\u0026#39;); } demoSleep(); In this example:\nThe sleep function creates a new Promise that resolves after a specified number of milliseconds. The demoSleep async function demonstrates how to use the sleep function by pausing execution for three seconds before continuing. 2. Fetching Data from an API Data fetching is a common task in web applications. Here’s an example of how to fetch data from a REST API using async/await syntax in TypeScript:\nasync function fetchData(url: string): Promise\u0026lt;any\u0026gt; { try { const response = await fetch(url); if (!response.ok) { throw new Error(`HTTP error! Status: ${response.status}`); } const data = await response.json(); return data; } catch (error) { console.error(\u0026#34;Failed to fetch data:\u0026#34;, error); return null; } } fetchData(\u0026#39;https://api.example.com/data\u0026#39;) .then(data =\u0026gt; console.log(data)) .catch(error =\u0026gt; console.error(error)); 3. 
Handling Multiple Asynchronous Operations When you need to handle multiple asynchronous operations and wait for all to complete before proceeding, Promise.all() comes in handy:\nasync function fetchMultipleData(urls: string[]): Promise\u0026lt;any[]\u0026gt; { try { const promises = urls.map(url =\u0026gt; fetch(url).then(r =\u0026gt; r.json())); const results = await Promise.all(promises); return results; } catch (error) { console.error(\u0026#34;An error occurred:\u0026#34;, error); return []; } } const urls = [ \u0026#39;https://api.example.com/data1\u0026#39;, \u0026#39;https://api.example.com/data2\u0026#39;, \u0026#39;https://api.example.com/data3\u0026#39; ]; fetchMultipleData(urls) .then(results =\u0026gt; console.log(results)) .catch(error =\u0026gt; console.error(error)); 4. Error Handling in Asynchronous Functions Proper error handling is vital in asynchronous operations to ensure your application remains robust and user-friendly. TypeScript allows structured error handling using try/catch blocks within async functions:\nasync function secureFetchData(url: string): Promise\u0026lt;any\u0026gt; { try { const response = await fetch(url); const data = await response.json(); return data; } catch (error) { console.error(\u0026#34;Error fetching data:\u0026#34;, error); throw error; // Rethrow or handle as needed } } Using try/catch ensures that your application can gracefully handle errors, such as network issues or invalid responses.\nConclusion Asynchronous programming is a powerful part of TypeScript, enabling developers to handle tasks such as API requests, timers, and simultaneously running processes efficiently. By mastering these patterns, you can greatly improve the responsiveness and reliability of your applications. 
Whether you are implementing a simple delay with a sleep function, fetching data from a remote server, or handling multiple async operations, TypeScript provides the tools necessary to do this with ease and elegance.\n","permalink":"https://learncodecamp.net/mastering-asynchronous-operations-in-typescript/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eHandling asynchronous operations efficiently is crucial in modern web development, especially when dealing with APIs, data fetching, and timed operations. TypeScript, a superset of JavaScript, enhances JavaScript’s capabilities, including its handling of async operations. In this article, we explore various asynchronous patterns in TypeScript, including a custom sleep function, fetching data from APIs, and handling multiple async operations simultaneously.\u003c/p\u003e\n\u003ch3 id=\"1-implementing-a-sleep-function\"\u003e1. Implementing a Sleep Function\u003c/h3\u003e\n\u003cp\u003eJavaScript and TypeScript do not have a built-in sleep function. However, you can simulate this behavior using Promises combined with the \u003ccode\u003esetTimeout\u003c/code\u003e function. Here’s how you can implement a simple sleep function in TypeScript:\u003c/p\u003e","title":"Mastering Asynchronous Operations in TypeScript"},{"content":"Introduction In today’s data-driven landscape, businesses require robust tools to efficiently process, transform, and analyze data for deriving meaningful insights. Google Dataflow is a solution, offering a powerful, fully managed service on the Google Cloud Platform (GCP) that simplifies the complexities of building data pipelines.\nKey Features of Google Dataflow Google Dataflow boasts several key features that make it indispensable for modern data processing needs:\nUnified Model for Batch and Stream Processing: Dataflow leverages the Apache Beam SDK, providing a unified approach to code both batch and streaming data pipelines. 
This eliminates the need for maintaining separate systems and skillsets for different processing types. Serverless and Auto-scaling: As a fully managed service, Dataflow handles infrastructure management seamlessly, automatically scaling resources based on workload to ensure cost-efficiency. Focus on Logic, not Infrastructure: Dataflow allows users to concentrate on the core logic of data transformation while handling distributed processing, fault tolerance, and resource provisioning intricacies. Rich Integration with GCP: Deep integration with other GCP services such as Cloud Pub/Sub, BigQuery, and Cloud Storage enables users to develop end-to-end data engineering solutions within the Google Cloud ecosystem. Real-Time Analytics with Pub/Sub to BigQuery Pipelines One of the most compelling use cases of Google Dataflow is its capability to build real-time data ingestion and analysis pipelines using Cloud Pub/Sub and BigQuery. Below, are outlined the steps involved in constructing such pipelines.\nData Ingestion with Cloud Pub/Sub: Cloud Pub/Sub serves as a highly scalable messaging service allowing seamless decoupling of data producers and consumers. Applications publish messages to Pub/Sub topics, and interested subscribers consume them asynchronously. Real-Time Processing with Dataflow A Dataflow pipeline subscribes to a Pub/Sub topic, processing incoming messages in real-time. Key processing steps include data cleaning and validation, transformation and enrichment, as well as windowing and aggregation for calculating metrics over specified time windows. Loading Data into BigQuery BigQuery, a serverless data warehouse, facilitates efficient storage and querying of processed data. Dataflow seamlessly writes processed data into BigQuery tables, supporting both streaming and batch modes. Analytics and Visualization BigQuery’s SQL-like interface empowers users to perform complex queries for in-depth analysis. 
Tools like Google Data Studio or Looker can be connected to BigQuery to create interactive dashboards and reports for visualization. Building a Pub/Sub to BigQuery Pipeline Google Cloud provides pre-built Dataflow templates that streamline the process of creating such pipelines. Follow these steps to create a pipeline:\nAccess the Dataflow console in GCP. Select the “Pub/Sub Subscription to BigQuery” template. Configure parameters including Pub/Sub input subscription, BigQuery output table, and temporary storage location. Launch the job to initiate the pipeline execution. Writing custom code with Apache Beam SDK for batch processing # Import necessary libraries and modules import apache_beam as beam import os from apache_beam.options.pipeline_options import PipelineOptions # Define pipeline options pipeline_options = { \u0026#39;project\u0026#39;: \u0026#39;dataflow-course-319517\u0026#39; , \u0026#39;runner\u0026#39;: \u0026#39;DataflowRunner\u0026#39;, \u0026#39;region\u0026#39;: \u0026#39;southamerica-east1\u0026#39;, \u0026#39;staging_location\u0026#39;: \u0026#39;gs://dataflow-course/temp\u0026#39;, \u0026#39;temp_location\u0026#39;: \u0026#39;gs://dataflow-course/temp\u0026#39;, \u0026#39;template_location\u0026#39;: \u0026#39;gs://dataflow-course/template/batch_job_df_bq_flights\u0026#39; , \u0026#39;save_main_session\u0026#39;: True } # Create pipeline with defined options pipeline_options = PipelineOptions.from_dictionary(pipeline_options) p1 = beam.Pipeline(options=pipeline_options) # Set service account credentials serviceAccount = r\u0026#34;C:\\Users\\cassi\\Google Drive\\GCP\\Dataflow Course\\Meu_Curso_EN\\dataflow-course-319517-4f98a2ce48a7.json\u0026#34; os.environ[\u0026#34;GOOGLE_APPLICATION_CREDENTIALS\u0026#34;]= serviceAccount # Define DoFn classes for data processing class split_lines(beam.DoFn): \u0026#34;\u0026#34;\u0026#34;Splits each line of input record.\u0026#34;\u0026#34;\u0026#34; def process(self, record): return 
[record.split(\u0026#39;,\u0026#39;)] class Filter(beam.DoFn): \u0026#34;\u0026#34;\u0026#34;Filters records based on a condition.\u0026#34;\u0026#34;\u0026#34; def process(self, record): if int(record[8]) \u0026gt; 0: return [record] # Define functions for data transformation def dict_level1(record): \u0026#34;\u0026#34;\u0026#34;Creates level-1 dictionary.\u0026#34;\u0026#34;\u0026#34; dict_ = {} dict_[\u0026#39;airport\u0026#39;] = record[0] dict_[\u0026#39;list\u0026#39;] = record[1] return(dict_) def unnest_dict(record): \u0026#34;\u0026#34;\u0026#34;Unnests nested dictionaries.\u0026#34;\u0026#34;\u0026#34; def expand(key, value): if isinstance(value, dict): return [(key + \u0026#39;_\u0026#39; + k, v) for k, v in unnest_dict(value).items()] else: return [(key, value)] items = [item for k, v in record.items() for item in expand(k, v)] return dict(items) def dict_level0(record): \u0026#34;\u0026#34;\u0026#34;Creates level-0 dictionary.\u0026#34;\u0026#34;\u0026#34; dict_ = {} dict_[\u0026#39;airport\u0026#39;] = record[\u0026#39;airport\u0026#39;] dict_[\u0026#39;list_Delayed_num\u0026#39;] = record[\u0026#39;list_Delayed_num\u0026#39;][0] dict_[\u0026#39;list_Delayed_time\u0026#39;] = record[\u0026#39;list_Delayed_time\u0026#39;][0] return(dict_) # Define table schema for BigQuery table_schema = \u0026#39;airport:STRING, list_Delayed_num:INTEGER, list_Delayed_time:INTEGER\u0026#39; table = \u0026#39;dataflow-course-319517:flights_dataflow.flights_aggr\u0026#39; # Define pipeline steps for Delayed_time and Delayed_num Delayed_time = ( p1 | \u0026#34;Import Data time\u0026#34; \u0026gt;\u0026gt; beam.io.ReadFromText(r\u0026#34;gs://dataflow-course/input/flights_sample.csv\u0026#34;, skip_header_lines=1) | \u0026#34;Split by comma time\u0026#34; \u0026gt;\u0026gt; beam.ParDo(split_lines()) | \u0026#34;Filter Delays time\u0026#34; \u0026gt;\u0026gt; beam.ParDo(Filter()) | \u0026#34;Create a key-value time\u0026#34; \u0026gt;\u0026gt; beam.Map(lambda record: 
(record[4], int(record[8]))) | \u0026#34;Sum by key time\u0026#34; \u0026gt;\u0026gt; beam.CombinePerKey(sum) ) Delayed_num = ( p1 | \u0026#34;Import Data\u0026#34; \u0026gt;\u0026gt; beam.io.ReadFromText(r\u0026#34;gs://dataflow-course/input/flights_sample.csv\u0026#34;, skip_header_lines=1) | \u0026#34;Split by comma\u0026#34; \u0026gt;\u0026gt; beam.ParDo(split_lines()) | \u0026#34;Filter Delays\u0026#34; \u0026gt;\u0026gt; beam.ParDo(Filter()) | \u0026#34;Create a key-value\u0026#34; \u0026gt;\u0026gt; beam.Map(lambda record: (record[4], int(record[8]))) | \u0026#34;Count by key\u0026#34; \u0026gt;\u0026gt; beam.combiners.Count.PerKey() ) # Define pipeline steps for creating the Delay_table Delay_table = ( {\u0026#39;Delayed_num\u0026#39;: Delayed_num, \u0026#39;Delayed_time\u0026#39;: Delayed_time} | \u0026#34;Group By\u0026#34; \u0026gt;\u0026gt; beam.CoGroupByKey() | \u0026#34;Unnest 1\u0026#34; \u0026gt;\u0026gt; beam.Map(lambda record: dict_level1(record)) | \u0026#34;Unnest 2\u0026#34; \u0026gt;\u0026gt; beam.Map(lambda record: unnest_dict(record)) | \u0026#34;Unnest 3\u0026#34; \u0026gt;\u0026gt; beam.Map(lambda record: dict_level0(record)) | \u0026#34;Write to BQ\u0026#34; \u0026gt;\u0026gt; beam.io.WriteToBigQuery( table, schema=table_schema, write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND, create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED, custom_gcs_temp_location=\u0026#39;gs://dataflow-course/temp\u0026#39; ) ) # Execute the pipeline p1.run() Code references: https://www.udemy.com/course/data-engineering-with-google-dataflow-and-apache-beam/\nConclusion Google Dataflow emerges as a versatile platform for stream and batch data processing, offering seamless integration with other GCP services and enabling users to focus on data logic rather than infrastructure management.\n","permalink":"https://learncodecamp.net/google-dataflow-pub-sub-biq-query/","summary":"\u003ch3 
id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn today’s data-driven landscape, businesses require robust tools to efficiently process, transform, and analyze data for deriving meaningful insights. Google Dataflow is a solution, offering a powerful, fully managed service on the Google Cloud Platform (GCP) that simplifies the complexities of building data pipelines.\u003c/p\u003e\n\u003ch3 id=\"key-features-of-google-dataflow\"\u003eKey Features of Google Dataflow\u003c/h3\u003e\n\u003cp\u003eGoogle Dataflow boasts several key features that make it indispensable for modern data processing needs:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eUnified Model for Batch and Stream Processing\u003c/strong\u003e: Dataflow leverages the \u003cstrong\u003eApache Beam\u003c/strong\u003e SDK, providing a unified approach to code both batch and streaming data pipelines. This eliminates the need for maintaining separate systems and skillsets for different processing types.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eServerless and Auto-scaling\u003c/strong\u003e: As a fully managed service, Dataflow handles infrastructure management seamlessly, automatically scaling resources based on workload to ensure cost-efficiency.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eFocus on Logic, not Infrastructure\u003c/strong\u003e: Dataflow allows users to concentrate on the core logic of data transformation while handling distributed processing, fault tolerance, and resource provisioning intricacies.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRich Integration with GCP\u003c/strong\u003e: Deep integration with other GCP services such as Cloud Pub/Sub, BigQuery, and Cloud Storage enables users to develop end-to-end data engineering solutions within the Google Cloud ecosystem.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"real-time-analytics-with-pubsub-to-bigquery-pipelines\"\u003eReal-Time Analytics with Pub/Sub to BigQuery 
Pipelines\u003c/h3\u003e\n\u003cp\u003eOne of the most compelling use cases of Google Dataflow is its capability to build real-time data ingestion and analysis pipelines using Cloud Pub/Sub and BigQuery. Below, are outlined the steps involved in constructing such pipelines.\u003c/p\u003e","title":"Streamlining Data Processing with Google Dataflow"},{"content":"Introduction The Heart of Your TypeScript Project: A tsconfig.json file acts as the central configuration hub for your TypeScript project. It tells the TypeScript compiler (TSC) how to transform your TypeScript code into usable JavaScript. Root Signal: The presence of a tsconfig.json file signifies that the directory it lives in is the root of your TypeScript project. Why use a tsconfig.json file? Customization: It allows you to tailor the TypeScript compiler’s behavior to match your project’s specific needs and preferences. Consistency: It ensures that all developers working on the project use the same compiler settings, leading to a more consistent codebase. Efficiency: You can avoid long and repetitive command-line flags when using the tsc (TypeScript compiler) command. The tsconfig.json stores your settings. Key Sections within tsconfig.json Let’s focus on the most important section for understanding how to configure your project:\ncompilerOptions: This is where you’ll find the bulk of the settings that control the TypeScript compiler’s behavior. Some of the most important include: target: Specifies the JavaScript language version you want your TypeScript code compiled into (e.g., “ES5”, “ES2020”). module: Determines the module system your compiled code will use (e.g., “CommonJS”, “AMD”, “ES2015”). outDir: Specifies the directory where compiled JavaScript files will be placed. rootDir: Specifies the root directory for input source files. strict: Enables stricter type checking for a more robust codebase. sourceMap: Generates .map files to help with debugging the compiled JavaScript code. 
include: An array of file paths or glob patterns to specify the TypeScript files to include in the compilation process. exclude: An array of file paths or glob patterns to specify files that should be excluded from the compilation process. Sample tsconfig.json file { \u0026#34;compilerOptions\u0026#34;: { \u0026#34;target\u0026#34;: \u0026#34;es2020\u0026#34;, \u0026#34;lib\u0026#34;: [ \u0026#34;es2020\u0026#34; ], \u0026#34;module\u0026#34;: \u0026#34;commonjs\u0026#34;, \u0026#34;strict\u0026#34;: true, \u0026#34;strictPropertyInitialization\u0026#34;: false, \u0026#34;esModuleInterop\u0026#34;: true, \u0026#34;skipLibCheck\u0026#34;: true, \u0026#34;forceConsistentCasingInFileNames\u0026#34;: true, \u0026#34;sourceMap\u0026#34;: true, \u0026#34;outDir\u0026#34;: \u0026#34;./dist\u0026#34;, \u0026#34;incremental\u0026#34;: true, \u0026#34;noErrorTruncation\u0026#34;: true, \u0026#34;listEmittedFiles\u0026#34;: true, }, \u0026#34;type\u0026#34;: \u0026#34;module\u0026#34;, \u0026#34;include\u0026#34;: [ \u0026#34;src/**/*.ts\u0026#34;, \u0026#34;*.ts\u0026#34;, \u0026#34;src/**/*.tsx\u0026#34;, \u0026#34;*.tsx\u0026#34; ] } Some Options strictPropertyInitialization strictPropertyInitialization is a strict type checking option in TypeScript. When set to true in your tsconfig.json file, TypeScript will ensure that each instance property of a class gets initialized in the constructor body, or by a property initializer.\nFor example, consider the following TypeScript class:\nclass MyClass { myProp: number; } If strictPropertyInitialization is set to true, TypeScript will throw an error because myProp is not initialized. 
To fix the error, you need to initialize myProp in the constructor or directly at declaration:\nclass MyClass { myProp: number = 0; // Initialized at declaration constructor() { this.myProp = 0; // Or initialized in the constructor } } lib The lib option in the TypeScript configuration file (tsconfig.json) is used to specify a list of library files to be included in the compilation.\n\u0026quot;lib\u0026quot;: [\u0026quot;es2020\u0026quot;] means that TypeScript code will be type-checked against the library declaration files that correspond to the ES2020 version of JavaScript. This includes all the built-in JavaScript objects and functions that are part of the ES2020 specification, such as BigInt, Promise.allSettled, globalThis, etc.\nThis option allows you to write TypeScript code that uses these ES2020 features, and it will be correctly type-checked and compiled to your target JavaScript version (specified by the target option in tsconfig.json). Note that lib only affects type checking; it does not add polyfills for the runtime. If you use a feature that is not included in the specified library, TypeScript will give a compile error.\nesModuleInterop The esModuleInterop option in the TypeScript configuration file (tsconfig.json) is used to enable a more compatible CommonJS/AMD module emit.\nIn JavaScript, there’s a difference between import foo from 'foo' and import * as foo from 'foo'. The former is known as a default import and the latter is known as a namespace import.\nHowever, when importing CommonJS modules (which is the type of module used in Node.js), this distinction does not exist. 
In CommonJS, you can use const foo = require('foo') for both cases.\nWhen esModuleInterop is set to true, TypeScript will allow you to use default import syntax for CommonJS modules, for smoother interoperability between module systems.\nskipLibCheck The skipLibCheck option in the TypeScript configuration file (tsconfig.json) is used to skip type checking of declaration files (*.d.ts).\nWhen this option is set to true, TypeScript will skip checking the correctness of the types declared in these files. This can significantly speed up the TypeScript compilation process, especially in larger projects with many dependencies.\nforceConsistentCasingInFileNames The forceConsistentCasingInFileNames option in the TypeScript configuration file (tsconfig.json) ensures that the casing of your import statements matches the casing of the files in your file system.\nimport { MyComponent } from \u0026#39;./myComponent\u0026#39;; If the actual file name is MyComponent.ts (note the uppercase ‘M’), and forceConsistentCasingInFileNames is set to true, TypeScript will report an error because the casing in the import statement does not match the casing of the actual file name.\nincremental The incremental option in the TypeScript configuration file (tsconfig.json) is used to enable incremental compilation.\nWhen this option is set to true, TypeScript will save information about the previous compilation to a .tsbuildinfo file. This file is then used in subsequent compilations to speed up the TypeScript build process by only checking and emitting files that have changed (or may have been affected by changes) since the last compilation.\nnoErrorTruncation The noErrorTruncation option in the TypeScript configuration file (tsconfig.json) is used to control the truncation of error messages.\nBy default, TypeScript might truncate certain error messages if they are too long. 
This is done to prevent the output from being overwhelmed with information that might not be useful.\nWhen noErrorTruncation is set to true, TypeScript will not truncate error messages, and will instead output the full details of every error. This can be useful if you’re debugging a complex issue and need to see the entire error message.\nFor all the options, check this : TypeScript: TSConfig Reference – Docs on every TSConfig option (typescriptlang.org)\n","permalink":"https://learncodecamp.net/tsconfig-json-file-in-typescript-projects/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eThe Heart of Your TypeScript Project:\u003c/strong\u003e A \u003ccode\u003etsconfig.json\u003c/code\u003e file acts as the central configuration hub for your TypeScript project. It tells the TypeScript compiler (TSC) how to transform your TypeScript code into usable JavaScript.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRoot Signal:\u003c/strong\u003e The presence of a \u003ccode\u003etsconfig.json\u003c/code\u003e file signifies that the directory it lives in is the root of your TypeScript project.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"why-use-a-tsconfigjson-file\"\u003eWhy use a tsconfig.json file?\u003c/h3\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eCustomization:\u003c/strong\u003e It allows you to tailor the TypeScript compiler’s behavior to match your project’s specific needs and preferences.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eConsistency:\u003c/strong\u003e It ensures that all developers working on the project use the same compiler settings, leading to a more consistent codebase.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eEfficiency:\u003c/strong\u003e You can avoid long and repetitive command-line flags when using the \u003ccode\u003etsc\u003c/code\u003e (TypeScript compiler) command. 
The \u003ccode\u003etsconfig.json\u003c/code\u003e stores your settings.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch3 id=\"key-sections-within-tsconfigjson\"\u003eKey Sections within tsconfig.json\u003c/h3\u003e\n\u003cp\u003eLet’s focus on the most important section for understanding how to configure your project:\u003c/p\u003e","title":"Learning about tsconfig.json file in TypeScript Projects"},{"content":"Introduction What are Promises?\nManaging Asynchronous Operations: In JavaScript, many operations (like fetching data from a server, reading a file, or waiting for a timer) take time. Promises provide a structured way to handle the results of these asynchronous operations without getting tangled up in messy callbacks. A Proxy for Future Values: A Promise is an object that represents the eventual result of an asynchronous operation. It’s like a placeholder. Initially, the promise is in a “pending” state, but eventually, it will either: Fulfilled: The operation was successful, and a value is available. Rejected: The operation failed, and you get an error explaining why. Key Methods .then() Used to handle the successful resolution of a Promise. It takes a callback function that receives the resolved value. .catch() Used to handle errors. It takes a callback function that receives the error object. .finally() Executes a callback function regardless of whether a promise is fulfilled or rejected. Often used for cleanup tasks. 
import console from \u0026#39;console\u0026#39;; import { setTimeout } from \u0026#39;timers\u0026#39;; function delay(ms: number, shouldResolve: boolean): Promise\u0026lt;string\u0026gt; { return new Promise((resolve, reject) =\u0026gt; { setTimeout(() =\u0026gt; { if (shouldResolve) { resolve(\u0026#39;Promise resolved\u0026#39;); } else { reject(\u0026#39;Promise rejected\u0026#39;); } }, ms); }); } delay(1000, true).then(function(message) { console.log(message); // This will log \u0026#34;Promise resolved\u0026#34; }).catch(function(error) { console.error(error); // This won\u0026#39;t be called in this case }); When you create a new Promise in TypeScript, you pass an executor function to the Promise constructor. This executor function takes two arguments: a resolve function and a reject function.\nHere’s what each argument is for:\nresolve: This is a function that you call when the asynchronous operation completes successfully. You call this function with the result of the operation. reject: This is a function that you call when the asynchronous operation fails. You call this function with the reason for the failure, which is typically an Error object. (resolve, reject) =\u0026gt; { ... }\nThis is the executor function. It has two parameters: resolve: A function you call when the asynchronous operation has successfully completed. You must pass it the value that the Promise should resolve to. reject: A function you call if the asynchronous operation fails. You pass it an Error object that explains the reason for the failure. 
Code for JavaScript Promises class MyPromise { constructor(executor) { this.state = \u0026#39;pending\u0026#39;; this.value = undefined; this.onFulfilledCallbacks = []; this.onRejectedCallbacks = []; const resolve = (value) =\u0026gt; { if (this.state === \u0026#39;pending\u0026#39;) { if (value instanceof MyPromise) { value.then(resolve, reject); } else { this.state = \u0026#39;fulfilled\u0026#39;; this.value = value; this.onFulfilledCallbacks.forEach(callback =\u0026gt; callback(value)); } } }; const reject = (reason) =\u0026gt; { if (this.state === \u0026#39;pending\u0026#39;) { this.state = \u0026#39;rejected\u0026#39;; this.value = reason; this.onRejectedCallbacks.forEach(callback =\u0026gt; callback(reason)); } }; try { executor(resolve, reject); } catch (error) { reject(error); } } then(onFulfilled, onRejected) { return new MyPromise((resolve, reject) =\u0026gt; { const handleFulfilled = (value) =\u0026gt; { if (typeof onFulfilled !== \u0026#39;function\u0026#39;) { resolve(value); return; } try { const result = onFulfilled(value); if (result instanceof MyPromise) { result.then(resolve, reject); } else { resolve(result); } } catch (error) { reject(error); } }; const handleRejected = (reason) =\u0026gt; { if (typeof onRejected !== \u0026#39;function\u0026#39;) { reject(reason); return; } try { const result = onRejected(reason); if (result instanceof MyPromise) { result.then(resolve, reject); } else { resolve(result); } } catch (error) { reject(error); } }; if (this.state === \u0026#39;fulfilled\u0026#39;) { handleFulfilled(this.value); } else if (this.state === \u0026#39;rejected\u0026#39;) { handleRejected(this.value); } else { this.onFulfilledCallbacks.push(handleFulfilled); this.onRejectedCallbacks.push(handleRejected); } }); } } Design Choices The MyPromise class has a constructor that takes an executor function. This function is immediately executed and is passed two functions: resolve and reject. These functions allow the executor to indicate the success or failure of the promise. The state property is used to track the status of the promise. It can be ‘pending’, ‘fulfilled’, or ‘rejected’. 
The value property is used to store the result of the promise. The onFulfilledCallbacks and onRejectedCallbacks arrays are used to store callback functions that will be executed when the promise is fulfilled or rejected. New Promise from then function: The then method returns a new promise. This is a key feature of promises that enables chaining. The returned promise resolves or rejects based on the outcome of the onFulfilled or onRejected callbacks. If these callbacks return a value, the returned promise is resolved with that value. If they throw an error, the returned promise is rejected with that error. Need for a Callback Array: The onFulfilledCallbacks and onRejectedCallbacks arrays are needed because the then method can be called multiple times on the same promise. Each time then is called on a pending promise, a new callback is added to the respective array. When the promise is settled (either fulfilled or rejected), all the registered callbacks are called. This is known as “observation” of a promise. If then is called after the promise has already settled, the arrays are not used: the callback is invoked immediately with the promise’s value or reason. const myPromise = new MyPromise((resolve, reject) =\u0026gt; { setTimeout(() =\u0026gt; { resolve(\u0026#39;success\u0026#39;); }, 1000); }); myPromise.then((value) =\u0026gt; { console.log(value); }); ","permalink":"https://learncodecamp.net/javascript-promises/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eWhat are Promises?\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eManaging Asynchronous Operations:\u003c/strong\u003e In JavaScript, many operations (like fetching data from a server, reading a file, or waiting for a timer) take time. 
Promises provide a structured way to handle the results of these asynchronous operations without getting tangled up in messy callbacks.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eA Proxy for Future Values:\u003c/strong\u003e A Promise is an object that represents the eventual result of an asynchronous operation. It’s like a placeholder. Initially, the promise is in a “pending” state, but eventually, it will either:\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eFulfilled:\u003c/strong\u003e The operation was successful, and a value is available.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRejected:\u003c/strong\u003e The operation failed, and you get an error explaining why.\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"key-methods\"\u003eKey Methods\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003e\u003ccode\u003e.then()\u003c/code\u003e\u003c/strong\u003e Used to handle the successful resolution of a Promise. It takes a callback function that receives the resolved value.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003e\u003ccode\u003e.catch()\u003c/code\u003e\u003c/strong\u003e Used to handle errors. It takes a callback function that receives the error object.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003e\u003ccode\u003e.finally()\u003c/code\u003e\u003c/strong\u003e Executes a callback function regardless of whether a promise is fulfilled or rejected. 
Often used for cleanup tasks.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#f92672\"\u003eimport\u003c/span\u003e console \u003cspan style=\"color:#f92672\"\u003efrom\u003c/span\u003e \u003cspan style=\"color:#e6db74\"\u003e\u0026#39;console\u0026#39;\u003c/span\u003e;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#f92672\"\u003eimport\u003c/span\u003e { setTimeout } \u003cspan style=\"color:#f92672\"\u003efrom\u003c/span\u003e \u003cspan style=\"color:#e6db74\"\u003e\u0026#39;timers\u0026#39;\u003c/span\u003e;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003efunction delay(ms: number, shouldResolve: boolean): Promise\u0026lt;string\u003cspan style=\"color:#f92672\"\u003e\u0026gt;\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  \u003cspan style=\"color:#66d9ef\"\u003ereturn\u003c/span\u003e new Promise((resolve, reject) \u003cspan style=\"color:#f92672\"\u003e=\u0026gt;\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    setTimeout(() \u003cspan style=\"color:#f92672\"\u003e=\u0026gt;\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e      \u003cspan style=\"color:#66d9ef\"\u003eif\u003c/span\u003e (shouldResolve) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e        resolve(\u003cspan style=\"color:#e6db74\"\u003e\u0026#39;Promise resolved\u0026#39;\u003c/span\u003e);\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e      } \u003cspan style=\"color:#66d9ef\"\u003eelse\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        reject(\u003cspan style=\"color:#e6db74\"\u003e\u0026#39;Promise rejected\u0026#39;\u003c/span\u003e);\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e      }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    , ms);\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  });\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003edelay(\u003cspan style=\"color:#ae81ff\"\u003e1000\u003c/span\u003e, true)\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003ethen(function(message) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  console\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003elog(message); \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e This will log \u003cspan style=\"color:#e6db74\"\u003e\u0026#34;Promise resolved\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e})\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003ecatch(function(error) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  console\u003cspan 
style=\"color:#f92672\"\u003e.\u003c/span\u003eerror(error); \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e This won\u003cspan style=\"color:#e6db74\"\u003e\u0026#39;t be called in this case\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e});\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eWhen you create a new Promise in TypeScript, you pass an executor function to the Promise constructor. This executor function takes two arguments: a \u003ccode\u003eresolve\u003c/code\u003e function and a \u003ccode\u003ereject\u003c/code\u003e function.\u003c/p\u003e","title":"Understanding JavaScript Promises"},{"content":"Introduction Node.js, the ubiquitous server-side JavaScript platform, boasts a surprisingly dramatic history. This blog post delves into the history of Node.js, Ryan Dahl (Node.js creator), Isaac Schlueter (npm creator), and Myles Borins (early adopter and contributor).\nnode.js From Snowboard Websites to Async IO Ryan Dahl’s journey to Node.js commenced unexpectedly. After leaving a math PhD program, he found himself coding snowboard marketing websites. Driven by a desire to tackle more abstract problems, he delved into web stack technologies, ultimately culminating in the birth of Node.js.\nThe Birth of Node.js Inspired by Chrome and V8’s release, Dahl envisioned a server-side platform leveraging JavaScript’s potential for non-blocking IO. He crafted the first version in Cologne, Germany, unveiling it at JSConf EU in 2009. The demonstration, featuring a fully functional IRC server written in JavaScript, astounded audiences and propelled the project’s popularity.\nEarly Growth and Challenges Isaac Schlueter, weary of switching between PHP and JavaScript, stumbled upon Node.js and recognized its promise. He pioneered npm, the Node Package Manager, streamlining library sharing and collaboration. 
However, the project’s infancy was marred by rapid changes and breaking APIs, leading to frustration among some users.\nThe Joyent Era and Windows Support Dahl’s alliance with Joyent, a hosting provider, provided vital funding for Node.js. Nonetheless, this move also planted seeds of future discord. Simultaneously, Bert Belder undertook the Herculean task of porting Node.js to Windows, a pivotal endeavor in broadening its accessibility.\nRyan’s Departure and Isaac’s Ascendancy After several years, feeling drained and seeking fresh challenges, Dahl bid farewell to Joyent, entrusting Node.js to Isaac Schlueter. While the transition appeared smooth initially, it bred tension within the community as Joyent tightened its grip over the project.\nThe Node Forward Movement and io.js Fork Apprehensions regarding Joyent’s stewardship and sluggish release cycles prompted the emergence of Node Forward, a cohort of core contributors advocating for open governance. When negotiations with Joyent stalled, Fedor Indutny forked the project, spawning io.js, which swiftly gained traction.\nReconciliation and the Node.js Foundation The birth of io.js served as a wake-up call for Joyent. With the appointment of a new CEO, Scott Hammond, the company adopted a more receptive stance towards the community’s demands. The Linux Foundation intervened, facilitating the establishment of the Node.js Foundation, ensuring open governance and fostering a united community.\nIn 2019, a significant development reshaped the landscape of JavaScript foundations: the merger of the Node.js Foundation with the JS Foundation, resulting in the birth of the OpenJS Foundation. This consolidation marked a pivotal moment in fostering collaboration and advancing JavaScript technologies under a unified banner.\nOpenJS Foundation Fast forward to 2022, another notable event unfolded as Joyent, the erstwhile steward of Node.js, transferred ownership of the Node.js trademarks to the OpenJS Foundation. 
This gesture solidified the foundation’s role as the custodian of Node.js and underscored the commitment to open governance and community-driven development.\nFor a complete video, you can watch this : Node.js: The Documentary | An origin story (youtube.com)\n","permalink":"https://learncodecamp.net/history-of-node-js/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eNode.js, the ubiquitous server-side JavaScript platform, boasts a surprisingly dramatic history. This blog post delves into the history of Node.js, Ryan Dahl (Node.js creator), Isaac Schlueter (npm creator), and Myles Borins (early adopter and contributor).\u003c/p\u003e\n\u003cdiv\u003e\n  \u003cfigure\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"/wp-content/uploads/2024/03/image-9-1024x1024.png\" alt=\"\" style=\"width:193px;height:auto\" /\u003e\u003cfigcaption class=\"wp-element-caption\"\u003enode.js\u003c/figcaption\u003e\u003c/figure\u003e\n\u003c/div\u003e\n\u003ch3 id=\"from-snowboard-websites-to-async-io\"\u003eFrom Snowboard Websites to Async IO\u003c/h3\u003e\n\u003cp\u003eRyan Dahl’s journey to Node.js commenced unexpectedly. After leaving a math PhD program, he found himself coding snowboard marketing websites. Driven by a desire to tackle more abstract problems, he delved into web stack technologies, ultimately culminating in the birth of Node.js.\u003c/p\u003e","title":"The Dramatic History of Node.js: From Humble Beginnings to Open Governance"},{"content":"Introduction Cloud Firestore is a powerful, cloud-based NoSQL database that offers a flexible and scalable solution for storing and managing data. This post dives into the core concepts of Firestore, contrasting it with traditional relational databases and highlighting its unique advantages.\nNoSQL vs. 
Relational Databases: A Paradigm Shift If you’re familiar with relational databases like MySQL, you’re used to structured tables with predefined columns and data types. This rigid schema enforces data consistency but can be inflexible when your data model evolves.\nNoSQL databases like Firestore take a different approach. They are schema-less, allowing you to store data in a more flexible and dynamic way. Firestore uses a document-collection model, where data is organized into documents (similar to JSON objects) and collections (groups of documents). This structure offers several benefits:\nEasy iteration: You can add or modify fields in your documents without affecting existing data, making it easier to adapt your database as your app evolves. Handling diverse data: Firestore readily accommodates data with varying structures, perfect for situations where you have similar but not identical data types. While the schema-less nature offers flexibility, it requires careful coding practices to ensure data consistency. You’ll need to implement checks on the client-side to validate the data you retrieve.\nAnother key difference is the absence of SQL in NoSQL databases. Firestore doesn’t support complex joins across multiple tables. Instead, you’ll need to structure your data strategically, often duplicating some information to optimize retrieval. This might seem counterintuitive at first, but it leads to significant performance gains for read-heavy applications.\nAdvantages of Cloud Firestore Firestore shines with its ability to scale horizontally. Firestore distributes data across multiple servers seamlessly. This allows your database to grow effortlessly without impacting performance. 
Additionally, managed cloud environments like Google Cloud Platform can automatically adjust server resources to meet your app’s needs.\nFirestore also offers:\nShallow queries: You can retrieve specific documents within a collection without fetching the entire subcollection, optimizing data transfer and performance. Real-time updates: Firestore provides real-time data synchronization across clients, making it ideal for collaborative applications. Understanding the Document-Collection Model Firestore organizes data into a hierarchical structure of documents and collections:\nDocuments: Documents are the basic data units, consisting of key-value pairs called fields. These fields can hold various data types, including strings, numbers, and even nested objects (maps). Collections: Collections are containers for documents, similar to dictionaries where the values are always documents. There are a few key rules to remember:\nCollections can only contain documents. Documents have a size limit of 1 MB. Documents cannot directly contain other documents, but they can reference subcollections. The root of your Firestore database can only contain collections. This structure allows you to build intuitive data hierarchies. For example, a restaurant review app might have a Restaurants collection, with each document representing a restaurant and containing a reference to a Reviews subcollection. This subcollection would then hold individual review documents.\nOptimizing Data Structure for Performance While Firestore offers flexibility in data organization, it’s crucial to structure your data strategically for optimal performance. Consider the following:\nDuplicating data: In some cases, duplicating data across different documents can improve read performance by reducing the need for multiple database requests. Nesting data: Nesting data within documents can be useful, but be mindful of the document size limit and the potential impact on read performance. 
Conclusion Cloud Firestore provides a powerful and flexible NoSQL database solution for modern applications. Its document-collection model, horizontal scaling, and real-time updates make it a compelling choice for developers looking to build scalable and performant applications. While it requires a shift in thinking from traditional relational databases, understanding its core concepts and best practices can help you leverage its full potential.\nFor more details check the official documentation: Firestore | Firebase (google.com)\n","permalink":"https://learncodecamp.net/firestore-nosql-db/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eCloud Firestore is a powerful, cloud-based NoSQL database that offers a flexible and scalable solution for storing and managing data. This post dives into the core concepts of Firestore, contrasting it with traditional relational databases and highlighting its unique advantages.\u003c/p\u003e\n\u003ch3 id=\"nosql-vs-relational-databases-a-paradigm-shift\"\u003eNoSQL vs. Relational Databases: A Paradigm Shift\u003c/h3\u003e\n\u003cp\u003eIf you’re familiar with relational databases like MySQL, you’re used to structured tables with predefined columns and data types. This rigid schema enforces data consistency but can be inflexible when your data model evolves.\u003c/p\u003e","title":"Firestore: A NoSQL Database for Modern Applications"},{"content":" Introduction What are Node.js Middlewares? Examples Logging Middleware Authentication Middleware Error Handling Middleware Rate Limiting Middleware (using third-party library) Some third-party Node.js middleware examples helmet compression cors express-session passport express-validator Conclusion Introduction Node.js is a powerful runtime environment for executing JavaScript code server-side. One of the key features that make Node.js so versatile and popular is its middleware architecture. 
Middlewares play a crucial role in the request-response lifecycle of Node.js applications, enabling developers to modularize and streamline the handling of HTTP requests.\nWhat are Node.js Middlewares? Middlewares in Node.js are functions that have access to the request object (req), the response object (res), and the next function in the application’s request-response cycle. They can modify the request and response objects, terminate the request-response cycle, or pass control to the next middleware in the stack.\nThe middleware stack is a series of functions that execute sequentially for each incoming request. Each middleware function can perform specific tasks, such as logging, authentication, validation, error handling, and more, before passing control to the next middleware.\nWhy Use Node.js Middlewares? Middlewares provide a flexible and modular approach to handling HTTP requests in Node.js applications. Here are some key reasons why middlewares are essential:\nModularity: Middlewares allow developers to break down complex request handling logic into smaller, reusable components. This promotes code organization and maintainability. Cross-cutting Concerns: Common functionalities such as authentication, logging, input validation, and error handling can be implemented as middleware functions and applied across multiple routes or endpoints. Orderly Execution: Middlewares execute in a specific order defined by the developer, ensuring that each step of request processing is executed consistently for all incoming requests. Error Handling: Middlewares can intercept errors and handle them gracefully without crashing the application. They provide a centralized mechanism for error handling, making it easier to troubleshoot and debug. Writing Node.js Middlewares Writing middleware in Node.js is straightforward. A middleware function takes three arguments: req, res, and next. 
Here’s a basic example of a middleware function that logs the incoming request URL:\nfunction loggerMiddleware(req, res, next) { console.log(`Incoming request: ${req.method} ${req.url}`); next(); } To use this middleware in an Express.js application, you simply add it to the middleware stack using the app.use() method:\nconst express = require(\u0026#39;express\u0026#39;); const app = express(); app.use(loggerMiddleware); Now, every incoming request will trigger the loggerMiddleware, logging the request URL to the console.\nThere are several types of middlewares commonly used in Node.js applications:\nApplication-level Middlewares: These middlewares are bound to the application and are executed for every incoming request. Examples include logging, parsing request bodies, and error handling. Router-level Middlewares: Router-level middlewares are bound to specific routes or groups of routes using Express.js routers. They are useful for handling request-specific tasks such as authentication and validation. Error-handling Middlewares: These middlewares are used to handle errors that occur during request processing. They are defined with four parameters (err, req, res, next) and are placed at the end of the middleware stack. Third-party Middlewares: Third-party middlewares are modules or packages developed by the community to add specific functionalities to Express.js applications, such as authentication, session management, and rate limiting. Best Practices for Using Node.js Middlewares To ensure optimal performance and maintainability of your Node.js applications, consider the following best practices when working with middlewares:\nKeep Middlewares Simple: Write small, focused middleware functions that perform one task well. This promotes reusability and makes the code easier to understand and maintain. 
Use Middleware Libraries: Leverage existing middleware libraries whenever possible to handle common tasks such as authentication, input validation, and error handling. Popular libraries like passport and express-validator provide robust solutions for these functionalities. Order Matters: Pay attention to the order in which middlewares are added to the stack. Middlewares added earlier in the stack will be executed first, followed by those added later. Ensure that the order of execution aligns with the desired request processing flow. Handle Errors Gracefully: Implement error-handling middlewares to catch and handle errors that occur during request processing. Use the next function to pass errors to the next error-handling middleware or the default Express.js error handler. Avoid Blocking Operations: Middlewares should not perform blocking operations such as synchronous I/O or long-running computations, as this can degrade the performance of the application. Use asynchronous operations or offload heavy tasks to worker threads or external services. Test Middlewares Thoroughly: Write unit tests for your middleware functions to ensure they behave as expected under different scenarios. Mock the req, res, and next objects to simulate incoming requests and responses. Examples 1: Logging Middleware Logging middleware is a common middleware used to log incoming requests and responses. Here’s a simple implementation:\nfunction loggerMiddleware(req, res, next) { console.log(`[${new Date().toISOString()}] ${req.method} ${req.url}`); next(); } 2: Authentication Middleware Authentication middleware is used to authenticate incoming requests before allowing access to protected routes. 
Here’s a basic example using JSON Web Tokens (JWT):\nconst jwt = require(\u0026#39;jsonwebtoken\u0026#39;); function authenticateMiddleware(req, res, next) { // Get the token from the request headers or cookies const token = req.headers.authorization?.split(\u0026#39; \u0026#39;)[1] || req.cookies.token; if (!token) { return res.status(401).json({ error: \u0026#39;Unauthorized\u0026#39; }); } try { // Verify the token const decoded = jwt.verify(token, process.env.JWT_SECRET); req.user = decoded.user; next(); } catch (error) { return res.status(401).json({ error: \u0026#39;Invalid token\u0026#39; }); } } You can protect your routes by applying this middleware:\napp.get(\u0026#39;/protected\u0026#39;, authenticateMiddleware, (req, res) =\u0026gt; { res.json({ message: \u0026#39;Authenticated route\u0026#39;, user: req.user }); }); 3: Error Handling Middleware Error handling middleware is used to catch and handle errors that occur during request processing. Here’s a basic example:\nfunction errorHandlerMiddleware(err, req, res, next) { console.error(err.stack); res.status(500).json({ error: \u0026#39;Internal Server Error\u0026#39; }); } 4: Rate Limiting Middleware (using third-party library) Rate limiting middleware is used to limit the number of requests a client can make within a certain time frame. You can use the express-rate-limit middleware for this purpose:\nconst rateLimit = require(\u0026#39;express-rate-limit\u0026#39;); const limiter = rateLimit({ windowMs: 15 * 60 * 1000, // 15 minutes max: 100 // limit each IP to 100 requests per windowMs }); app.use(limiter); Some third-party Node.js middleware examples 1. \u0026lt;a href=\u0026quot;https://helmetjs.github.io/\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://helmetjs.github.io/\u0026quot;\u0026gt;helmet\u0026lt;/a\u0026gt; helmet is a middleware that helps secure Express.js apps by setting various HTTP headers. It enhances security by mitigating common web vulnerabilities.\n2. 
\u0026lt;a href=\u0026quot;https://expressjs.com/en/resources/middleware/compression.html\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://expressjs.com/en/resources/middleware/compression.html\u0026quot;\u0026gt;compression\u0026lt;/a\u0026gt; compression is a middleware that compresses HTTP responses to reduce the size of payloads transferred over the network, improving performance.\n3. \u0026lt;a href=\u0026quot;https://expressjs.com/en/resources/middleware/cors.html\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://expressjs.com/en/resources/middleware/cors.html\u0026quot;\u0026gt;cors\u0026lt;/a\u0026gt; cors is a middleware that enables Cross-Origin Resource Sharing (CORS) in Express.js applications, allowing controlled access to resources from other domains.\n4. \u0026lt;a href=\u0026quot;https://www.npmjs.com/package/express-session\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://www.npmjs.com/package/express-session\u0026quot;\u0026gt;express-session\u0026lt;/a\u0026gt; express-session is a middleware for managing user sessions in Express.js applications. It provides session-based authentication and session storage options.\n5. \u0026lt;a href=\u0026quot;https://www.passportjs.org/concepts/authentication/middleware/\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://www.passportjs.org/concepts/authentication/middleware/\u0026quot;\u0026gt;passport\u0026lt;/a\u0026gt; passport is an authentication middleware for Node.js applications. It supports various authentication strategies, including username/password, OAuth, and JWT.\n6. \u0026lt;a href=\u0026quot;https://express-validator.github.io/docs/\u0026quot; data-type=\u0026quot;link\u0026quot; data-id=\u0026quot;https://express-validator.github.io/docs/\u0026quot;\u0026gt;express-validator\u0026lt;/a\u0026gt; express-validator is a middleware for input validation and sanitization in Express.js applications. 
It provides robust validation and sanitization functions for request data.\nConclusion Middlewares are a fundamental aspect of Node.js web development, providing a modular and extensible way to handle HTTP requests. By understanding the purpose, usage, and best practices of middlewares, developers can build scalable, maintainable, and secure web applications with Node.js.\n","permalink":"https://learncodecamp.net/nodejs-middlewares/","summary":"\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#introduction\"\u003eIntroduction\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#what-are-node-js-middlewares\"\u003eWhat are Node.js Middlewares?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#examples\"\u003eExamples\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#1-logging-middleware\"\u003eLogging Middleware\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#2-authentication-middleware\"\u003eAuthentication Middleware\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#3-error-handling-middleware\"\u003eError Handling Middleware\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#4-rate-limiting-middleware-using-third-party-library\"\u003eRate Limiting Middleware (using third-party library)\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#some-third-party-node-js-middleware-examples\"\u003eSome third-party Node.js middleware examples\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#1-helmet\"\u003ehelmet\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca 
href=\"https://learncodecamp.net/nodejs-middlewares/#2-compression\"\u003ecompression\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#3-cors\"\u003ecors\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#4-express-session\"\u003eexpress-session\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#5-passport\"\u003epassport\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#6-express-validator\"\u003eexpress-validator\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/nodejs-middlewares/#conclusion\"\u003eConclusion\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eNode.js is a powerful runtime environment for executing JavaScript code server-side. One of the key features that make Node.js so versatile and popular is its middleware architecture. Middlewares play a crucial role in the request-response lifecycle of Node.js applications, enabling developers to modularize and streamline the handling of HTTP requests.\u003c/p\u003e","title":"Understanding Node.js Middlewares: A Comprehensive Guide"},{"content":"Introduction In today’s interconnected digital landscape, APIs (Application Programming Interfaces) serve as the backbone of modern software development, enabling seamless communication between disparate systems. However, managing and securing APIs can be complex, especially in microservices architectures where numerous services interact with each other.\nWhat is an API Gateway? An API gateway is an architectural pattern that sits between clients and backend services, acting as a single entry point for all incoming API requests. 
It serves as a reverse proxy, routing requests to the appropriate services based on predefined rules and configurations. API gateways offer a centralized point of control for managing various aspects of API communication, including routing, authentication, authorization, rate limiting, logging, and monitoring.\nAPI Gateway\nKey Features and Functionalities of API Gateway Routing and Load Balancing: API gateways efficiently route incoming requests to the appropriate backend services based on predefined routing rules. Additionally, they can perform load balancing to distribute traffic evenly across multiple instances of a service, ensuring optimal performance and high availability. Authentication and Authorization: Security is paramount in API-driven architectures. API gateways provide robust authentication and authorization mechanisms to secure API endpoints against unauthorized access. This includes support for various authentication methods such as JWT (JSON Web Tokens), OAuth, API keys, and custom authentication schemes. Rate Limiting and Throttling: To prevent abuse and ensure fair usage of resources, API gateways offer rate limiting and throttling capabilities. These features enable administrators to set limits on the number of requests allowed per user, IP address, or API key within a specific timeframe, thereby mitigating the risk of overloading backend services. Transformation and Aggregation: API gateways can transform request and response payloads to meet the specific requirements of clients and backend services. This includes data format conversion (e.g., JSON to XML), payload enrichment, request/response validation, and aggregation of multiple backend responses into a single cohesive response. Logging and Monitoring: Monitoring the health and performance of APIs is essential for proactive maintenance and troubleshooting. 
API gateways facilitate real-time logging and monitoring of API traffic, providing valuable insights into request/response metrics, error rates, latency, and traffic patterns. Example of an open source API gateway: Overview of the WSO2 API Gateway – WSO2 API Manager Documentation 4.2.0\nBest Practices and Challenges To maximize the benefits of API gateways and mitigate potential challenges, consider the following best practices:\nDesign for Scalability: As API traffic grows, scalability becomes a critical consideration. Design API gateways with horizontal scalability in mind, leveraging load balancing and auto-scaling capabilities to handle increasing load effectively. Implement Robust Security Measures: Security should be a top priority when designing API gateways. Implement strong authentication and authorization mechanisms, encrypt sensitive data in transit, and regularly audit and update security configurations to mitigate evolving threats. Monitor Performance and Health: Continuous monitoring of API gateway performance and health is essential for identifying bottlenecks, detecting anomalies, and ensuring optimal reliability. Utilize monitoring tools and dashboards to track key metrics and proactively address issues. API gateways play a pivotal role in modern software architectures, providing a centralized solution for managing, securing, and monitoring API traffic. By leveraging the powerful features and functionalities offered by API gateways, organizations can streamline their API management processes, enhance security, and facilitate scalability.\n","permalink":"https://learncodecamp.net/api-gateway/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn today’s interconnected digital landscape, APIs (Application Programming Interfaces) serve as the backbone of modern software development, enabling seamless communication between disparate systems. 
However, managing and securing APIs can be complex, especially in microservices architectures where numerous services interact with each other.\u003c/p\u003e\n\u003ch3 id=\"what-is-an-api-gateway\"\u003eWhat is an API Gateway?\u003c/h3\u003e\n\u003cp\u003eAn API gateway is an architectural pattern that sits between clients and backend services, acting as a single entry point for all incoming API requests. It serves as a reverse proxy, routing requests to the appropriate services based on predefined rules and configurations. API gateways offer a centralized point of control for managing various aspects of API communication, including routing, authentication, authorization, rate limiting, logging, and monitoring.\u003cfigure\u003e\u003c/p\u003e","title":"Demystifying API Gateways: A Comprehensive Guide"},{"content":"Introduction In the realm of JavaScript development, package managers are indispensable tools. They streamline the process of incorporating external code libraries (packages) into your projects, making your life as a developer much easier. Two of the most prominent players in this arena are npm and pnpm.\nnpm: The Veteran npm (Node Package Manager) is the default package manager that ships with Node.js. It has been an integral part of the JavaScript ecosystem for many years, boasting a massive repository of packages. npm’s widespread adoption makes it a familiar and reliable choice for many developers.\npnpm: The Efficient Contender pnpm (Performant Node Package Manager) is a relatively newer package manager that has been gaining traction due to its emphasis on speed, disk space efficiency, and security. Let’s break down the core differences between the two:\n1. Disk Space Management npm: npm employs a relatively flat dependency tree model. This means that if multiple projects across your system rely on the same package, you’ll end up with multiple copies of that package scattered across different node_modules directories. 
This duplication can lead to considerable disk space consumption. pnpm: pnpm takes a dramatically different approach. It utilizes a global content-addressable store to keep a single copy of each package version on your disk. Within your project’s node_modules directory, pnpm makes extensive use of hard links and symbolic links to reference packages in this global store. This strategy results in significant disk space savings. 2. Performance pnpm generally outperforms npm in terms of installation speed, particularly in larger projects. Its intelligent linking system reduces the amount of file copying required during package installation, leading to faster execution times.\n3. Security npm: npm’s flattened node_modules structure can potentially create security vulnerabilities. A project might gain access to packages it didn’t explicitly declare as dependencies, increasing the risk of unexpected behavior or malicious code execution. pnpm: pnpm’s stricter dependency resolution and its unique file system layout help reduce the attack surface of your projects. It makes it harder for unintended packages to be accessed or executed, improving overall project security. So, Which One to Choose? The best package manager for your project depends on your specific priorities and requirements:\nnpm: If you’re working on smaller projects, value familiarity and widespread compatibility, npm is a solid and dependable choice. pnpm: If disk space conservation, speed, and enhanced security are your primary concerns, pnpm offers clear advantages, especially in larger projects or monorepo setups. Migration Considerations Migrating from npm to pnpm is remarkably straightforward in most cases. 
Since pnpm aims for high compatibility with npm, you can often simply replace npm commands with pnpm equivalents in your workflows and scripts.\nInstallation Methods Using npm\nnpm install -g pnpm Check official site for more details : Fast, disk space efficient package manager | pnpm\n","permalink":"https://learncodecamp.net/npm-vs-pnpm/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn the realm of JavaScript development, package managers are indispensable tools. They streamline the process of incorporating external code libraries (packages) into your projects, making your life as a developer much easier. Two of the most prominent players in this arena are npm and pnpm.\u003c/p\u003e\n\u003ch3 id=\"npm-the-veteran\"\u003enpm: The Veteran\u003c/h3\u003e\n\u003cp\u003enpm (Node Package Manager) is the default package manager that ships with Node.js. It has been an integral part of the JavaScript ecosystem for many years, boasting a massive repository of packages. npm’s widespread adoption makes it a familiar and reliable choice for many developers.\u003c/p\u003e","title":"npm vs. pnpm: A Deep Dive into JavaScript Package Managers"},{"content":"Introduction Large Language Models (LLMs) are the powerhouses behind cutting-edge AI applications like chatbots and text generation tools. These complex models have traditionally relied on high-performance GPUs to handle the massive amounts of computation involved. But what if that wasn’t necessary? Recent breakthroughs, like the BitNet B1.58 model, hint at a future where LLMs can thrive without the need for expensive, power-hungry GPUs.\nThe Problem with Floating-Point Precision Most LLMs today rely on floating-point numbers (e.g., 32-bit or 16-bit) to represent the complex data they process. While powerful, these representations require significant computational resources, which is where those powerful GPUs come in. 
But what if we could change the rules of the game?\nIntroducing BitNet B1.58: Embracing Ternary Efficiency BitNet B1.58 is a groundbreaking LLM that challenges the status quo. Instead of using traditional floating-point numbers, it represents its parameters using ternary values (-1, 0, +1). This simple shift has profound implications:\nReduced Computational Complexity: By switching to ternary values, multiplication operations within the model can be replaced with simpler addition operations. This significantly reduces the computational burden. Hardware Innovation: The elimination of complex multiplication opens the door to designing new types of hardware specifically optimized for this kind of computation. Efficiency and Accessibility: BitNet B1.58 promises more efficient, accessible models that consume less power and can potentially run on a wider range of devices, including edge devices and smartphones. But Does it Work? The big question is whether this efficiency comes at the cost of accuracy. Surprisingly, BitNet B1.58 has been shown to match the performance of traditional floating-point LLMs of similar size. This means it can be just as good at language understanding and generation tasks, all while being significantly more resource-efficient.\nAs you can see, BitNet b1.58 performed almost similar to the original model.\nReal-World Impact The potential impact of BitNet B1.58 and other similar advancements is huge:\nDemocratizing AI: More accessible LLMs could allow developers with limited resources to build powerful AI applications. AI on the Edge: Efficient LLMs could enable AI capabilities on devices with limited compute power, such as smartphones and IoT devices. Greener AI: Reducing energy consumption makes AI development more sustainable. 
This could revolutionize the field, making AI more accessible, powerful, and environmentally friendly.\nFor more details check the paper : 2402.17764.pdf (arxiv.org)\n","permalink":"https://learncodecamp.net/bit-net-1-58/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eLarge Language Models (LLMs) are the powerhouses behind cutting-edge AI applications like chatbots and text generation tools. These complex models have traditionally relied on high-performance GPUs to handle the massive amounts of computation involved. But what if that wasn’t necessary? Recent breakthroughs, like the BitNet B1.58 model, hint at a future where LLMs can thrive without the need for expensive, power-hungry GPUs.\u003c/p\u003e\n\u003ch3 id=\"the-problem-with-floating-point-precision\"\u003e\u003cstrong\u003eThe Problem with Floating-Point Precision\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eMost LLMs today rely on floating-point numbers (e.g., 32-bit or 16-bit) to represent the complex data they process. While powerful, these representations require significant computational resources, which is where those powerful GPUs come in. But what if we could change the rules of the game?\u003cfigure\u003e\u003c/p\u003e","title":"Revolutionizing AI: LLMs Without GPUs? The Promise of BitNet B1.58"},{"content":"What are vector databases? A Vector Database is a type of database that stores information in a structured way using vectors. Now, what are vectors? Think of them as mathematical representations of data that capture its meaning and context.\nLet’s say you have a photo of a cat. Instead of just storing the image file, a Vector Database will convert this photo into a vector, which is essentially a set of numbers that represent various features of the cat, like its color, shape, and size. 
This vector will contain information about the cat in a way that a computer can understand.\nNow, the cool thing about vector databases is that they allow you to search for similar items easily. For instance, if you upload a photo of a dog, the database can quickly find other photos with similar features to that dog by comparing their vectors. This is like when you search for similar images on your smartphone—except it’s done using mathematical calculations rather than just looking for file names or tags.\nSo, in simple terms, a Vector Database helps organize and search for different types of data by converting them into mathematical representations called vectors, making it easier to find similar items or information.\nCheck Understanding Embeddings: https://learncodecamp.net/embeddings/\nVector databases are specialized databases tailored for high-dimensional data points represented as vectors, offering efficient storage and retrieval capabilities. They excel at performing nearest-neighbor searches, swiftly retrieving data points closest to a given point in multi-dimensional space.\nKey techniques employed in vector databases k-Nearest Neighbor (k-NN) Index: This technique enables rapid identification of the k nearest neighbors of a given vector. It aids in efficiently narrowing down the search space, improving retrieval speed. Hierarchical Navigable Small World (HNSW): HNSW algorithm efficiently organizes data points to facilitate faster nearest-neighbor searches. It constructs a hierarchical graph structure that optimizes the search process. 
In summary, vector databases leverage advanced methods such as k-NN indexes and algorithms like HNSW to ensure efficient storage and retrieval of high-dimensional vectors, enabling rapid lookup of nearest neighbors in multi-dimensional spaces.\n# Function to find K Nearest Neighbors def find_k_nearest_neighbors(query_vector, vectors, k=5, threshold=0.8): nearest_neighbors = [] for vector in vectors: similarity = cosine_similarity(query_vector, vector) if similarity \u0026gt;= threshold: nearest_neighbors.append((vector, similarity)) nearest_neighbors.sort(key=lambda x: x[1], reverse=True) return nearest_neighbors[:k] Simple pseudocode for finding the k nearest neighbors: query_vector is the embedding vector of the search term, k is the number of items to retrieve, and threshold is the minimum similarity required for a match.\nA brute-force kNN search is computationally very expensive: depending on the size of your database, a single query could take anywhere from several seconds to minutes.\nNumber of computations = Number of dimensions × Number of vectors\nGiven:\nNumber of dimensions (d) = 1536 Number of vectors (n) = 1 million = 1×10^6 Number of computations = 1536 × 1×10^6 = 1.536×10^9, or approximately 1.536 billion\nIt would take approximately 1.536 seconds to perform all the computations needed for a single kNN query on a system capable of performing 1 billion computations per second.\nLet’s explore Approximate Nearest Neighbors Algorithms.\nApproximate Nearest Neighbors Search (K-ANNS) This is used to efficiently find approximate nearest neighbors for a given query point in a large dataset. 
It’s particularly useful when dealing with high-dimensional data where traditional exact nearest neighbor search methods become computationally expensive.\nThe quality of an inexact search (the recall) is defined as the ratio between the number of found true nearest neighbors and K\nTo elaborate:\nTrue Nearest Neighbors: These are the actual nearest neighbors of the query point in the dataset, determined by some distance metric. For example, if we’re searching for the 5 nearest neighbors (K=5) of a given point, the true nearest neighbors are those 5 points in the dataset that are closest to the query point. Found Nearest Neighbors: These are the points that the approximate search algorithm returns as the nearest neighbors of the query point. Due to the approximate nature of the search, they may not be exactly the same as the true nearest neighbors. Recall: The recall of the search is then defined as the ratio of the number of found true nearest neighbors to the total number of nearest neighbors desired (K). It’s calculated using the formula: Recall = (Number of Found True Nearest Neighbors) / K\nFor example, if a search algorithm returns 3 out of the 5 true nearest neighbors for a query with K=5, the recall would be 3/5, or 0.6. This means that the algorithm successfully retrieved 60% of the true nearest neighbors.\nHigh recall is desirable in many applications because it indicates that the algorithm is effectively capturing the most relevant points in the dataset.\nExamples of ANN algorithms​ Examples of ANN methods are:\ntrees – e.g. ANNOY (Figure 1), proximity graphs – e.g. HNSW (Figure 2), clustering – e.g. FAISS, hashing – e.g. LSH, vector compression – e.g. PQ or SCANN. Figure 1\nAnnoy is used at Spotify for music recommendations. Redis Search supports FLAT – Brute-force index and HNSW\nHNSW Probabilistic Skip List: A skip list is a data structure that allows for fast search, insertion, and deletion operations in a sorted list. 
It achieves this by adding multiple layers of pointers, allowing for “skipping” over some elements during traversal. In HNSW, the probabilistic skip list is used to organize the data points within each layer of the hierarchical structure.\nProbabilistic skip list\nThe “Navigable Small World” (NSW) part of Hierarchical Navigable Small World (HNSW) refers to a graph structure designed to maintain both local connectivity and global exploration capabilities. Let’s break down the NSW component:\nNavigability: NSW aims to create a graph structure where each data point (or node) is connected to its neighbors in a way that facilitates efficient navigation through the dataset. This means that similar points are likely to be connected, allowing for quick traversal between them. Small World Property: The small-world property refers to the idea that even though the graph may be large and sparsely connected, it’s still possible to navigate from one point to another through a relatively small number of connections. This property is essential for efficient search and exploration in large datasets. Connection Strategy: In NSW, connections between points are established based on their proximity in the data space. Typically, points that are closer together in the data space are more likely to be connected. However, NSW also incorporates randomness into the connection strategy to balance local connectivity with global exploration. Efficient Search: By creating a graph structure with the small-world property, NSW enables efficient search for nearest neighbors. During the search process, the algorithm can navigate through the graph using a combination of local connections to quickly find nearby points and occasional long-range connections to explore distant regions of the dataset. Summary Vector databases rely on Machine Learning models to generate vector embeddings for all data objects. 
Vector embeddings represent the meaning and context of data, enabling efficient analysis and retrieval. Vector databases provide rapid query capabilities due to Approximate Nearest Neighbors (ANN) algorithms. ANN algorithms sacrifice some accuracy in exchange for significant performance improvements. ","permalink":"https://learncodecamp.net/vector-databases-knn-hnsw/","summary":"\u003ch3 id=\"what-are-vector-databases\"\u003eWhat are vector databases?\u003c/h3\u003e\n\u003cp\u003eA Vector Database is a type of database that stores information in a structured way using vectors. Now, what are vectors? Think of them as mathematical representations of data that capture its meaning and context.\u003c/p\u003e\n\u003cp\u003eLet’s say you have a photo of a cat. Instead of just storing the image file, a Vector Database will convert this photo into a vector, which is essentially a set of numbers that represent various features of the cat, like its color, shape, and size. This vector will contain information about the cat in a way that a computer can understand.\u003c/p\u003e","title":"Exploring the Power of Vector Databases: Leveraging KNN and HNSW for Efficient Data Retrieval"},{"content":"Introduction In today’s web development landscape, building RESTful APIs has become a crucial skill for developers. Whether you’re creating a simple application or a complex system, REST APIs provide a standardized way for different software components to communicate with each other over the web. In this tutorial, we’ll walk through the process of building a RESTful API using Node.js, Express, and MongoDB, focusing on CRUD operations (Create, Read, Update, Delete) for managing products.\nFor more information on REST API, check this blog : Designing APIs Using REST Specifications: A Comprehensive Guide – Learn Code Camp\nTechnologies Used Node.js: A JavaScript runtime built on Chrome’s V8 JavaScript engine. 
Express: A minimalist web framework for Node.js, which simplifies the process of building web applications and APIs. MongoDB: A NoSQL database that stores data in flexible, JSON-like documents. Setting Up the Project First, let’s set up our project structure and install the necessary dependencies. Create a new directory for your project and navigate into it.\nmkdir node-express-mongodb-api cd node-express-mongodb-api Initialize a new Node.js project and install Express, Mongoose (for MongoDB integration), and any other dependencies needed.\nnpm install express mongodb mongoose npm install --save-dev nodemon Creating the MongoDB Database To install the MongoDB Atlas CLI and set up a local deployment, run these commands\nbrew install mongodb-atlas atlas deployments setup To see the collections in MongoDB from mongosh, run any of these commands\n1. show collections; // Display all collections 2. show tables // Display all collections 3. db.getCollectionNames(); // Return an array of collection names. Example: [ \u0026#34;orders\u0026#34;, \u0026#34;system.profile\u0026#34; ] Some other useful commands\nshow dbs; // show all databases db.products.find() // list the first 20 documents from the products collection 
Setting Up the Server Now, let’s create the main file index.js where we’ll set up our Express server and define the routes for our API.\n// index.js const express = require(\u0026#34;express\u0026#34;); const mongoose = require(\u0026#34;mongoose\u0026#34;); const productRoute = require(\u0026#34;./routes/product.route.js\u0026#34;); const app = express(); // Middleware app.use(express.json()); app.use(express.urlencoded({ extended: false })); // Routes app.use(\u0026#34;/api/products\u0026#34;, productRoute); // Home route app.get(\u0026#34;/\u0026#34;, (req, res) =\u0026gt; { res.send(\u0026#34;Hello from Node API Server\u0026#34;); }); // Connect to MongoDB and start the server mongoose .connect(\u0026#34;mongodb://localhost:63233/test?directConnection=true\u0026amp;serverSelectionTimeoutMS=2000\u0026amp;appName=mongosh+2.1.5\u0026#34;, { useNewUrlParser: true, useUnifiedTopology: true, }) .then(() =\u0026gt; { console.log(\u0026#34;Connected to MongoDB\u0026#34;); app.listen(3000, () =\u0026gt; { console.log(\u0026#34;Server is running on port 3000\u0026#34;); }); }) .catch((error) =\u0026gt; { console.error(\u0026#34;Connection to MongoDB failed:\u0026#34;, error); }); In the above code:\nWe import Express and Mongoose, and define our Express app. Middleware functions are added to parse incoming requests with JSON payloads and URL-encoded bodies. Routes are defined using the /api/products prefix, which delegates further handling to productRoute. A simple home route is set up to verify that the server is running. We connect to the MongoDB database using Mongoose and start the Express server on port 3000. 
Defining the Product Model Next, let’s define the product model that represents the structure of our data in MongoDB.\n// product.model.js const mongoose = require(\u0026#34;mongoose\u0026#34;); const ProductSchema = mongoose.Schema( { name: { type: String, required: [true, \u0026#34;Please enter product name\u0026#34;], }, quantity: { type: Number, required: true, default: 0, }, price: { type: Number, required: true, default: 0, }, image: { type: String, required: false, }, }, { timestamps: true, } ); const Product = mongoose.model(\u0026#34;Product\u0026#34;, ProductSchema); module.exports = Product; In the ProductSchema:\nWe define the fields for our product model (name, quantity, price, image). Field validation rules are specified using Mongoose schema types and options. We enable timestamps to automatically add createdAt and updatedAt fields to each document. Creating CRUD Operations Now, let’s create the controller functions for handling CRUD operations on products.\n// product.controller.js const Product = require(\u0026#34;../models/product.model\u0026#34;); const getProducts = async (req, res) =\u0026gt; { try { const products = await Product.find({}); res.status(200).json(products); } catch (error) { res.status(500).json({ message: error.message }); } }; const getProduct = async (req, res) =\u0026gt; { try { const { id } = req.params; const product = await Product.findById(id); if (!product) { return res.status(404).json({ message: \u0026#34;Product not found\u0026#34; }); } res.status(200).json(product); } catch (error) { res.status(500).json({ message: error.message }); } }; const createProduct = async (req, res) =\u0026gt; { try { const product = await Product.create(req.body); res.status(201).json(product); } catch (error) { res.status(500).json({ message: error.message }); } }; const updateProduct = async (req, res) =\u0026gt; { try { const { id } = req.params; const product = await Product.findByIdAndUpdate(id, req.body, { new: true, }); if 
(!product) { return res.status(404).json({ message: \u0026#34;Product not found\u0026#34; }); } res.status(200).json(product); } catch (error) { res.status(500).json({ message: error.message }); } }; const deleteProduct = async (req, res) =\u0026gt; { try { const { id } = req.params; const product = await Product.findByIdAndDelete(id); if (!product) { return res.status(404).json({ message: \u0026#34;Product not found\u0026#34; }); } res.status(200).json({ message: \u0026#34;Product deleted successfully\u0026#34; }); } catch (error) { res.status(500).json({ message: error.message }); } }; module.exports = { getProducts, getProduct, createProduct, updateProduct, deleteProduct, }; In the product.controller.js:\nController functions are defined for handling various CRUD operations on products. These functions use asynchronous syntax with async/await for working with MongoDB queries. Error handling is implemented to catch any potential errors and return appropriate HTTP status codes and error messages. Setting Up Product Routes Finally, let’s define the routes for our products API in product.route.js.\n// product.route.js const express = require(\u0026#34;express\u0026#34;); const router = express.Router(); const { getProducts, getProduct, createProduct, updateProduct, deleteProduct, } = require(\u0026#34;../controllers/product.controller\u0026#34;); router.get(\u0026#34;/\u0026#34;, getProducts); router.get(\u0026#34;/:id\u0026#34;, getProduct); router.post(\u0026#34;/\u0026#34;, createProduct); router.put(\u0026#34;/:id\u0026#34;, updateProduct); router.delete(\u0026#34;/:id\u0026#34;, deleteProduct); module.exports = router; Here, we define routes for fetching all products, fetching a single product by ID, creating a new product, updating an existing product, and deleting a product.\nConclusion In this tutorial, we’ve learned how to build a RESTful API with Node.js, Express, and MongoDB. 
We’ve covered setting up the server, defining the database model, implementing CRUD operations, and setting up routes to handle API requests. With this foundation, you can extend the API further by adding more features, implementing authentication and authorization, and optimizing performance for production use. Happy coding!\nReferences: haris-bit/simple-crud-app-backend (github.com)\n","permalink":"https://learncodecamp.net/rest-api-with-nodejs-express-mongodb/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn today’s web development landscape, building RESTful APIs has become a crucial skill for developers. Whether you’re creating a simple application or a complex system, REST APIs provide a standardized way for different software components to communicate with each other over the web. In this tutorial, we’ll walk through the process of building a RESTful API using Node.js, Express, and MongoDB, focusing on CRUD operations (Create, Read, Update, Delete) for managing products.\u003c/p\u003e","title":"Building a RESTful API with Node.js, Express, and MongoDB"},{"content":"Asynchronous programming is a crucial aspect of modern web development, allowing applications to handle multiple tasks concurrently without blocking the main execution thread. Traditionally, asynchronous JavaScript operations were managed using callbacks and promises. While effective, these approaches often led to callback hell and complex, nested code structures.\nIn this blog post, we’ll delve into how async/await works under the hood and explore its implementation in JavaScript.\nUnderstanding Asynchronous JavaScript Before diving into async/await, let’s recap the basics of asynchronous JavaScript. Asynchronous operations are tasks that don’t necessarily complete immediately or in a predictable order. Examples include fetching data from an API, reading files, or executing time-consuming computations. 
To handle such operations, JavaScript provides mechanisms like callbacks and promises.\nCallbacks Callbacks are functions passed as arguments to other functions. They are executed once the asynchronous operation completes. While effective, using callbacks for multiple asynchronous operations can lead to deeply nested code structures, commonly referred to as “callback hell.”\nPromises Promises were introduced to mitigate the issues associated with callbacks. A promise represents the eventual completion or failure of an asynchronous operation and allows chaining multiple asynchronous operations together. Promises provide a cleaner syntax compared to callbacks but can still result in verbose code, especially when dealing with multiple asynchronous calls.\nIntroducing Async/Await async/await is a syntactic sugar built on top of promises, offering a more concise and readable way to write asynchronous JavaScript code. The async keyword is used to define a function as asynchronous, while the await keyword is used to pause the execution of the function until a promise is settled. Let’s break down how async/await works:\n1. async Function Declaration When a function is declared with the async keyword, it automatically returns a promise. This allows the function to execute asynchronously and enables the use of the await keyword within the function body.\nasync function fetchData() { // Asynchronous operations } 2. await Keyword The await keyword is used to pause the execution of an async function until a promise is settled (either resolved or rejected). While waiting for the promise to settle, the event loop continues to execute other tasks, ensuring non-blocking behavior.\nasync function fetchData() { const data = await fetchDataFromAPI(); // Process data after fetching return data; } 3. Error Handling async/await simplifies error handling by allowing the use of try/catch blocks. 
If a promise is rejected while using await, the error can be caught and handled within the same function.\nasync function fetchData() { try { const data = await fetchDataFromAPI(); // Process data after fetching return data; } catch (error) { console.error(\u0026#39;Error fetching data:\u0026#39;, error); throw error; } } Advantages of Async/Await Readability: async/await code is more readable and easier to understand compared to nested callbacks or promise chains. Error Handling: Error handling is simplified with try/catch blocks, making it easier to manage errors within asynchronous code. Sequential Execution: await allows for sequential execution of asynchronous tasks, improving code organization and maintainability. Challenges and Considerations Compatibility: async/await is supported in modern browsers and Node.js versions. Care must be taken when targeting older environments or when transpiling code. Promise-Based: Despite the syntactic sugar, async/await is still based on promises and inherits their limitations, such as the lack of built-in cancellation support. Conclusion Async/Await is a powerful addition to JavaScript that simplifies asynchronous programming by providing a more intuitive syntax and better error handling. Understanding how async/await works under the hood can help developers write more efficient and maintainable code. By leveraging the advantages of async/await and addressing its challenges, developers can create robust and scalable applications in JavaScript.\n","permalink":"https://learncodecamp.net/async-await-javascript/","summary":"\u003cp\u003eAsynchronous programming is a crucial aspect of modern web development, allowing applications to handle multiple tasks concurrently without blocking the main execution thread. Traditionally, asynchronous JavaScript operations were managed using callbacks and promises. 
While effective, these approaches often led to callback hell and complex, nested code structures.\u003c/p\u003e\n\u003cp\u003eIn this blog post, we’ll delve into how \u003ccode\u003easync/await\u003c/code\u003e works under the hood and explore its implementation in JavaScript.\u003c/p\u003e\n\u003ch2 id=\"understanding-asynchronous-javascript\"\u003eUnderstanding Asynchronous JavaScript\u003c/h2\u003e\n\u003cp\u003eBefore diving into \u003ccode\u003easync/await\u003c/code\u003e, let’s recap the basics of asynchronous JavaScript. Asynchronous operations are tasks that don’t necessarily complete immediately or in a predictable order. Examples include fetching data from an API, reading files, or executing time-consuming computations. To handle such operations, JavaScript provides mechanisms like callbacks and promises.\u003c/p\u003e","title":"Understanding Async/Await in JavaScript: How It Works and Implementation"},{"content":"Introduction When working on Node.js projects, managing dependencies effectively is crucial for maintaining project stability, security, and scalability. The package.json and package-lock.json files play vital roles in this process, along with understanding Semantic Versioning (SemVer) and utilizing npm outdated for dependency management. In this blog post, we’ll delve into each of these components, their significance, best practices, and how to handle them in your Git repository.\n1. Understanding package.json: The package.json file is the heart of a Node.js project. It contains metadata about the project, such as its name, version, description, entry point, scripts, and most importantly, its dependencies. Developers define project dependencies and their versions in the dependencies and devDependencies sections.\nBest Practices: Always include a package.json file in your Node.js project, even if it’s minimal. Keep the dependencies section for runtime dependencies and devDependencies for development dependencies. 
Specify the exact version or a version range using SemVer for each dependency to ensure reproducibility and predictability. 2. package-lock.json: Introduced in npm 5, the package-lock.json file serves as a manifest for the exact versions of installed dependencies. It locks the dependency tree, ensuring that subsequent installations use the same versions. This guarantees consistency across different environments and prevents the “dependency hell” problem.\nBest Practices: Commit the package-lock.json file: This ensures that every developer and CI/CD environment installs the exact same dependency versions. Don’t manually modify package-lock.json: Let npm manage this file to avoid inconsistencies. 3. Semantic Versioning (SemVer): SemVer is a versioning scheme that helps developers communicate the nature of changes in a software package. Versions are in the format MAJOR.MINOR.PATCH, where:\nMAJOR version increases for incompatible API changes. MINOR version increases for backward-compatible functionality additions. PATCH version increases for backward-compatible bug fixes. Best Practices: Follow SemVer principles when publishing packages to ensure consumers understand the impact of version updates. Use SemVer ranges (^, ~, \u0026gt;=, etc.) in package.json to specify acceptable version ranges for dependencies. Pin down dependencies to exact versions in production to minimize unexpected changes. For more details on semVer check the official documentation : Semantic Versioning 2.0.0 | Semantic Versioning (semver.org)\n4. npm outdated: npm outdated is a command-line tool that checks for outdated dependencies in a project. It provides information about available updates, including the current and latest versions.\nBest Practices: Regularly run npm outdated to stay informed about available updates and potential security vulnerabilities. Balance the frequency of updates with stability and risk assessment. Not all updates are necessary or safe to apply immediately. 
Review changelogs and release notes before updating dependencies to understand the changes and potential impacts. 5. Git and Which Files to Commit: In a Git repository, deciding which files to commit is crucial for collaboration and reproducibility. For Node.js projects, the following files should typically be committed:\npackage.json: Contains project metadata and dependencies declaration. package-lock.json: Locks dependency versions for consistency across environments. .gitignore: Exclude files and directories from version control, such as node_modules. Other configuration files specific to your project (e.g., .eslintrc, .prettierrc). Best Practices: Commit package.json and package-lock.json together: Ensure consistency by always committing both files simultaneously. Include .gitignore: Prevent committing unnecessary files like node_modules and build artifacts. Avoid committing dependencies: Let npm manage dependencies and rely on package.json and package-lock.json for reproducibility. Conclusion: Effective dependency management is critical for the success of Node.js projects. By understanding the roles of package.json, package-lock.json, SemVer, and utilizing tools like npm outdated, developers can maintain project stability, security, and scalability. Following best practices ensures consistency, reproducibility, and smooth collaboration within development teams.\nRemember, consistency, communication, and automation are key pillars of successful dependency management in Node.js projects.\n","permalink":"https://learncodecamp.net/package-lock-json/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eWhen working on Node.js projects, managing dependencies effectively is crucial for maintaining project stability, security, and scalability. 
The \u003ccode\u003epackage.json\u003c/code\u003e and \u003ccode\u003epackage-lock.json\u003c/code\u003e files play vital roles in this process, along with understanding Semantic Versioning (SemVer) and utilizing \u003ccode\u003enpm outdated\u003c/code\u003e for dependency management. In this blog post, we’ll delve into each of these components, their significance, best practices, and how to handle them in your Git repository.\u003c/p\u003e\n\u003ch3 id=\"1-understanding-packagejson\"\u003e1. Understanding package.json:\u003c/h3\u003e\n\u003cp\u003eThe \u003ccode\u003epackage.json\u003c/code\u003e file is the heart of a Node.js project. It contains metadata about the project, such as its name, version, description, entry point, scripts, and most importantly, its dependencies. Developers define project dependencies and their versions in the \u003ccode\u003edependencies\u003c/code\u003e and \u003ccode\u003edevDependencies\u003c/code\u003e sections.\u003c/p\u003e","title":"Demystifying package.json, package-lock.json, SemVer, and npm outdated: Best Practices for Node.js Projects"},{"content":"Introduction In today’s interconnected world of software development, designing robust and scalable APIs is crucial for building successful applications. Representational State Transfer (REST) has emerged as a dominant architectural style for designing networked applications. In this blog post, we’ll delve into the principles and best practices of designing APIs using REST specifications.\nUnderstanding REST At its core, REST is an architectural style that defines a set of constraints for creating web services. These constraints, outlined by Roy Fielding in his doctoral dissertation, emphasize scalability, simplicity, and reliability. The key principles of REST include:\nResource-Based: In REST, everything is a resource, which can be accessed and manipulated using standard HTTP methods. 
Uniform Interface: RESTful APIs should have a uniform interface, making them easy to understand and use. This includes the use of standard HTTP methods (GET, POST, PUT, DELETE) and resource identifiers (URIs). Statelessness: Each request from a client to the server must contain all the information necessary to understand and process the request. The server should not store any client state between requests. Client-Server Architecture: REST architectures are based on the separation of concerns between the client and the server. This allows for independent evolution and scalability of both components. Cacheability: Responses from the server should be explicitly labeled as cacheable or non-cacheable to improve performance and scalability. Layered System: REST allows for the use of intermediaries (such as proxies and caches) to improve scalability and security. Designing RESTful APIs When designing RESTful APIs, it’s essential to adhere to these principles while also considering the specific requirements of your application. Here are some best practices to follow:\n1. Define Resources and URIs: Identify the resources that your API will expose. Each resource should have a unique URI (Uniform Resource Identifier) that represents its identity. For example:\nGET /products POST /products GET /products/{id} PUT /products/{id} DELETE /products/{id} 2. Use HTTP Methods Appropriately: HTTP methods should be used according to their semantics. For example:\nGET: Retrieve a resource or a collection of resources. POST: Create a new resource. PUT: Update an existing resource. DELETE: Delete a resource. 3. Use HTTP Status Codes: HTTP status codes provide information about the result of a request. Use appropriate status codes to indicate success, failure, or other relevant conditions. For example:\n200 OK: Successful GET request. 201 Created: Successful POST request. 404 Not Found: Resource not found. 500 Internal Server Error: Server error. 4. 
Use Proper Error Handling: Provide meaningful error messages and use standard error formats (such as JSON) to communicate errors to clients. Include relevant information, such as error codes and descriptions, to help developers troubleshoot issues.\n5. Versioning Consider versioning your API to ensure backward compatibility as it evolves over time. Use version numbers in the URI or headers to indicate different versions of the API.\nAs your API evolves over time, it’s important to consider versioning to maintain backward compatibility and provide a smooth transition for clients. Here are some common approaches to API versioning:\nURI Versioning: In URI versioning, the version number is included directly in the URI. For example: GET /v1/products GET /v2/products. URI versioning provides clear visibility of the API version but can lead to cluttered URIs and can be less flexible when it comes to deploying changes. Query Parameter Versioning: With query parameter versioning, the version number is included as a query parameter in the request. For example: GET /products?version=1 GET /products?version=2. Query parameter versioning keeps URIs clean but may not be as intuitive for developers and can be prone to misuse. Header Versioning: Header versioning involves specifying the API version in a custom header field. For example: GET /products Headers: Accept-Version: 1. Header versioning keeps URIs clean and separates the versioning concern from the resource identifiers, but it may require additional effort from clients to specify the version in requests.\nEach versioning approach has its pros and cons, and the choice depends on factors such as the nature of your API, client requirements, and organizational preferences. Whichever approach you choose, it’s important to document the versioning strategy clearly and communicate any changes effectively to API consumers.\n6. 
Pagination and Filtering: For endpoints that return collections of resources, implement pagination to limit the number of results returned per request. Additionally, provide filtering options to allow clients to narrow down results based on specific criteria.\nTools and Frameworks Several tools and frameworks can assist in designing and implementing RESTful APIs:\nSwagger/OpenAPI: Swagger provides a specification for describing and documenting RESTful APIs. It allows you to define API endpoints, request/response formats, and authentication methods. Postman: Postman is a popular tool for testing and debugging APIs. It provides a user-friendly interface for sending requests, inspecting responses, and writing automated tests. Spring Boot: If you’re developing APIs in Java, Spring Boot offers a powerful framework for building RESTful services. It provides built-in support for defining endpoints, handling requests, and managing dependencies. Express.js: For Node.js applications, Express.js is a lightweight framework that simplifies the creation of RESTful APIs. It provides middleware for handling requests, routing, and error handling. Challenges and Considerations While designing RESTful APIs, you may encounter several challenges:\nSecurity: Ensure that your API endpoints are secure against common threats such as SQL injection, cross-site scripting (XSS), and cross-site request forgery (CSRF). Implement authentication and authorization mechanisms to control access to sensitive resources. Performance: Optimize your API for performance by minimizing latency, reducing payload sizes, and caching frequently accessed data. Consider using techniques such as content compression, response caching, and asynchronous processing to improve responsiveness. Scalability: Design your API to scale horizontally to handle increasing loads. Use techniques such as load balancing, sharding, and caching to distribute traffic evenly across multiple servers. 
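The pagination practice from point 6 can be sketched in a few lines of JavaScript. This is only a minimal illustration under stated assumptions: paginate is a hypothetical helper (not part of Express or any library), the data lives in an in-memory array, and the cap of 100 items per page is an arbitrary choice.

```javascript
// Hypothetical helper: returns one page of an in-memory array.
// A real API would usually push limit/offset down into the database query.
function paginate(items, page = 1, limit = 10) {
  const safeLimit = Math.max(1, Math.min(limit, 100)); // assumed cap of 100 per page
  const safePage = Math.max(1, page);                  // pages are 1-based
  const start = (safePage - 1) * safeLimit;
  return {
    page: safePage,
    limit: safeLimit,
    total: items.length,
    data: items.slice(start, start + safeLimit),
  };
}

// Example: 25 products, third page with 10 items per page.
const products = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));
const result = paginate(products, 3, 10);
console.log(result.data.map((p) => p.id)); // [ 21, 22, 23, 24, 25 ]
```

Returning total alongside data lets a client work out how many pages exist; a page past the end simply yields an empty data array.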
HTTP Verbs and Nouns In RESTful API design, HTTP verbs (also known as methods) are used to perform actions on resources identified by URIs. Here’s how you can effectively use HTTP verbs:\nGET: Use the GET method to retrieve resource representations. GET requests should be safe and idempotent: they should not modify server state, and repeating them should have no additional effect. For example: GET /products retrieves a list of products. GET /products/{id} retrieves a specific product. POST: Use the POST method to create new resources. POST requests should include the data for the new resource in the request body. For example: POST /products creates a new product with the provided data. PUT: Use the PUT method to update existing resources. PUT requests should contain the full representation of the resource being updated. For example: PUT /products/{id} updates an existing product with the provided data. DELETE: Use the DELETE method to remove resources. DELETE requests should remove the resource identified by the URI. For example: DELETE /products/{id} deletes the product with the specified ID. By adhering to these HTTP methods, you ensure that your API follows the principles of REST and provides a clear and consistent interface for interacting with resources.\nIdempotency Idempotency is a fundamental concept in computer science and plays a crucial role in designing and implementing robust and reliable systems, including RESTful APIs. An operation is idempotent if performing it several times leaves the system in the same state as performing it once.\nExamples of Idempotent Operations: GET Requests: Retrieving data using the GET method is inherently idempotent because it only retrieves data and does not modify server state. No matter how many times you send the same GET request, the server state stays the same. PUT and DELETE Requests: The PUT and DELETE methods are idempotent because the result of performing these operations is the same regardless of how many times they are executed. 
For example, if you send a DELETE request to remove a resource, subsequent DELETE requests will not change the fact that the resource has already been deleted. Idempotent POST Requests: While POST requests are generally not considered idempotent because they create new resources, they can be designed to be idempotent in certain scenarios. For example, if a POST request includes an identifier for the resource being created and the server ensures that subsequent requests with the same identifier do not create duplicate resources, the POST operation becomes idempotent. Importance of Idempotency in RESTful APIs: Idempotency is critical in RESTful API design for several reasons:\nFault Tolerance: Idempotent operations help improve fault tolerance by ensuring that repeating requests due to network issues, timeouts, or failures do not lead to unexpected or unintended behavior. This is particularly important in distributed systems where communication between components may be unreliable. Safe Operations: Some idempotent operations, such as GET requests, are also safe, meaning they do not alter server state. This allows clients to perform read-only operations without worrying about unintended side effects. Caching and Optimization: Safe, idempotent requests such as GET are well-suited for caching and optimization purposes. Caching their responses can improve performance and reduce server load by serving cached responses to subsequent identical requests. Designing Idempotent APIs: When designing RESTful APIs, it’s important to consider how to make operations idempotent where appropriate:\nUse HTTP methods such as GET, PUT, and DELETE for operations that are inherently idempotent. For non-idempotent operations, such as creating resources with POST requests, consider including mechanisms to prevent duplication, such as using unique identifiers or checking for existing resources before creating new ones. 
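The duplicate-prevention idea for POST requests can be sketched as follows. This is a minimal illustration, not a definitive implementation: createOnce, the processed map, and the key format are all hypothetical names, and the store is an in-memory Map where a real service would likely persist keys in a database with an expiry.

```javascript
// Hypothetical in-memory store mapping a client-supplied key to the created resource.
const processed = new Map();
let nextId = 1;

// Illustrative helper: creates a resource at most once per idempotency key.
function createOnce(idempotencyKey, payload) {
  if (processed.has(idempotencyKey)) {
    // Repeat of an earlier request: return the stored resource, create nothing new.
    return { status: 200, body: processed.get(idempotencyKey) };
  }
  const resource = { id: nextId++, ...payload };
  processed.set(idempotencyKey, resource);
  return { status: 201, body: resource };
}

// A client that retries after a timeout reuses the same key.
const first = createOnce('key-123', { name: 'Keyboard' });
const retry = createOnce('key-123', { name: 'Keyboard' });
console.log(first.status, retry.status, first.body.id === retry.body.id); // 201 200 true
```

The client generates the key once per logical operation and sends it with every retry, so a request repeated after a network failure cannot create a second product.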
Conclusion Designing APIs using REST specifications requires careful consideration of principles, best practices, and tools. By following the guidelines outlined in this post, you can create APIs that are scalable, reliable, and easy to use.\n","permalink":"https://learncodecamp.net/designing-api-using-rest-specification/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn today’s interconnected world of software development, designing robust and scalable APIs is crucial for building successful applications. Representational State Transfer (REST) has emerged as a dominant architectural style for designing networked applications. In this blog post, we’ll delve into the principles and best practices of designing APIs using REST specifications.\u003c/p\u003e\n\u003ch3 id=\"understanding-rest\"\u003eUnderstanding REST\u003c/h3\u003e\n\u003cp\u003eAt its core, REST is an architectural style that defines a set of constraints for creating web services. These constraints, outlined by Roy Fielding in his doctoral dissertation, emphasize scalability, simplicity, and reliability. The key principles of REST include:\u003c/p\u003e","title":"Designing APIs Using REST Specifications: A Comprehensive Guide"},{"content":"Introduction Apache Kafka is a highly scalable, distributed streaming platform designed to handle real-time data feeds. It has become a cornerstone of many big data and event streaming applications, thanks to its high throughput, fault tolerance, and scalability. This blog post aims to delve into the architecture of Kafka, covering its core components, how it works, and its applications in real-world scenarios.\nCore Components of Kafka Architecture Producers and Consumers At the heart of Kafka’s architecture are producers and consumers.\nProducers are the applications or machines that publish events or messages to Kafka topics. 
They can write messages to multiple topics and choose the partition to which the message is to be written based on various strategies, including the message’s properties or Kafka’s default partitioning strategy.\nConsumers, on the other hand, are the applications or machines that subscribe to topics and process the published message feeds. They read messages, using offsets to track which messages have been consumed. Consumers can operate as part of a group, enabling scalable and fault-tolerant processing of messages.\nTopics and Partitions Kafka organizes data into topics, which are categories or feeds of messages. Topics are divided into partitions to allow for parallelism and scalability. Each message within a partition is an ordered, immutable sequence of bytes. Producers write messages to specific partitions, and consumers read from these partitions. The choice of partition can affect the order in which messages are consumed, but Kafka ensures that messages within a partition are always read in order.\nBrokers Brokers are the nodes in Kafka that store the topics and partitions. Each broker can store multiple topics and partitions, and the data is distributed across all the brokers in a Kafka cluster. This distribution allows Kafka to handle high volumes of data and provide fault tolerance. If a broker fails, the data is not lost, as it is replicated across other brokers in the cluster.\nZooKeeper ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Kafka uses ZooKeeper to manage and coordinate the Kafka brokers. 
In ZooKeeper-based clusters it is essential for Kafka’s operation, as it keeps track of the state of the cluster, such as which broker is the leader for each partition.\nIn newer versions of Kafka, ZooKeeper is no longer needed: KRaft mode replaces it with a built-in metadata quorum based on the Raft consensus algorithm.\nReading Strategies in Kafka Kafka supports different delivery semantics, which trade off data integrity against performance:\nAt most once At-most-once semantics occur if the producer doesn’t retry on ack timeouts or errors. This might result in messages not being written to the Kafka topic and hence not delivered to the consumer. It’s a trade-off to avoid duplication, even though some messages might not make it through.\nAt least once At-least-once semantics mean that a message is written exactly once to the Kafka topic if the producer gets an acknowledgment (ack) from the broker with acks=all. However, if the ack times out or an error is returned, the producer may retry on the assumption that the message was not written. If the broker had in fact written the message but failed before sending the ack, the retry writes the message a second time, so the consumer receives it more than once. This can lead to duplicated work and incorrect results, despite the good intention.\nExactly once Exactly-once semantics ensure that a message is delivered precisely once to the end consumer, even if the producer retries.\nTo read more about delivery semantics, check this blog: Exactly-once Semantics is Possible: Here’s How Apache Kafka Does it (confluent.io)\nKafka’s Applications and Adoption Kafka’s architecture and features make it suitable for a wide range of applications, from real-time analytics and monitoring to log aggregation and event sourcing. Conclusion Apache Kafka’s architecture, centered around producers, consumers, topics, partitions, brokers, and ZooKeeper, enables it to handle massive volumes of data in real time. 
Its support for different reading strategies ensures data integrity and performance, making it a powerful tool for big data and event streaming applications\n","permalink":"https://learncodecamp.net/understanding-kakfa-architecture/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eApache Kafka is a highly scalable, distributed streaming platform designed to handle real-time data feeds. It has become a cornerstone of many big data and event streaming applications, thanks to its high throughput, fault tolerance, and scalability. This blog post aims to delve into the architecture of Kafka, covering its core components, how it works, and its applications in real-world scenarios.\u003c/p\u003e\n\u003ch3 id=\"core-components-of-kafka-architecture\"\u003eCore Components of Kafka Architecture\u003cfigure\u003e\u003c/h3\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"545\" src=\"/wp-content/uploads/2024/02/image-11-1024x545.png\" alt=\"\" /\u003e \u003c/figure\u003e\u003c/p\u003e\n\u003ch4 id=\"producers-and-consumers\"\u003eProducers and Consumers\u003c/h4\u003e\n\u003cp\u003eAt the heart of Kafka’s architecture are producers and consumers.\u003c/p\u003e","title":"Understanding Kafka Architecture: A Comprehensive Guide"},{"content":"In today’s software development landscape, working with multiple Java Development Kit (JDK) versions is not uncommon. Different projects may require different JDK versions due to compatibility issues, language features, or performance optimizations. Managing these JDK versions manually can be cumbersome and error-prone. That’s where SDKMAN! comes to the rescue!\nWhat is SDKMAN!? SDKMAN! is a tool that simplifies managing multiple software development kits (SDKs) on your system. It provides a convenient way to install, manage, and switch between different versions of JDKs, JVMs, build tools, and more. 
With SDKMAN!, you can effortlessly handle different JDK versions without the hassle of manual installation and configuration.\nGetting Started with SDKMAN! Before diving into managing JDK versions, let’s first set up SDKMAN! on your system. Follow these steps to get started:\nStep 1: Installation Just launch a new terminal and type in:\n$ curl -s \u0026#34;https://get.sdkman.io\u0026#34; | bash Follow the on-screen instructions to wrap up the installation. Afterward, open a new terminal or run the following in the same shell:\n$ source \u0026#34;$HOME/.sdkman/bin/sdkman-init.sh\u0026#34; Lastly, run the following snippet to confirm the installation’s success:\n$ sdk version or run\nsdk help sdk - The command line interface (CLI) for SDKMAN! SYNOPSIS sdk \u0026lt;subcommand\u0026gt; [candidate] [version] DESCRIPTION SDKMAN! is a tool for managing parallel versions of multiple JVM related Software Development Kits on most Unix based systems. It provides a convenient Command Line Interface (CLI) and API for installing, switching, removing and listing Candidates. SUBCOMMANDS \u0026amp; QUALIFIERS help [subcommand] install \u0026lt;candidate\u0026gt; [version] [path] uninstall \u0026lt;candidate\u0026gt; \u0026lt;version\u0026gt; list [candidate] use \u0026lt;candidate\u0026gt; \u0026lt;version\u0026gt; config no qualifier default \u0026lt;candidate\u0026gt; [version] home \u0026lt;candidate\u0026gt; \u0026lt;version\u0026gt; env [init|install|clear] current [candidate] upgrade [candidate] version no qualifier offline [enable|disable] selfupdate [force] update no qualifier flush [tmp|metadata|version] EXAMPLES sdk install java 17.0.0-tem sdk help install Step 2: Installing JDKs Now that SDKMAN! is set up, you can start installing JDK versions. Use the following command to list available JDK candidates:\nsdk list java This command will display a list of JDK versions that you can install using SDKMAN!. 
Choose the desired version and install it by running:\nsdk install java \u0026lt;version\u0026gt; Replace \u0026lt;version\u0026gt; with the JDK version you want to install, such as 11.0.13-zulu or 16.0.2-open.\nStep 3: Switching JDK Versions Once you have multiple JDK versions installed, you can easily switch between them using SDKMAN!. Use the following command to list installed JDK versions:\nsdk list java Identify the version you want to use and set it as the default by running:\nsdk default java \u0026lt;version\u0026gt; Replace \u0026lt;version\u0026gt; with the desired JDK version.\nYou can also switch to a specific JDK every time you open a project. This is done through a .sdkmanrc file in the base directory of your project. This file can be generated automatically by issuing the following command:\nsdk env init A config file with the following content has now been created in the current directory:\n# Enable auto-env through the sdkman_auto_env config # Add key=value pairs of SDKs to use below java=21.0.2-tem The file is pre-populated with the current JDK version in use but can contain as many key-value pairs of supported SDKs as needed. To switch to the configuration present in your .sdkmanrc file, simply issue the following command:\nsdk env You will get an output that looks like this:\nUsing java version 21.0.2-tem in this shell. Your path has now also been updated to use any of these SDKs in your current shell. When leaving a project, you may want to reset the SDKs to their default version. This can be done with this command:\n$ sdk env clear Restored java version to 21.0.2-tem (default) To read more about SDKMAN!, check the official website: Home – SDKMAN! the Software Development Kit Manager\n","permalink":"https://learncodecamp.net/managing-multiple-jdk-version/","summary":"\u003cp\u003eIn today’s software development landscape, working with multiple Java Development Kit (JDK) versions is not uncommon. 
Different projects may require different JDK versions due to compatibility issues, language features, or performance optimizations. Managing these JDK versions manually can be cumbersome and error-prone. That’s where SDKMAN! comes to the rescue!\u003c/p\u003e\n\u003ch3 id=\"what-is-sdkman\"\u003eWhat is SDKMAN!?\u003c/h3\u003e\n\u003cp\u003eSDKMAN! is a tool that simplifies managing multiple software development kits (SDKs) on your system. It provides a convenient way to install, manage, and switch between different versions of JDKs, JVMs, build tools, and more. With SDKMAN!, you can effortlessly handle different JDK versions without the hassle of manual installation and configuration.\u003c/p\u003e","title":"Managing Multiple JDK Versions with SDKMAN!"},{"content":"Introduction Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT and then feeds that output text back into subsequent prompts. This can help in cases where you want GPT to reason things out before answering, or brainstorm a plan before executing it.\nMulti-Step Prompting Technique We will use a 3-step prompt to write unit tests in Java:\nExplain: Given a Java function, we ask GPT to explain what the function is doing and why. Plan: We ask GPT to plan a set of unit tests for the function. If the plan is too short, we ask GPT to elaborate with more ideas for unit tests. Execute: Finally, we instruct GPT to write unit tests that cover the planned cases. Prompts\nFor Step 1, we will use this prompt:\nYou are a world-class Java developer with an eagle eye for unintended bugs and edge cases. You carefully explain code with great detail and accuracy. You organize your explanations in markdown-formatted, bulleted lists. \u0026lt;Code pasted here\u0026gt; For Step 2, we will use this prompt:\nA good unit test suite should aim to: Test the function\u0026#39;s behavior for a wide range of possible inputs. 
Test edge cases that the author may not have foreseen. Be easy to read and understand, with clean code and descriptive names. Be deterministic, so that the tests always pass or fail in the same way. To help unit test the function above, list diverse scenarios that the function should be able to handle (and under each scenario, include a few examples as sub-bullets). For Step 3, we will use this prompt:\nNow write test cases in junit 5 for all of the above points Try this method for generating test cases. In my experience, this 3-step approach performed a lot better than the usual single prompt that just says “write unit test cases for this function.”\nGive it a try, and let me know in the comments section whether it worked for you.\nThis technique is inspired by the OpenAI Cookbook sample: openai-cookbook/examples/Unit_test_writing_using_a_multi-step_prompt.ipynb at main · openai/openai-cookbook (github.com)\n","permalink":"https://learncodecamp.net/writing-test-cases-with-github-copilot/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eComplex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT and then feeds that output text back into subsequent prompts. This can help in cases where you want GPT to reason things out before answering, or brainstorm a plan before executing it.\u003c/p\u003e\n\u003ch3 id=\"multi-step-prompting-technique\"\u003eMulti-Step Prompting Technique\u003c/h3\u003e\n\u003cp\u003eWe will use a 3-step prompt to write unit tests in Java\u003c/p\u003e","title":"Writing Test Cases with Github Copilot"},{"content":"Introduction A few days ago, someone asked me whether Java 8 has reached the end of its life, and whether we should upgrade to Java 11 or Java 17. 
If you have the same question, let’s try to understand the situation, and do you need an upgrade or you can wait?\nYou should consider upgrading to a newer version of Java to ensure security, performance, and ongoing support, but Java 8 has a complex End-of-Life (EOL) situation.\nFirst, let’s understand there are multiple JDK vendors. The primary differences among JDK vendors lie in their support models, additional features, performance optimizations, and licensing terms\nJDK Vendors Let’s explore some of the key JDK vendors and their differences:\nOracle JDK: The original Java developer, historically most popular. Now subscription-based for commercial support; free updates for personal/development use are time-limited. Includes tools like Flight Recorder and Mission Control. OpenJDK: Open-source Java SE reference implementation. Community-driven (Oracle, Red Hat, IBM, etc.), providing a free alternative. Many JDK vendors base their distributions on OpenJDK. AdoptOpenJDK: Community-led effort for pre-built OpenJDK binaries across platforms. Provides LTS and interim releases, choice of HotSpot or OpenJ9 JVMs, with timely updates and community support. Amazon Corretto: Free, production-ready OpenJDK distribution by Amazon. Long-term support, security patches, optimized for AWS. Includes additional AWS-specific monitoring and diagnostic tools. Red Hat OpenJDK: OpenJDK build by Red Hat, integrated with RHEL. Targets enterprises with its commercial support and focus on stability and long-term use. Now let’s come to EOL\nPublic Updates: Oracle JDK 8: Oracle ended public updates for commercial use of Java 8 in January 2019. Public updates for personal use continue indefinitely. OpenJDK 8: OpenJDK 8 still receives some security and bug fixes through various vendors, but these updates may have varying end dates. Long-Term Support (LTS):\nOracle JDK 8: Oracle provides commercial Long-Term Support (LTS) for Java 8 until at least December 2030. 
Other Vendors: Several vendors offer their own extended support for OpenJDK 8, often with longer support periods than Oracle. Some popular options include: What does this mean for you? New Projects: It’s strongly recommended to use a newer Java version (like Java 11 or Java 17, which are also LTS releases) for new development. These versions have significant performance improvements, security enhancements, and modern language features. Important Considerations Using outdated Java versions exposes you to potential security risks and compatibility issues. Consult the specific support policies of the JDK distribution you’re using to confirm its end-of-life dates. If you are using OpenJDK 8 EOL is Nov 2026 as of now. For more details check this : OpenJDK Life Cycle and Support Policy – Red Hat Customer Portal\nIf you are using Oracle JDK 8 check this: https://endoflife.date/oracle-jdk Premier Support has ended on 31 Mar 2022, but Extended support is available till 31 Dec 2030\n","permalink":"https://learncodecamp.net/java-8-end-of-life/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eA few days ago, someone asked me has Java 8 reached the end of its life, and should we upgrade to Java 11 or Java 17. If you have the same question, let’s try to understand the situation, and do you need an upgrade or you can wait?\u003c/p\u003e\n\u003cp\u003eYou should consider upgrading to a newer version of Java to ensure security, performance, and ongoing support, but Java 8 has a complex End-of-Life (EOL) situation.\u003c/p\u003e","title":"Java 8 End of Life (EOL)?"},{"content":"Introduction In the world of software development, frameworks like Spring have streamlined the process of building complex applications. Spring’s fundamental pillar is its Inversion of Control (IoC) container. This container masterfully manages the lifecycles of your application components known as beans. 
A vital aspect of this management is understanding bean scopes.\nWhat Exactly are Bean Scopes? Bean scopes are blueprints that dictate how the Spring container creates and manages bean instances within your application. Let’s break it down:\nScope: Defines the lifespan and visibility of a bean instance. Spring Container: The heart of the Spring framework, responsible for bean creation and dependency injection. Types of Spring Bean Scopes Spring offers several bean scopes. Let’s explore the most common ones:\nSingleton Scope (Default) The container creates a single, shared instance of the bean. All requests for this bean receive the same instance. Best suited for stateless beans (e.g., services, DAOs). Prototype Scope The container creates a brand-new bean instance for each request. Ideal for stateful beans where each request needs a unique instance. Request Scope A new bean instance is created for every HTTP request. The same instance is used throughout a single request but discarded at its end. Perfect for web-aware beans within web applications. Session Scope One bean instance per HTTP session. Useful for storing user-specific data during a session. Application Scope A single bean instance exists throughout the lifespan of the entire Servlet context. It’s similar to a singleton but specifically tied to the web application’s context. The main difference lies in their context. Application scope is bound to a web application’s ServletContext, while the singleton scope is bound to a Spring IoC container. This means that in a web application with multiple Spring containers, you could have multiple instances of a singleton bean (one per container). Choosing the Right Bean Scope Deciding which scope to use is essential for building a robust Spring application. Consider the following:\nStatefulness: Do you need each request to have its own unique bean instance (prototype) or can multiple requests share the same instance (singleton)? 
Context: Are you working in a web environment (request, session) or a broader application context (application)? Performance: Creating new beans can have overhead. Singletons optimize performance in situations where the same bean instance can be reused often. Annotating Your Beans\nYou can specify scopes using annotations:\n@Scope(\u0026#34;singleton\u0026#34;) @Scope(\u0026#34;prototype\u0026#34;) @Scope(\u0026#34;request\u0026#34;) @Scope(\u0026#34;session\u0026#34;) @Scope(\u0026#34;application\u0026#34;) Example Java @Component @Scope(\u0026#34;prototype\u0026#34;) // New instance each time the bean is requested from the container public class ShoppingCart { // ... } Summary Mastering Spring bean scopes gives you incredible control over the behavior and state management of your application components. Remember:\nChoose the correct scope based on your bean’s nature and application requirements. Use scopes judiciously for optimal performance and memory usage. ","permalink":"https://learncodecamp.net/beans-spring/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn the world of software development, frameworks like Spring have streamlined the process of building complex applications. Spring’s fundamental pillar is its Inversion of Control (IoC) container. This container masterfully manages the lifecycles of your application components known as beans. A vital aspect of this management is understanding bean scopes.\u003c/p\u003e\n\u003ch3 id=\"what-exactly-are-bean-scopes\"\u003eWhat Exactly are Bean Scopes?\u003c/h3\u003e\n\u003cp\u003eBean scopes are blueprints that dictate how the Spring container creates and manages bean instances within your application. 
Let’s break it down:\u003c/p\u003e","title":"Understanding Spring Bean Scopes: A Key to Effective Bean Management"},{"content":"Introduction In the realm of networking, Anycast IP addresses represent a fascinating concept that has become increasingly popular due to its ability to enhance performance, scalability, and reliability of services across distributed networks. This blog post aims to delve into the intricacies of Anycast, exploring its underlying principles, mechanics, use cases, and considerations for implementation.\nWhat is Anycast? Anycast is a networking technique that allows multiple servers to advertise the same IP address. When a client sends a request to this shared IP address, the routing infrastructure directs the request to the nearest or best-performing server based on network conditions such as proximity, latency, or routing metrics.\nGeoDNS routes users to unique endpoints based on their location, whereas anycast routing routes traffic to the optimal data center determined by the number of hops, round-trip time, and amount of available bandwidth GeoDNS (upper) vs. Anycast Routing (lower)\nHow Does Anycast Work? Advertisement: Servers hosting the same service announce the Anycast IP address via routing protocols such as BGP (Border Gateway Protocol). These advertisements propagate across the internet, informing routers about the availability of the service at multiple locations. Routing Decision: When a client sends a request to the Anycast IP address, routers along the path analyze routing tables to determine the optimal path to reach the destination. They choose the nearest server based on factors like network topology and routing metrics. Delivery: The router forwards the client’s request to the chosen server, which processes the request and sends back the response. Since the client perceives only the Anycast IP address, the underlying routing infrastructure handles the redirection seamlessly. 
Use Cases of Anycast: Content Delivery Networks (CDNs): CDNs utilize Anycast to distribute content closer to end-users, reducing latency and improving load times for websites, videos, and other online resources. DNS Resolution: Anycast is employed in DNS (Domain Name System) servers to enhance the availability and performance of DNS resolution services. Multiple DNS servers advertise the same IP address, ensuring redundancy and efficient query resolution. Distributed Services: Anycast is beneficial for deploying highly available and scalable services across geographically dispersed locations. Examples include distributed databases, messaging systems, and API gateways. Advantages of Anycast: Improved Performance: Anycast routes clients to the nearest server, reducing latency and improving response times, especially for latency-sensitive applications. Enhanced Scalability: Anycast enables horizontal scaling by distributing traffic across multiple server locations, accommodating increased demand without overloading individual servers. Fault Tolerance: Anycast enhances fault tolerance by automatically redirecting traffic to alternate servers in case of server failures or network issues, ensuring continuous service availability. Challenges and Considerations: Convergence Time: Anycast routing may incur longer convergence times compared to unicast routing, especially when re-routing traffic in response to network changes or failures. Network Complexity: Implementing Anycast requires careful network planning, configuration, and monitoring to ensure optimal routing and mitigate potential routing loops or black-holing scenarios. Stateful Services: Anycast may pose challenges for stateful services that require session persistence or data synchronization between servers. Careful design and implementation are necessary to maintain data consistency and session affinity. 
Conclusion Anycast IP addresses offer a powerful mechanism for improving the performance, scalability, and fault tolerance of distributed services on the internet.\nBy leveraging Anycast, organizations can deliver faster, more resilient, and highly available services to their users worldwide.\nHowever, successful deployment requires careful consideration of network topology, routing policies, and application requirements to maximize the benefits while mitigating potential challenges.\n","permalink":"https://learncodecamp.net/anycast-ip/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn the realm of networking, Anycast IP addresses represent a fascinating concept that has become increasingly popular due to its ability to enhance performance, scalability, and reliability of services across distributed networks. This blog post aims to delve into the intricacies of Anycast, exploring its underlying principles, mechanics, use cases, and considerations for implementation.\u003c/p\u003e\n\u003cp\u003eWhat is Anycast? Anycast is a networking technique that allows multiple servers to advertise the same IP address. When a client sends a request to this shared IP address, the routing infrastructure directs the request to the nearest or best-performing server based on network conditions such as proximity, latency, or routing metrics.\u003c/p\u003e","title":"Understanding Anycast IP Addresses: How They Work and When to Use Them"},{"content":"Introduction Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Whether it’s natural language processing, computer vision, recommender systems, or other applications, embeddings play a crucial role in enhancing model performance and scalability.\nText embeddings measure the relatedness of text strings. 
Embeddings are commonly used for:\nSearch (where results are ranked by relevance to a query string) Clustering (where text strings are grouped by similarity) Recommendations (where items with related text strings are recommended) Anomaly detection (where outliers with little relatedness are identified) Diversity measurement (where similarity distributions are analyzed) Classification (where text strings are classified by their most similar label) Embedding vector from a string\nGroup of strings in vector space, showing similar sentences are grouped closely\nThere are a lot of models that we can use, some are free and some are paid and available via API. In this blog, we will see one example of both.\nFirst, let’s check the Open AI Embeddings.\nOpen AI Embeddings curl https://api.openai.com/v1/embeddings \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -H \u0026#34;Authorization: Bearer $OPENAI_API_KEY\u0026#34; \\ -d \u0026#39;{ \u0026#34;input\u0026#34;: \u0026#34;Your text string goes here\u0026#34;, \u0026#34;model\u0026#34;: \u0026#34;text-embedding-3-small\u0026#34; }\u0026#39; { \u0026#34;object\u0026#34;: \u0026#34;list\u0026#34;, \u0026#34;data\u0026#34;: [ { \u0026#34;object\u0026#34;: \u0026#34;embedding\u0026#34;, \u0026#34;index\u0026#34;: 0, \u0026#34;embedding\u0026#34;: [ -0.006929283495992422, -0.005336422007530928, ... 
(omitted for spacing) -4.547132266452536e-05, -0.024047505110502243 ], } ], \u0026#34;model\u0026#34;: \u0026#34;text-embedding-3-small\u0026#34;, \u0026#34;usage\u0026#34;: { \u0026#34;prompt_tokens\u0026#34;: 5, \u0026#34;total_tokens\u0026#34;: 5 } } the length of the embedding vector will be 1536 for text-embedding-3-small or 3072 for text-embedding-3-large.\nSbert Embeddings SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings.\nInstall the sentence-transformers with pip:\npip install -U sentence-transformers from sentence_transformers import SentenceTransformer model = SentenceTransformer(\u0026#34;all-MiniLM-L6-v2\u0026#34;) # Our sentences we like to encode sentences = [ \u0026#34;This framework generates embeddings for each input sentence\u0026#34;, \u0026#34;Sentences are passed as a list of strings.\u0026#34;, \u0026#34;The quick brown fox jumps over the lazy dog.\u0026#34;, ] # Sentences are encoded by calling model.encode() embeddings = model.encode(sentences) # Print the embeddings for sentence, embedding in zip(sentences, embeddings): print(\u0026#34;Sentence:\u0026#34;, sentence) print(\u0026#34;Embedding:\u0026#34;, embedding) print(\u0026#34;\u0026#34;) You can wrap this code in Django server code, and serve as an API.\nSeveral prominent embedding models exist, comprising:\nWord2Vec: Engineered by Google, this model leverages neural networks to glean word embeddings from extensive textual datasets. GloVe: Forged by Stanford, this acronym signifies “Global Vectors for Word Representation,” employing a fusion of matrix factorization and co-occurrence statistics to derive word embeddings. FastText: Crafted by Facebook, this model resembles Word2Vec but also integrates subword information (e.g., character n-grams) to formulate word embeddings. 
ELMo (Embeddings from Language Models): Developed by the Allen Institute for AI (AllenNLP), ELMo employs a deep bidirectional language model to produce contextual word embeddings that can be adapted to specific tasks. To know more about Sbert, check this link: SentenceTransformers Documentation — Sentence-Transformers documentation (sbert.net)\nTo know more about OpenAI Embeddings, check this link: Embeddings – OpenAI API\nLet’s see the clustering in action with the following code sample, using Sbert\nimport matplotlib.pyplot as plt from sklearn.manifold import TSNE from sentence_transformers import SentenceTransformer # Initialize the Sentence Transformer model model = SentenceTransformer(\u0026#34;all-MiniLM-L6-v2\u0026#34;) #t-SNE stands for t-Distributed Stochastic Neighbor Embedding. # It\u0026#39;s a dimensionality reduction technique commonly used for visualizing high-dimensional data in a lower-dimensional space, # typically 2D or 3D. t-SNE is particularly effective at preserving the local structure of the data, # making it useful for visualizing clusters or patterns in complex datasets. 
# Our sentences to encode sentences = [ \u0026#34;cat\u0026#34;, \u0026#34;dog\u0026#34;, \u0026#34;elephant\u0026#34;, \u0026#34;lion\u0026#34;, \u0026#34;tiger\u0026#34;, # Animals \u0026#34;swim\u0026#34;, \u0026#34;dive\u0026#34;, \u0026#34;surf\u0026#34;, \u0026#34;boat\u0026#34;, \u0026#34;ocean\u0026#34;, # Water-related \u0026#34;run\u0026#34;, \u0026#34;jump\u0026#34;, \u0026#34;athlete\u0026#34;, \u0026#34;sports\u0026#34;, \u0026#34;fitness\u0026#34;, # Athlete-related \u0026#34;code\u0026#34;, \u0026#34;programming\u0026#34;, \u0026#34;developer\u0026#34;, \u0026#34;software\u0026#34;, \u0026#34;computer\u0026#34;, # Coding-related \u0026#34;fish\u0026#34;, \u0026#34;shark\u0026#34;, \u0026#34;whale\u0026#34;, \u0026#34;dolphin\u0026#34;, \u0026#34;octopus\u0026#34;, # Marine animals \u0026#34;sprint\u0026#34;, \u0026#34;marathon\u0026#34;, \u0026#34;exercise\u0026#34;, \u0026#34;gym\u0026#34;, \u0026#34;race\u0026#34;, # Exercise-related \u0026#34;data\u0026#34;, \u0026#34;algorithm\u0026#34;, \u0026#34;python\u0026#34;, \u0026#34;java\u0026#34;, \u0026#34;javascript\u0026#34;, # Programming languages \u0026#34;bird\u0026#34;, \u0026#34;eagle\u0026#34;, \u0026#34;falcon\u0026#34;, \u0026#34;penguin\u0026#34;, \u0026#34;parrot\u0026#34;, # Birds \u0026#34;swimmer\u0026#34;, \u0026#34;diver\u0026#34;, \u0026#34;surfer\u0026#34;, \u0026#34;sailor\u0026#34;, \u0026#34;fisherman\u0026#34;, # Water-related roles \u0026#34;soccer\u0026#34;, \u0026#34;basketball\u0026#34;, \u0026#34;football\u0026#34;, \u0026#34;tennis\u0026#34;, \u0026#34;volleyball\u0026#34;, # Sports \u0026#34;debug\u0026#34;, \u0026#34;compile\u0026#34;, \u0026#34;function\u0026#34;, \u0026#34;variable\u0026#34;, \u0026#34;loop\u0026#34;, # Coding terms \u0026#34;horse\u0026#34;, \u0026#34;zebra\u0026#34;, \u0026#34;giraffe\u0026#34;, \u0026#34;cheetah\u0026#34;, \u0026#34;kangaroo\u0026#34;, # Safari animals \u0026#34;drown\u0026#34;, \u0026#34;float\u0026#34;, \u0026#34;wave\u0026#34;, 
\u0026#34;current\u0026#34;, \u0026#34;splash\u0026#34;, # Water actions \u0026#34;runner\u0026#34;, \u0026#34;cyclist\u0026#34;, \u0026#34;swimmer\u0026#34;, \u0026#34;jumper\u0026#34;, \u0026#34;climber\u0026#34;, # Athlete roles \u0026#34;database\u0026#34;, \u0026#34;API\u0026#34;, \u0026#34;framework\u0026#34;, \u0026#34;library\u0026#34;, \u0026#34;interface\u0026#34;, # Software terms \u0026#34;frog\u0026#34;, \u0026#34;turtle\u0026#34;, \u0026#34;crocodile\u0026#34;, \u0026#34;alligator\u0026#34;, \u0026#34;lizard\u0026#34;, # Reptiles \u0026#34;scuba\u0026#34;, \u0026#34;surfboard\u0026#34;, \u0026#34;paddle\u0026#34;, \u0026#34;kayak\u0026#34;, \u0026#34;snorkel\u0026#34;, # Water equipment \u0026#34;gymnast\u0026#34;, \u0026#34;weightlifter\u0026#34;, \u0026#34;skater\u0026#34;, \u0026#34;surfer\u0026#34;, \u0026#34;dancer\u0026#34; # Athletic roles ] # Sentences are encoded by calling model.encode() embeddings = model.encode(sentences) # Perform dimensionality reduction for visualization using t-SNE tsne = TSNE(n_components=2, perplexity=5, random_state=42) # Adjust perplexity here embeddings_2d = tsne.fit_transform(embeddings) # Visualize embeddings in 2D plt.figure(figsize=(10, 6)) plt.title(\u0026#34;t-SNE Visualization of Embeddings (2D)\u0026#34;) plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1]) for i, sentence in enumerate(sentences): plt.annotate(sentence, (embeddings_2d[i, 0], embeddings_2d[i, 1])) plt.xlabel(\u0026#34;Dimension 1\u0026#34;) plt.ylabel(\u0026#34;Dimension 2\u0026#34;) plt.grid(True) plt.show() Vector databases serve as robust repositories for storing embeddings. 
By leveraging specialized data structures and indexing techniques, vector databases facilitate seamless integration with applications requiring semantic understanding and context-aware processing.\nThese databases are optimized for efficient storage and retrieval of high-dimensional vectors, enabling fast querying and similarity searches.\nWe will learn about Vector databases in another blog.\n","permalink":"https://learncodecamp.net/embeddings/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eEmbeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. \u003c/p\u003e\n\u003cp\u003eWhether it’s natural language processing, computer vision, recommender systems, or other applications, embeddings play a crucial role in enhancing model performance and scalability.\u003c/p\u003e\n\u003cp\u003eText embeddings measure the relatedness of text strings. 
Embeddings are commonly used for:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSearch\u003c/strong\u003e (where results are ranked by relevance to a query string)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eClustering\u003c/strong\u003e (where text strings are grouped by similarity)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRecommendations\u003c/strong\u003e (where items with related text strings are recommended)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eAnomaly detection\u003c/strong\u003e (where outliers with little relatedness are identified)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eDiversity measurement\u003c/strong\u003e (where similarity distributions are analyzed)\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eClassification\u003c/strong\u003e (where text strings are classified by their most similar label)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cfigure\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"928\" height=\"200\" src=\"/wp-content/uploads/2024/02/image-4.png\" alt=\"\" /\u003e \u003cfigcaption class=\"wp-element-caption\"\u003eEmbedding vector from a string\u003c/figcaption\u003e\u003c/figure\u003e\u003c/p\u003e","title":"Understanding Embeddings"},{"content":"Introduction In the realm of distributed systems, robust communication between microservices is paramount. Kafka, with its high throughput and fault-tolerant design, has become a go-to solution for building scalable messaging systems.\nHowever, ensuring message reliability in asynchronous communication can be challenging, especially when dealing with failures and errors. 
One approach to handle such scenarios is the use of a Dead Letter Queue (DLQ), which acts as a safety net for messages that couldn’t be processed successfully on their initial attempt.\nIn this blog post, we’ll explore the concept of Kafka Spring Dead Letter Queue and see how to implement it.\nUnderstanding Kafka Spring Dead Letter Queue A Dead Letter Queue (DLQ) is a special queue where messages that fail to be processed are sent. In the context of Kafka and Spring, the Dead Letter Queue is an invaluable feature that enhances the resilience of message-driven applications. When a consumer encounters an error while processing a message from a Kafka topic, the message can be redirected to a designated DLQ for further analysis or processing instead of being discarded outright.\nAchieving non-blocking retry and DLT functionality with Kafka usually requires setting up extra topics and creating and configuring the corresponding listeners.\nErrors trickle down levels of retry topics until landing in the DLT:\nIf message processing fails, the message is forwarded to a retry topic with a back-off timestamp. The retry topic consumer then checks the timestamp and, if it’s not due, pauses consumption for that topic’s partition. When it is due, partition consumption is resumed and the message is consumed again. If message processing fails again, the message is forwarded to the next retry topic, and the pattern is repeated until processing succeeds or the attempts are exhausted. If all retry attempts are exhausted, the message is sent to the Dead Letter Topic for visibility and diagnosis. Dead Letter Topic messages can be reprocessed by being published back into the first retry topic. This way, they have no influence on the live traffic. 
Non-Blocking Retries in Spring Kafka Since Spring Kafka 2.7.0, failed deliveries can be forwarded to a series of topics for delayed redelivery.\nThis can be illustrated with an example:\npublic class RetryableKafkaListener { @RetryableTopic( attempts = \u0026#34;4\u0026#34;, backoff = @Backoff(delay = 1000, multiplier = 2.0), autoCreateTopics = \u0026#34;false\u0026#34;, topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_INDEX_VALUE) @KafkaListener(topics = \u0026#34;orders\u0026#34;) public void listen(String in, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) { log.info(in + \u0026#34; from \u0026#34; + topic); throw new RuntimeException(\u0026#34;test\u0026#34;); } @DltHandler public void dlt(String in, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) { log.info(in + \u0026#34; from \u0026#34; + topic); } } With this @RetryableTopic configuration, the first delivery attempt fails and the record is sent to a topic orders-retry-0 configured for a 1-second delay.\nWhen that delivery fails, the record is sent to a topic orders-retry-1 with a 2-second delay.\nWhen that delivery fails, it goes to a topic orders-retry-2 with a 4-second delay, and, finally, to a dead letter topic orders-dlt handled by the @DltHandler method.\nKafka Spring Dead Letter Queue is a powerful mechanism for handling message processing failures gracefully. By redirecting erroneous messages to a separate queue, it provides developers with the opportunity to analyze and remediate issues without losing valuable data. Incorporating DLQ into your Kafka-based applications ensures greater resilience and reliability in asynchronous messaging systems.\n","permalink":"https://learncodecamp.net/spring-kafka-dead-letter-queue/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn the realm of distributed systems, robust communication between microservices is paramount. 
Kafka, with its high throughput and fault-tolerant design, has become a go-to solution for building scalable messaging systems.\u003c/p\u003e\n\u003cp\u003eHowever, ensuring message reliability in asynchronous communication can be challenging, especially when dealing with failures and errors. One approach to handle such scenarios is the use of a Dead Letter Queue (DLQ), which acts as a safety net for messages that couldn’t be processed successfully on their initial attempt.\u003c/p\u003e","title":"Leveraging Kafka Spring Dead Letter Queue for Resilient Messaging"},{"content":"SOLID is an acronym that represents a set of five design principles in object-oriented programming and software design. These principles aim to create more maintainable, flexible, and scalable software by promoting a modular and clean code structure. The SOLID principles were introduced by Robert C. Martin and have become widely adopted in the software development industry. Here’s a brief overview of each principle:\nSingle Responsibility Principle (SRP): A class should have only one reason to change, meaning that it should have only one responsibility or job. This principle encourages the separation of concerns and helps to ensure that a class is focused on doing one thing well. // Before SRP class Report { public void generateReport() { // code for generating the report } public void saveToFile() { // code for saving the report to a file } } // After SRP class Report { public void generateReport() { // code for generating the report } } class ReportSaver { public void saveToFile(Report report) { // code for saving the report to a file } } Open/Closed Principle (OCP): Software entities (classes, modules, functions, etc.) should be open for extension but closed for modification. This encourages developers to add new functionality through the creation of new classes or modules rather than altering existing ones. 
// Before OCP class Rectangle { public double width; public double height; } class AreaCalculator { public double calculateArea(Rectangle rectangle) { return rectangle.width * rectangle.height; } } // After OCP interface Shape { double calculateArea(); } class Rectangle implements Shape { private double width; private double height; // constructor and other methods @Override public double calculateArea() { return width * height; } } class Circle implements Shape { private double radius; // constructor and other methods @Override public double calculateArea() { return Math.PI * radius * radius; } } Liskov Substitution Principle (LSP): Subtypes should be substitutable for their base types without altering the correctness of the program. This principle ensures that objects of a derived class can be used in place of objects of the base class without affecting the program’s functionality. // Before LSP class Bird { public void fly() { // code for flying } } class Ostrich extends Bird { // Ostrich is a bird, but it can\u0026#39;t fly } // After LSP interface FlyingBird { void fly(); } class Sparrow implements FlyingBird { @Override public void fly() { // code for flying } } class Ostrich { // Ostrich doesn\u0026#39;t implement FlyingBird, as it can\u0026#39;t fly } Interface Segregation Principle (ISP): Clients should not be forced to depend on interfaces they do not use. It advocates for the creation of small, specific interfaces rather than large, general-purpose ones, to avoid clients being forced to implement methods they don’t need. 
// Before ISP interface Worker { void work(); void eat(); void sleep(); } class Engineer implements Worker { @Override public void work() { // code for working } @Override public void eat() { // code for eating } @Override public void sleep() { // code for sleeping } } // After ISP interface Workable { void work(); } interface Eatable { void eat(); } interface Sleepable { void sleep(); } class Engineer implements Workable, Eatable, Sleepable { @Override public void work() { // code for working } @Override public void eat() { // code for eating } @Override public void sleep() { // code for sleeping } } Dependency Inversion Principle (DIP): High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details; details should depend on abstractions. This principle promotes the use of abstractions (like interfaces or abstract classes) to decouple high-level and low-level modules, making the system more flexible and easier to change. // Before DIP class LightBulb { public void turnOn() { // code for turning on the light bulb } public void turnOff() { // code for turning off the light bulb } } class Switch { private LightBulb bulb; public Switch(LightBulb bulb) { this.bulb = bulb; } public void operate() { // code for operating the switch if (/* some condition */) { bulb.turnOn(); } else { bulb.turnOff(); } } } // After DIP interface Switchable { void turnOn(); void turnOff(); } class LightBulb implements Switchable { @Override public void turnOn() { // code for turning on the light bulb } @Override public void turnOff() { // code for turning off the light bulb } } class Switch { private Switchable device; public Switch(Switchable device) { this.device = device; } public void operate() { // code for operating the switch if (/* some condition */) { device.turnOn(); } else { device.turnOff(); } } } Adhering to SOLID principles can result in code that is easier to understand, maintain, and extend. 
These principles contribute to the overall goal of creating robust and scalable software systems.\n","permalink":"https://learncodecamp.net/solid-principles/","summary":"\u003cp\u003eSOLID is an acronym that represents a set of five design principles in object-oriented programming and software design. These principles aim to create more maintainable, flexible, and scalable software by promoting a modular and clean code structure. The SOLID principles were introduced by Robert C. Martin and have become widely adopted in the software development industry. Here’s a brief overview of each principle:\u003c/p\u003e\n\u003ch3 id=\"single-responsibility-principle-srp\"\u003e\u003cstrong\u003eSingle Responsibility Principle (SRP):\u003c/strong\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eA class should have only one reason to change, meaning that it should have only one responsibility or job.\u003c/li\u003e\n\u003cli\u003eThis principle encourages the separation of concerns and helps to ensure that a class is focused on doing one thing well.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Before SRP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eReport\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void generateReport() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan 
style=\"color:#66d9ef\"\u003efor\u003c/span\u003e generating the report\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void saveToFile() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e saving the report to a file\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e After SRP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eReport\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void generateReport() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e generating the report\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eReportSaver\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void saveToFile(Report report) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e saving the report to a file\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch3 id=\"openclosed-principle-ocp\"\u003e\u003cstrong\u003eOpen/Closed Principle (OCP):\u003c/strong\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eSoftware entities (classes, modules, functions, etc.) 
should be open for extension but closed for modification.\u003c/li\u003e\n\u003cli\u003eThis encourages developers to add new functionality through the creation of new classes or modules rather than altering existing ones.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Before OCP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eRectangle\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public double width;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public double height;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eAreaCalculator\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public double calculateArea(Rectangle rectangle) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#66d9ef\"\u003ereturn\u003c/span\u003e rectangle\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003ewidth \u003cspan style=\"color:#f92672\"\u003e*\u003c/span\u003e rectangle\u003cspan 
style=\"color:#f92672\"\u003e.\u003c/span\u003eheight;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e After OCP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003einterface Shape {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    double calculateArea();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eRectangle\u003c/span\u003e implements Shape {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    private double width;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    private double height;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e constructor \u003cspan style=\"color:#f92672\"\u003eand\u003c/span\u003e other methods\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan 
style=\"color:#a6e22e\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public double calculateArea() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#66d9ef\"\u003ereturn\u003c/span\u003e width \u003cspan style=\"color:#f92672\"\u003e*\u003c/span\u003e height;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eCircle\u003c/span\u003e implements Shape {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    private double radius;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e constructor \u003cspan style=\"color:#f92672\"\u003eand\u003c/span\u003e other methods\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#a6e22e\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public double calculateArea() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#66d9ef\"\u003ereturn\u003c/span\u003e Math\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003ePI 
\u003cspan style=\"color:#f92672\"\u003e*\u003c/span\u003e radius \u003cspan style=\"color:#f92672\"\u003e*\u003c/span\u003e radius;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch3 id=\"liskov-substitution-principle-lsp\"\u003e\u003cstrong\u003eLiskov Substitution Principle (LSP):\u003c/strong\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eSubtypes should be substitutable for their base types without altering the correctness of the program.\u003c/li\u003e\n\u003cli\u003eThis principle ensures that objects of a derived class can be used in place of objects of the base class without affecting the program’s functionality.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Before LSP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eBird\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void fly() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e flying\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eOstrich\u003c/span\u003e extends Bird {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Ostrich \u003cspan style=\"color:#f92672\"\u003eis\u003c/span\u003e a bird, but it can\u003cspan style=\"color:#e6db74\"\u003e\u0026#39;t fly\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e After LSP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003einterface FlyingBird {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    void fly();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eSparrow\u003c/span\u003e implements FlyingBird {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#a6e22e\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void fly() 
{\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e flying\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eOstrich\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Ostrich doesn\u003cspan style=\"color:#e6db74\"\u003e\u0026#39;t implement FlyingBird, as it can\u0026#39;\u003c/span\u003et fly\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch3 id=\"interface-segregation-principle-isp\"\u003e\u003cstrong\u003eInterface Segregation Principle (ISP):\u003c/strong\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eClients should not be forced to depend on interfaces they do not use.\u003c/li\u003e\n\u003cli\u003eIt advocates for the creation of small, specific interfaces rather than large, general-purpose ones, to avoid clients being forced to implement methods they don’t need.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-typescript\" data-lang=\"typescript\"\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#75715e\"\u003e// Before ISP\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003einterface\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eWorker\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003ework\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eeat\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003esleep\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eEngineer\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003eimplements\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eWorker\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan 
style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003ework() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for working\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eeat() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for eating\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003esleep() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for 
sleeping\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#75715e\"\u003e// After ISP\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003einterface\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eWorkable\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003ework\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003einterface\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eEatable\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eeat\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003einterface\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eSleepable\u003c/span\u003e 
{\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003esleep\u003c/span\u003e();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eEngineer\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003eimplements\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eWorkable\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003eEatable\u003c/span\u003e, \u003cspan style=\"color:#a6e22e\"\u003eSleepable\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003ework() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for working\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan 
style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eeat() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for eating\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003epublic\u003c/span\u003e \u003cspan style=\"color:#66d9ef\"\u003evoid\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003esleep() {\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#75715e\"\u003e// code for sleeping\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch3 id=\"dependency-inversion-principle-dip\"\u003e\u003cstrong\u003eDependency Inversion Principle (DIP):\u003c/strong\u003e\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eHigh-level modules should not depend on low-level modules. 
Both should depend on abstractions.\u003c/li\u003e\n\u003cli\u003eAbstractions should not depend on details; details should depend on abstractions.\u003c/li\u003e\n\u003cli\u003eThis principle promotes the use of abstractions (like interfaces or abstract classes) to decouple high-level and low-level modules, making the system more flexible and easier to change.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-java\" data-lang=\"java\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e Before DIP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eLightBulb\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void turnOn() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e turning on the light bulb\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void turnOff() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e turning off the light bulb\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eSwitch\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    private LightBulb bulb;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public Switch(LightBulb bulb) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        this\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003ebulb \u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e bulb;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void operate() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e operating the switch\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#66d9ef\"\u003eif\u003c/span\u003e (\u003cspan style=\"color:#f92672\"\u003e/*\u003c/span\u003e some condition \u003cspan style=\"color:#f92672\"\u003e*/\u003c/span\u003e) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            bulb\u003cspan 
style=\"color:#f92672\"\u003e.\u003c/span\u003eturnOn();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        } \u003cspan style=\"color:#66d9ef\"\u003eelse\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            bulb\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003eturnOff();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e After DIP\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003einterface Switchable {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    void turnOn();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    void turnOff();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eLightBulb\u003c/span\u003e implements Switchable {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan 
style=\"color:#a6e22e\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void turnOn() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e turning on the light bulb\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#a6e22e\"\u003e@Override\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void turnOff() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e turning off the light bulb\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003eclass\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003eSwitch\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    private Switchable device;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public Switch(Switchable device) 
{\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        this\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003edevice \u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e device;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    public void operate() {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#f92672\"\u003e//\u003c/span\u003e code \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e operating the switch\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#66d9ef\"\u003eif\u003c/span\u003e (\u003cspan style=\"color:#f92672\"\u003e/*\u003c/span\u003e some condition \u003cspan style=\"color:#f92672\"\u003e*/\u003c/span\u003e) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            device\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003eturnOn();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        } \u003cspan style=\"color:#66d9ef\"\u003eelse\u003c/span\u003e {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            device\u003cspan style=\"color:#f92672\"\u003e.\u003c/span\u003eturnOff();\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    }\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eAdhering to SOLID principles can result in code that is easier to understand, maintain, and extend. These principles contribute to the overall goal of creating robust and scalable software systems.\u003c/p\u003e","title":"SOLID principles"},{"content":"Introduction Logging is an essential aspect of software development, aiding in debugging, monitoring, and analyzing application behavior. In Spring applications, Aspect-Oriented Programming (AOP) offers a powerful mechanism to separate cross-cutting concerns like logging from the business logic. By employing AOP, developers can modularize logging code and apply it uniformly across multiple components, enhancing maintainability and readability.\nUnderstanding AOP in Spring Aspect-Oriented Programming enables the modularization of cross-cutting concerns by allowing developers to define aspects, which encapsulate certain behavior. In Spring, AOP is typically implemented using proxies and advice. Proxies intercept method invocations and execute advice either before, after, or around the method call.\nAdvantages of AOP for Logging: Modularization: AOP enables the separation of logging concerns from the core business logic, leading to cleaner and more maintainable code. Uniformity: With AOP, logging can be applied uniformly across multiple components without cluttering the source code with repetitive logging statements. Configurability: AOP allows developers to configure logging behavior dynamically, such as specifying different log levels or destinations for different parts of the application. Before delving deeper into logging with AOP in Spring, it’s essential to understand some fundamental concepts:\nAspect: An aspect encapsulates a cross-cutting concern or functionality that we want to apply throughout the application. 
In the context of logging, the logging behavior represents an aspect that cuts across different components and modules of the application.\nJoin Point: A join point is a point in the application’s flow where we want to apply an aspect. In general AOP terms it represents a specific point during the execution of a program, such as method execution, method call, object instantiation, or exception handling. In Spring AOP, a join point always represents a method execution.\nAdvice: Advice is the action that should be executed at a specific join point. In AOP, advice defines what needs to be done when a particular join point is reached. There are different types of advice, including @Before, @After, @Around, @AfterReturning, and @AfterThrowing, each specifying when the advice should be executed relative to the join point.\nPointcut: A pointcut is a collection of join points where an aspect should be applied. It defines a set of criteria for identifying join points in the application’s execution flow. Pointcuts use expressions to match join points based on method signatures, class names, annotations, or other criteria.\nImplementing Logging With AOP in Spring Here’s a step-by-step guide to implementing logging using AOP in a Spring application:\n1. Define Logging Aspect: Create a logging aspect class annotated with @Aspect, which contains advice methods annotated with @Before, @After, @Around, etc., depending on when you want the logging to occur.\n@Aspect\n@Component\npublic class LoggingAspect {\n@Before(\"execution(* com.example.*.*(..))\")\npublic void logBefore(JoinPoint joinPoint) {\n// Logging logic before method execution\n}\n@After(\"execution(* com.example.*.*(..))\")\npublic void logAfter(JoinPoint joinPoint) {\n// Logging logic after method execution\n}\n// Add other advice methods as needed\n}\n2. Configure Aspect in Spring Context: Register the logging aspect in the Spring application context to enable AOP functionality.\n@Configuration\n@EnableAspectJAutoProxy\npublic class AppConfig {\n// Configuration beans\n}\n3. 
Apply Logging Aspect: Apply the logging aspect to the target components using pointcut expressions.\n@Component\npublic class MyService {\npublic void doSomething() {\n// Business logic\n}\n}\n4. Test Logging Behavior: Verify that logging works as expected by executing the application and observing the log output.\nChallenges and Best Practices Performance Overhead: AOP can introduce performance overhead due to proxy creation and method interception. It’s essential to profile and optimize AOP-based logging for performance-critical applications. Avoiding Excessive Logging: Care should be taken not to over-log, which can clutter logs and degrade performance. Configure logging levels appropriately and use conditional logging to log only when necessary. Error Handling: Handle exceptions thrown within advice methods to prevent unexpected behavior in the application. Integration with Logging Frameworks: Spring AOP seamlessly integrates with popular logging frameworks like Logback, Log4j, or Java Util Logging, allowing developers to leverage their features and configurations. Conclusion: Logging is a critical aspect of software development, and using Aspect-Oriented Programming in Spring applications offers an elegant solution to modularize and manage logging concerns effectively. By employing AOP, developers can enhance maintainability, readability, and configurability of logging behavior across the application. 
However, it’s crucial to address challenges like performance overhead and excessive logging to ensure optimal application performance and usability.\nIncorporating AOP-based logging practices into Spring projects empowers developers to build robust and well-structured applications, facilitating easier debugging, monitoring, and maintenance throughout the software development lifecycle.\n","permalink":"https://learncodecamp.net/spring-boot-aop-logging/","summary":"\u003ch3 id=\"introduction\"\u003e\u003cstrong\u003eIntroduction\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eLogging is an essential aspect of software development, aiding in debugging, monitoring, and analyzing application behavior. In Spring applications, Aspect-Oriented Programming (AOP) offers a powerful mechanism to separate cross-cutting concerns like logging from the business logic. By employing AOP, developers can modularize logging code and apply it uniformly across multiple components, enhancing maintainability and readability.\u003c/p\u003e\n\u003ch3 id=\"understanding-aop-in-spring\"\u003e\u003cstrong\u003eUnderstanding AOP in Spring\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eAspect-Oriented Programming enables the modularization of cross-cutting concerns by allowing developers to define aspects, which encapsulate certain behavior. In Spring, AOP is typically implemented using proxies and advice. 
Proxies intercept method invocations and execute advice either before, after, or around the method call.\u003c/p\u003e","title":"Enhancing Logging in Spring Applications with Aspect-Oriented Programming (AOP)"},{"content":" jps (Java Process Status) Usage: Sample Usage: Understanding the Output: jstat (JVM Statistics Monitoring Tool) Usage: Sample Usage: Understanding the Output: jcmd (JVM Diagnostic Command) Usage: Sample Usage: Understanding the Output: jmap (Memory Map for Java) Usage: Sample Usage: Understanding the Output: Conclusion Java applications often run in complex and dynamic environments, making it essential to monitor their performance and diagnose issues efficiently. Fortunately, the Java Development Kit (JDK) comes with a set of powerful tools for this purpose. In this guide, we will explore four essential tools: jps, jstat, jcmd, and jmap. We’ll discuss their functionalities, sample usage, and how to interpret their output effectively.\n1. jps (Java Process Status) The jps tool lists Java Virtual Machine (JVM) processes on the local machine. It provides information such as the process ID (PID) and the main class or JAR file being executed.\nUsage: jps [options] Sample Usage: $ jps -l\n12345 com.example.MainApp Understanding the Output: The first column represents the PID. With -l, the second column displays the fully qualified name of the main class or the full path to the application’s JAR file. 2. jstat (JVM Statistics Monitoring Tool) jstat is a command-line tool that provides information on JVM internal statistics such as garbage collection, class loading, compiler activity, and more.\nUsage: jstat [options] vmid [interval [count]]\nSample Usage: $ jstat -gcutil 12345 1000 10\nUnderstanding the Output: The output varies depending on the options used. In the sample above, 1000 is the sampling interval in milliseconds and 10 is the number of samples. -gcutil reports garbage collection statistics as a percentage of each space’s capacity: S0 and S1 (survivor spaces), E (eden space), O (old space), M (metaspace), and CCS (compressed class space), followed by GC counts and times such as YGC, YGCT, FGC, FGCT, and GCT. The -gc option instead reports capacities and usage in kilobytes, in columns such as EC (eden capacity), EU (eden used), OC (old capacity), OU (old used), MC (metaspace capacity), and MU (metaspace used). 3. 
jcmd (JVM Diagnostic Command) jcmd is a versatile tool that can perform various operations on JVM processes, including thread dumps, heap dumps, GC operations, and more.\nUsage: jcmd pid command [arguments] Sample Usage: $ jcmd 12345 Thread.print Understanding the Output: The output depends on the command used. For example, Thread.print prints the stack trace of every thread, which helps with inspecting thread activity and detecting deadlocks. Running jcmd 12345 help lists the commands the target JVM supports. 4. jmap (Memory Map for Java) jmap generates memory-related information for a given Java process, including heap dumps and memory usage statistics.\nUsage: jmap [option] pid Sample Usage: $ jmap -histo 94639 Capturing Heap Dump: To capture a heap dump using jmap, we need to use the dump option:\njmap -dump:[live],format=b,file=filename pid Along with that option, we should specify several parameters:\nlive: if set, only objects that still have active references are dumped; objects that are ready to be garbage collected are discarded. This parameter is optional. format=b: specifies that the dump file will be in binary format. If not set, the result is the same. file: the file where the dump will be written to. pid: id of the Java process. Sample Heap Dump Usage: $ jmap -dump:live,format=b,file=/tmp/dump.hprof 45817 Understanding the Output: The heap dump file (dump.hprof in the above example) contains a snapshot of the Java heap at the time the dump was taken. Analyzing heap dumps can help diagnose memory leaks, understand memory usage patterns, and optimize garbage collection strategies. By utilizing the jmap tool to capture heap dumps, developers can gain valuable insights into the memory usage of Java applications and effectively troubleshoot memory-related issues.\nConclusion In this guide, we’ve covered four essential Java monitoring and diagnostic tools: jps, jstat, jcmd, and jmap. These tools are invaluable for understanding JVM behavior, diagnosing performance issues, and troubleshooting memory-related problems. 
By mastering these tools and understanding their output, developers can effectively monitor and optimize Java applications for optimal performance and reliability.\n","permalink":"https://learncodecamp.net/jps-jcmd-jstat-jmap/","summary":"\u003cfigure\u003e\u003cimg decoding=\"async\" src=\"https://images.unsplash.com/photo-1629654291663-b91ad427698f?q=80\u0026w=2874\u0026auto=format\u0026fit=crop\u0026ixlib=rb-4.0.3\u0026ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D\" alt=\"\" /\u003e\u003c/figure\u003e \n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#1-jps-java-process-status\"\u003ejps (Java Process Status)\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#usage\"\u003eUsage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#sample-usage\"\u003eSample Usage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#understanding-the-output\"\u003eUnderstanding the Output:\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#2-jstat-jvm-statistics-monitoring-tool\"\u003ejstat (JVM Statistics Monitoring Tool)\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#usage-1\"\u003eUsage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#sample-usage\"\u003eSample Usage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#understanding-the-output\"\u003eUnderstanding the Output:\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#jcmd-jvm-diagnostic-command\"\u003ejcmd (JVM Diagnostic Command)\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca 
href=\"https://learncodecamp.net/?p=466#usage-1\"\u003eUsage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#sample-usage\"\u003eSample Usage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#understanding-the-output\"\u003eUnderstanding the Output:\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#4-jmap-memory-map-for-java\"\u003ejmap (Memory Map for Java)\u003c/a\u003e\n\u003col\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#usage-1\"\u003eUsage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#sample-usage\"\u003eSample Usage:\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#understanding-the-output\"\u003eUnderstanding the Output:\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://learncodecamp.net/?p=466#conclusion\"\u003eConclusion\u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eJava applications often run in complex and dynamic environments, making it essential to monitor their performance and diagnose issues efficiently. Fortunately, the Java Development Kit (JDK) comes with a set of powerful tools for this purpose. In this guide, we will explore four essential tools: \u003ccode\u003ejps\u003c/code\u003e, \u003ccode\u003ejstat\u003c/code\u003e, \u003ccode\u003ejcmd\u003c/code\u003e, and \u003ccode\u003ejmap\u003c/code\u003e. We’ll discuss their functionalities, sample usage, and how to interpret their output effectively.\u003c/p\u003e","title":"A Comprehensive Guide to Java Monitoring and Diagnostics Tools: jps, jstat, jcmd, and jmap"},{"content":" Let’s Learn Javascript. 
Embark on an exciting journey into the world of web development as we unravel the mysteries of JavaScript, the backbone of interactive and dynamic web experiences. Let’s start with some questions.\nWhere does JavaScript code run? Originally designed to run exclusively in browsers, JavaScript has undergone a transformation with the advent of Node.js. Created by Ryan Dahl in 2009, Node.js allows JavaScript code to run outside the browser environment. This means developers can use JavaScript to build the backend for web and mobile applications. Each browser has its own JavaScript engine, like SpiderMonkey in Firefox and V8 in Chrome. Node.js embeds Google’s V8 JavaScript engine in a C++ program, providing a runtime environment for executing JavaScript code outside the browser. In summary, JavaScript code can run both in a browser and in Node.js, which broadens its range of applications.\nWhat is the difference between JavaScript and ECMAScript? ECMAScript (ES) is not a language itself; instead, it serves as a specification. JavaScript is a programming language that adheres to the ECMAScript specification. The ECMAScript specification is maintained by Ecma International, defining standards for JavaScript. The first version of ECMAScript was released in 1997, and annual releases have occurred since 2015. ECMAScript 2015, also known as ES6, brought numerous new features to JavaScript. It’s crucial to understand that JavaScript is the practical implementation of the ECMAScript specification, ensuring consistency and compatibility across different platforms.\nVariables In traditional JavaScript (pre-ES6), the var keyword was used to declare variables. However, with the introduction of ES6, the let keyword is now the recommended practice for variable declarations. 
For instance:\nlet name;\nConstants While variables allow us to store and manipulate data, there are instances in real-world applications where we want to ensure that the value remains constant throughout the program’s execution.\nLet’s consider an example where we declare a variable named interestRate and initially set it to 0.3:\nlet interestRate = 0.3;\ninterestRate = 1;\nconsole.log(interestRate); // Outputs: 1\nBecause interestRate was declared with let, the reassignment succeeds. To declare a constant, you use the const keyword. Let’s modify our example accordingly:\nconst interestRate = 0.3;\ninterestRate = 1; // TypeError: Assignment to constant variable.\nIn practice, default to const and use let only when you explicitly need to reassign a variable. Constants provide stability to your codebase by ensuring that certain values remain constant throughout the execution of your program.\nVariables/Constants In JavaScript, the world of data is categorized into two main types: primitives, also known as value types, and reference types.\n1. Strings Strings represent sequences of characters and are declared using what we call a “string literal.” Here’s an example:\nlet name = 'Your String'; Strings are versatile and widely used for representing text and characters in JavaScript.\n2. Numbers Numbers are used to represent numerical values. They can be integers or floating-point numbers. For instance:\nlet age = 30; In this example, the variable age is assigned the number 30.\n3. Booleans Booleans are logical values that can be either true or false. They are particularly useful in situations where we need to make decisions based on the truth or falsity of a condition:\nlet isApproved = true; Here, isApproved is a boolean variable set to true.\n4. Undefined When a variable is declared but not initialized, its value is undefined. For instance:\nlet firstName; In this example, firstName is declared but not assigned a value, making its default state undefined.\n5. 
Null The null value is used in situations where we intentionally want to clear the value of a variable. It is commonly employed when dealing with user selections. For example:\nlet selectedColor = null; In this case, selectedColor is initially set to null. Later, if a user selects a color, we might reassign the variable to the chosen color. If the user wants to remove the selection, setting it back to null serves that purpose.\n6. Symbol It represents a unique identifier that can be used as an object property. Unlike strings or numbers, symbols are guaranteed to be unique, even if they have the same name.\nCreating Symbols: You can create a symbol using the Symbol() function, like this:\nconst mySymbol = Symbol();\nOptionally, you can pass a description as an argument to provide a human-readable description of the symbol:\nconst mySymbol = Symbol('mySymbolDescription'); Dynamic Typing in JavaScript One of the distinguishing features of JavaScript that sets it apart from many other programming languages is its dynamic nature. JavaScript is classified as a dynamic language, offering flexibility when it comes to variable types. In contrast to static languages where the type of a variable is fixed at declaration and cannot change, JavaScript allows the type of a variable to evolve during runtime.\nLet’s revisit the example of the name variable from the previous lecture. In JavaScript, we can use the typeof operator to inspect the type of a variable. Initially, name is declared as a string:\nlet name = 'Your String';\nconsole.log(typeof name); // Outputs: string However, in the dynamic world of JavaScript, the type of a variable can change. 
If we later reassign name to a number, the type dynamically adjusts:\nname = 42; console.log(typeof name); // Outputs: number This ability to adapt the type of a variable at runtime is a characteristic of dynamic languages like JavaScript.\nExploring typeof Operator: Let’s examine a few more examples using the typeof operator:\nlet age = 30; console.log(typeof age); // Outputs: number let isApproved = true; console.log(typeof isApproved); // Outputs: boolean let firstName; console.log(typeof firstName); // Outputs: undefined let selectedColor = null; console.log(typeof selectedColor); // Outputs: object An interesting observation is the type of firstName being reported as undefined. This seemingly curious behavior arises from the fact that undefined is considered both a type and a value in the primitive types category of JavaScript.\nObjects What is an Object? An object in JavaScript, much like objects in real life, consists of properties that define its characteristics. Think of a person with attributes like name, age, and address. In JavaScript, when dealing with multiple related variables, these variables can be encapsulated within an object.\nCreating an Object: Let’s dive into the syntax of creating an object using an object literal:\nlet person = {\nname: 'Marsh',\nage: 30\n}; In this example, person is an object with two properties: name and age. The keys (name and age) are the properties, and the corresponding values ('Marsh' and 30) are the data associated with those properties.\nAccessing Object Properties: There are two primary ways to access object properties:\nDot Notation\nconsole.log(person.name); // Outputs: Marsh Bracket Notation\nconsole.log(person['name']); // Outputs: Marsh Bracket notation is particularly useful when the property name is dynamic and determined at runtime. Modifying Object Properties: Properties of an object can be modified using either notation:\nperson.name = 'John'; console.log(person.name); // Outputs: John Dot Notation vs. 
Bracket Notation: While dot notation is concise and often preferred, bracket notation has its use cases, especially when property names are dynamic or not known in advance. For example:\nlet selection = 'name'; console.log(person[selection]); // Outputs: John Choosing Between Notations: Dot Notation: Cleaner and preferable for most cases (object.property). Bracket Notation: Useful when dealing with dynamic property names (object['property']). Arrays in JavaScript Creating an Array: Let’s start by creating an array using the array literal, denoted by square brackets:\nlet selectedColors = []; // Empty Array In this example, selectedColors is initialized as an empty array.\nAdding Elements to the Array: You can add elements to an array using the array’s indices:\nselectedColors[0] = 'red';\nselectedColors[1] = 'blue'; Arrays in JavaScript are zero-indexed, meaning the first element is at index 0, the second at index 1, and so forth.\nDisplaying Array Elements: To display elements from an array, you can use the console.log statement along with the index:\nconsole.log(selectedColors[0]); // Outputs: red Dynamic Nature of Arrays: JavaScript’s dynamic nature extends to arrays as well. You can dynamically modify the length of an array and mix different data types within it:\nselectedColors[2] = 'green'; // Adding another color\nselectedColors[3] = 42; // Adding a number Now, selectedColors contains three strings and one number.\nArrays as Objects: Technically, arrays are objects in JavaScript. Like other objects, they have properties, and one such property is length, which gives the number of elements in the array:\nconsole.log(selectedColors.length); // Outputs: 4 Functions Functions in JavaScript are the bedrock of building modular and reusable code. They encapsulate a set of statements, enabling tasks to be performed or values to be calculated. 
Let’s unravel the essence of functions with a few practical examples.\nFunction Declaration: To declare a function, we use the function keyword, followed by the function name and parentheses:\nfunction greet(name) {\nconsole.log(\"Hello \" + name);\n} Here, greet is the function name, and it takes one parameter name. The logic of the function, enclosed in curly braces, is to log a greeting message to the console.\nCalling a Function: To invoke a function, we use its name followed by parentheses:\ngreet(\"John\");\ngreet(\"Mary\"); These function calls with different arguments result in personalized greetings on the console.\nFunction Parameters and Arguments: Understanding the difference between parameters and arguments is crucial. Parameters are the variables declared in the function, like name in our example. Arguments, on the other hand, are the actual values passed to those parameters during function calls. In our case, “John” and “Mary” are arguments.\nFunctions with Multiple Parameters: Functions can have multiple parameters, separated by commas:\nfunction greetWithLastName(firstName, lastName) {\nconsole.log(\"Hello \" + firstName + \" \" + lastName);\n} Here, greetWithLastName takes two parameters, firstName and lastName.\nDefault Values: If an argument isn’t provided for a parameter, JavaScript defaults it to undefined. 
To avoid this, you can assign default values:\nfunction greetWithDefaultLastName(firstName, lastName = \"Doe\") {\nconsole.log(\"Hello \" + firstName + \" \" + lastName);\n} Now, if no lastName is provided, it defaults to “Doe.”\nMultiple Function Calls: Functions can be called multiple times with different arguments:\ngreetWithLastName(\"John\", \"Smith\");\ngreetWithDefaultLastName(\"Alice\"); This results in varying personalized greetings based on the provided arguments.\n","permalink":"https://learncodecamp.net/learn-javascript/","summary":"\u003cfigure\u003e\u003cimg loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1920\" src=\"/wp-content/uploads/2024/02/image-scaled.jpeg\" alt=\"\" /\u003e\u003c/figure\u003e Let’s Learn Javascript. Embark on an exciting journey into the world of web development as we unravel the mysteries of JavaScript, the backbone of interactive and dynamic web experiences. \n\u003cp\u003eLet’s start with some questions\u003c/p\u003e\n\u003ch3 id=\"where-does-javascript-code-run\"\u003eWhere does JavaScript code run?\u003c/h3\u003e\n\u003cp\u003eOriginally designed to run exclusively in browsers, JavaScript has undergone a transformation with the advent of Node.js. Created by Ryan Dahl in 2009, Node.js allows JavaScript code to run outside the browser environment. This means developers can use JavaScript to build the backend for web and mobile applications. Each browser has its JavaScript engine, like SpiderMonkey in Firefox and V8 in Chrome. Node.js incorporates Google’s V8 JavaScript engine into a C++ program, providing a runtime environment for executing JavaScript code outside the browser. In summary, JavaScript code can run both in a browser and in Node.js, broadening its application possibilities\u003c/p\u003e","title":"Learning JavaScript"},{"content":"Introduction In distributed systems, maintaining data consistency across multiple systems is a challenging task. 
One common problem that arises is the dual write problem, where updates to multiple data stores must be performed atomically to prevent data inconsistencies. In this blog post, we’ll explore how to tackle the dual write problem using the outbox pattern in conjunction with Apache Kafka.\nUnderstanding the Dual Write Problem The dual write problem occurs when data needs to be written to multiple systems, and ensuring consistency between them becomes crucial. For instance, imagine an e-commerce application where an order is placed and data must be simultaneously written to a database and a messaging system like Kafka for further processing. If one write succeeds but the other fails, data inconsistency arises.\nIntroducing the Outbox Pattern The outbox pattern is a proven solution to the dual write problem. It involves persisting data that needs to be written to multiple systems in a local outbox table within the same transaction as the primary write operation. This decouples the primary write from the actual propagation, reducing the risk of inconsistencies. Once data is safely stored in the outbox, a separate process reads from it and propagates the changes to the target systems.\nImplementing the Outbox Pattern with Kafka: To integrate Kafka into the outbox pattern, we’ll use it as the messaging system for propagating data changes. Here’s how we can implement it:\nDesign the Outbox Table: Create an outbox table in your primary database to store events or data changes. Include fields like ID, timestamp, payload, and status. Write to the Outbox: Upon performing a write operation to the primary database, insert the relevant data into the outbox table within the same transaction. Process the Outbox Entries: Implement a separate process, often referred to as the outbox processor, to read from the outbox table periodically. This process reads pending entries and publishes them to a Kafka topic. 
Consume and Process Kafka Messages: Set up Kafka consumers to consume messages from the designated topic. These consumers process the messages and perform the necessary actions in the target system, ensuring data consistency. CREATE TABLE outbox (\nid INT AUTO_INCREMENT PRIMARY KEY,\ntimestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\npayload JSON NOT NULL,\nstatus ENUM('pending', 'processed') DEFAULT 'pending'\n);\nWrite to the Outbox: When you need to perform a dual write operation, insert the data into the outbox table within the same transaction as the primary write operation. This ensures atomicity.\nSTART TRANSACTION;\n-- Perform primary write operation\nINSERT INTO your_primary_table (column1, column2, ...) VALUES (value1, value2, ...);\n-- Insert into the outbox\nINSERT INTO outbox (payload) VALUES ('{\"id\": 123, \"data\": \"your_data\"}');\nCOMMIT; Process the Outbox Entries: Implement a separate process (outbox processor) that periodically reads from the outbox table and propagates the changes to the target systems. This process should be idempotent to handle failures and retries correctly.\nMark Entries as Processed: After successfully propagating an entry to all target systems, update the status of that outbox entry to ‘processed’ so that it is not processed again. Match the entry by its id (shown here as a ? placeholder) rather than updating every pending row, which would also mark entries that have not been published yet.\nUPDATE outbox SET status = 'processed' WHERE id = ? AND status = 'pending'; Because the outbox processor may retry publishing a message to Kafka, this approach gives at-least-once delivery, so the consumer side needs idempotent processing logic. Idempotent processing ensures that the same message can be processed multiple times without causing unintended side effects.\nChallenges and Considerations: Serialization and Deserialization: Ensure proper serialization and deserialization of data between the outbox table and Kafka messages. Monitoring and Error Handling: Implement robust monitoring and error handling mechanisms to detect and handle failures in the propagation process. 
Message Ordering: Depending on the application requirements, you may need to consider message ordering guarantees provided by Kafka. ","permalink":"https://learncodecamp.net/dual-write-problem-outbox-pattern-kafka/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eIn distributed systems, maintaining data consistency across multiple systems is a challenging task. One common problem that arises is the dual write problem, where updates to multiple data stores must be performed atomically to prevent data inconsistencies. In this blog post, we’ll explore how to tackle the dual write problem using the outbox pattern in conjunction with Apache Kafka.\u003c/p\u003e\n\u003ch3 id=\"understanding-the-dual-write-problem\"\u003eUnderstanding the Dual Write Problem\u003c/h3\u003e\n\u003cp\u003eThe dual write problem occurs when data needs to be written to multiple systems, and ensuring consistency between them becomes crucial. For instance, imagine an e-commerce application where an order is placed and data must be simultaneously written to a database and a messaging system like Kafka for further processing. If one write succeeds but the other fails, data inconsistency arises.\u003c/p\u003e","title":"Solving the Dual Write Problem: Leveraging the Outbox Pattern with Kafka"},{"content":"JavaScript, being a single-threaded language, relies heavily on its event-driven nature and the event loop to handle asynchronous operations efficiently. Understanding the event loop is crucial for writing performant and responsive JavaScript applications.\nIntroduction The event loop is a fundamental concept in JavaScript’s concurrency model. 
It’s responsible for managing the execution of code, handling asynchronous operations, and ensuring responsiveness in web applications. At its core, the event loop continuously checks the call stack and the task queue, executing tasks in a non-blocking manner.\nThe Call Stack: A Pillar of Execution Central to JavaScript’s execution model is the call stack, a data structure that tracks the execution context of our code. As functions are invoked, they are pushed onto the call stack, and as they return, they are popped off the stack, maintaining the program’s flow of execution.\nWeb APIs: Beyond the Runtime But wait, there’s more to the JavaScript runtime than just the call stack. Web APIs provide additional functionalities provided by the browser environment. These APIs, ranging from DOM manipulation to asynchronous tasks like setTimeout, extend the capabilities of JavaScript beyond its core features.\nThe Event Loop and Callback Queue: Orchestrating Asynchrony At the heart of JavaScript’s concurrency model lies the event loop and the callback queue. While the call stack handles synchronous code execution, asynchronous tasks are managed through a process of event-driven coordination. The event loop monitors the call stack and callback queue, ensuring that asynchronous tasks are executed in a timely manner without blocking the main thread.\nHow Does the Event Loop Work? To understand the event loop, let’s break down its components and how they interact:\nCall Stack: The call stack is a data structure that keeps track of function calls in JavaScript. When a function is invoked, it’s added to the call stack, and when it completes, it’s removed from the stack. Task Queue: The task queue (also known as the callback queue) holds tasks that are ready to be executed. These tasks typically include asynchronous operations such as setTimeout callbacks, DOM events, and AJAX requests. Event Loop: The event loop continuously checks the call stack and the task queue. 
If the call stack is empty and a task is waiting, it takes the first task from the queue and pushes it onto the call stack for execution. Here’s a simplified pseudocode visualization of the event loop (callStack and taskQueue are conceptual, not real JavaScript objects):\nwhile (true) {\nif (callStack.isEmpty() \u0026\u0026 !taskQueue.isEmpty()) {\nconst task = taskQueue.dequeue();\ncallStack.push(task);\n}\n} Asynchronous Operations and the Event Loop Asynchronous operations play a crucial role in JavaScript, allowing developers to perform tasks such as fetching data from servers, handling user input, and executing timers without blocking the main thread. Let’s explore some common asynchronous patterns and how they interact with the event loop:\nsetTimeout and setInterval: These functions schedule code to run after a specified delay or at regular intervals. When the timer expires, the callback function is added to the task queue and executed by the event loop. setTimeout(() =\u003e {\nconsole.log('Delayed task executed');\n}, 1000); Promises: Promises provide a way to work with asynchronous operations in a more elegant and sequential manner. When a promise resolves or rejects, its corresponding callbacks are queued as microtasks, which run as soon as the call stack is empty and before the next task is taken from the task queue. const fetchData = () =\u003e {\nreturn new Promise((resolve, reject) =\u003e {\n// Asynchronous operation\nsetTimeout(() =\u003e {\nresolve('Data fetched successfully');\n}, 2000);\n});\n};\nfetchData().then((data) =\u003e {\nconsole.log(data);\n}); Event Handlers: DOM events such as click, mouseover, and keydown trigger event handlers asynchronously. When an event occurs, its callback function is added to the task queue and executed by the event loop. document.getElementById('myButton').addEventListener('click', () =\u003e {\nconsole.log('Button clicked');\n}); Best Practices and Considerations Understanding the event loop is essential for writing efficient and responsive JavaScript code. 
Here are some best practices and considerations to keep in mind:\nAvoid Blocking the Event Loop: Long-running synchronous operations can block the event loop, leading to poor performance and unresponsiveness. Use asynchronous APIs with Promises and async/await for I/O operations, and split heavy computations into smaller chunks or move them off the main thread (for example, to a Web Worker), because wrapping CPU-bound code in a Promise does not stop it from blocking. Use setTimeout with Caution: While setTimeout is useful for scheduling tasks, it’s important to be mindful of its behavior, especially in scenarios where precise timing is required. A setTimeout callback runs only after its delay has elapsed and the call stack is empty, so the delay is a minimum rather than an exact time. Optimize DOM Operations: Manipulating the DOM can be expensive, especially when performed frequently or in large batches. Minimize DOM manipulation and use techniques like batching, virtual DOM, and CSS optimizations to improve performance. Handle Errors Gracefully: Asynchronous operations can fail for various reasons, such as network errors, server downtime, or invalid input. Always handle promise rejections and errors in event handlers to prevent application crashes and provide a better user experience. Conclusion In conclusion, the event loop is a foundational concept in JavaScript’s concurrency model, enabling asynchronous programming and non-blocking I/O. By understanding how the event loop works and how it interacts with asynchronous operations, developers can write more responsive and efficient JavaScript applications. By following best practices and considering the nuances of asynchronous programming, you can build robust and performant web applications that provide a seamless user experience.\n","permalink":"https://learncodecamp.net/event-loop-in-javascript/","summary":"\u003cp\u003eJavaScript, being a single-threaded language, relies heavily on its event-driven nature and the event loop to handle asynchronous operations efficiently. 
Understanding the event loop is crucial for writing performant and responsive JavaScript applications.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eThe event loop is a fundamental concept in JavaScript’s concurrency model. It’s responsible for managing the execution of code, handling asynchronous operations, and ensuring responsiveness in web applications. At its core, the event loop continuously checks the call stack and the task queue, executing tasks in a non-blocking manner.\u003c/p\u003e","title":"Understanding the Event Loop in JavaScript: A Comprehensive Guide"},{"content":"Contact Us We’d love to hear from you! Reach out to us with any questions, feedback, or inquiries using the email contact@learncodecamp.net\n","permalink":"https://learncodecamp.net/contact-us/","summary":"\u003ch2 id=\"contact-us\"\u003eContact Us\u003c/h2\u003e\n\u003cp\u003eWe’d love to hear from you! Reach out to us with any questions, feedback, or inquiries using the email \u003ca href=\"mailto:contact@learncodecamp.net\"\u003econtact@learncodecamp.net\u003c/a\u003e\u003c/p\u003e","title":"Contact Us"},{"content":"Equality comparisons are fundamental in JavaScript programming, allowing developers to evaluate conditions and compare values. However, navigating the nuances of JavaScript’s equality operators (==, ===, and Object.is()) can sometimes be tricky. In this blog post, we’ll delve into each of these operators, discussing their differences, best practices, and common pitfalls.\n== (Loose Equality Operator) The == operator, also known as the loose equality operator, compares two values after performing type coercion. 
This means that if the operands have different types, JavaScript will attempt to convert them to a common type before making the comparison. While this automatic type conversion can be convenient, it can also lead to unexpected behavior and subtle bugs.\nExample:\nconsole.log(1 == '1'); // true === (Strict Equality Operator) In contrast to ==, the === operator, known as the strict equality operator, checks for equality of values without performing type coercion. It compares both the values and the types of the operands. Using === is generally considered a best practice because it provides predictable behavior and helps prevent unintended type conversions.\nExample:\nconsole.log(1 === '1'); // false Object.is() Introduced in ECMAScript 2015 (ES6), the Object.is() method provides a way to perform strict equality comparisons similar to ===. However, Object.is() differs from === in two special cases: it treats NaN as equal to NaN (=== does not), and it treats -0 and 0 as different values (=== treats them as equal). It is useful when you need to differentiate between these special cases.\nExample:\nconsole.log(Object.is(NaN, NaN)); // true\nconsole.log(Object.is(-0, 0)); // false Best Practices Use == with caution due to its implicit type coercion. Prefer === for most equality comparisons to avoid unexpected behavior. Use Object.is() when you need to differentiate between special cases like NaN and -0. In summary, understanding the differences between ==, ===, and Object.is() is essential for writing robust and predictable JavaScript code. By following best practices and choosing the appropriate operator for each situation, you can avoid common pitfalls and ensure the reliability of your code.\n","permalink":"https://learncodecamp.net/understanding-equality-comparisons-in-javascript/","summary":"\u003cp\u003eEquality comparisons are fundamental in JavaScript programming, allowing developers to evaluate conditions and compare values. 
However, navigating the nuances of JavaScript’s equality operators—\u003ccode\u003e==\u003c/code\u003e, \u003ccode\u003e===\u003c/code\u003e, and \u003ccode\u003eObject.is()\u003c/code\u003e—can sometimes be tricky. In this blog post, we’ll delve into each of these operators, discussing their differences, best practices, and common pitfalls.\u003c/p\u003e\n\u003ch3 id=\"-loose-equality-operator\"\u003e\u003ccode\u003e==\u003c/code\u003e (Loose Equality Operator)\u003c/h3\u003e\n\u003cp\u003eThe \u003ccode\u003e==\u003c/code\u003e operator, also known as the loose equality operator, compares two values after performing type coercion. This means that if the operands have different types, JavaScript will attempt to convert them to a common type before making the comparison. While this automatic type conversion can be convenient, it can also lead to unexpected behavior and subtle bugs.\u003c/p\u003e","title":"Understanding Equality Comparisons in JavaScript"},{"content":"Privacy Policy Your privacy is important to us at Learn Code Camp. This Privacy Policy outlines how we collect, use, and protect your personal information when you visit our website.\nInformation We Collect When you visit Learn Code Camp, we may collect certain information about your visit, including your IP address, browser type, and the pages you view. We use this information to analyze trends, administer the site, and improve our services.\nUse of Cookies Learn Code Camp uses cookies to personalize content and ads, provide social media features, and analyze our traffic. By using our website, you consent to the use of cookies in accordance with our Privacy Policy.\nThird-Party Links Our website may contain links to third-party websites or services that are not owned or controlled by Learn Code Camp. We are not responsible for the privacy practices or content of these sites. 
We encourage you to review the privacy policies of any third-party sites you visit.\nChanges to This Privacy Policy We reserve the right to update or change our Privacy Policy at any time. Any changes will be posted on this page, and the revised date will be indicated at the top of the page. We encourage you to review this Privacy Policy periodically for any updates or changes.\nContact Us If you have any questions or concerns about our Privacy Policy, please contact us at contact@learncodecamp.net.\n","permalink":"https://learncodecamp.net/privacy-policy/","summary":"\u003ch2 id=\"privacy-policy\"\u003ePrivacy Policy\u003c/h2\u003e\n\u003cp\u003eYour privacy is important to us at Learn Code Camp. This Privacy Policy outlines how we collect, use, and protect your personal information when you visit our website.\u003c/p\u003e\n\u003ch3 id=\"information-we-collect\"\u003eInformation We Collect\u003c/h3\u003e\n\u003cp\u003eWhen you visit Learn Code Camp, we may collect certain information about your visit, including your IP address, browser type, and the pages you view. We use this information to analyze trends, administer the site, and improve our services.\u003c/p\u003e","title":"Privacy Policy"},{"content":"About Us Welcome to Learn Code Camp! Learn Code Camp is dedicated to providing high-quality coding education to enthusiasts of all skill levels. Whether you’re a complete beginner or an experienced developer looking to expand your knowledge, we offer a variety of courses and resources to help you succeed.\nThis platform is created and maintained by Nitin Kalra — a software engineer with over a decade of industry experience, focused on building scalable systems and sharing practical, real-world engineering knowledge.\n💼 LinkedIn: https://www.linkedin.com/in/nitin-kalra-b5480947/ 🐦 Twitter/X: Tweets by nkalra0123 💻 GitHub: https://github.com/nkalra0123 Our mission is to make learning to code accessible, engaging, and practical. 
We believe that everyone should have the opportunity to learn valuable technical skills, regardless of their background or experience.\nAt Learn Code Camp, we prioritize hands-on learning and real-world problem solving. The blogs and tutorials are designed to be practical, experience-driven, and continuously updated to reflect modern development practices.\nJoin us on your coding journey and unlock your potential!\n","permalink":"https://learncodecamp.net/about-us/","summary":"\u003ch2 id=\"about-us\"\u003eAbout Us\u003c/h2\u003e\n\u003ch3 id=\"welcome-to-learn-code-camp\"\u003eWelcome to Learn Code Camp!\u003c/h3\u003e\n\u003cp\u003eLearn Code Camp is dedicated to providing high-quality coding education to enthusiasts of all skill levels. Whether you’re a complete beginner or an experienced developer looking to expand your knowledge, we offer a variety of courses and resources to help you succeed.\u003c/p\u003e\n\u003cp\u003eThis platform is created and maintained by \u003cstrong\u003eNitin Kalra\u003c/strong\u003e — a software engineer with over a decade of industry experience, focused on building scalable systems and sharing practical, real-world engineering knowledge.\u003c/p\u003e","title":"About Us"},{"content":"Introduction Java developers often need to execute programs as part of their development process, whether they are testing, building, or deploying applications. Maven, a popular build automation tool primarily used for Java projects, provides a convenient way to manage dependencies, build, and run Java programs. In this guide, we will explore various methods and best practices for running programs in Java using Maven.\nSetting Up Maven Before diving into running programs with Maven, it’s essential to ensure that Maven is properly installed on your system. Maven can be downloaded and installed from the official Apache Maven website (https://maven.apache.org/download.cgi). 
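Once installed, make sure to set up the PATH environment variable to include Maven’s bin directory, allowing you to run Maven commands from any location in your terminal or command prompt.\nCreating a Maven Project To create a new Maven project, you can use the mvn command-line tool or an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse. Using the mvn command, navigate to the directory where you want to create the project and execute the following command:\nmvn archetype:generate -DgroupId=com.example -DartifactId=my-project -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false This command creates a new Maven project with the specified group ID (com.example), artifact ID (my-project), and uses the maven-archetype-quickstart archetype, which provides a basic project structure.\nRunning Programs with Maven Once you have a Maven project set up, you can run Java programs using various plugins and configurations in the pom.xml file, Maven’s project object model.\nExec Plugin: The Exec Maven Plugin allows you to execute Java programs and external commands as part of the build process. To use the Exec Plugin, add the following configuration to your pom.xml file:\n\u0026lt;plugin\u0026gt;\n\u0026lt;groupId\u0026gt;org.codehaus.mojo\u0026lt;/groupId\u0026gt;\n\u0026lt;artifactId\u0026gt;exec-maven-plugin\u0026lt;/artifactId\u0026gt;\n\u0026lt;version\u0026gt;3.0.0\u0026lt;/version\u0026gt;\n\u0026lt;executions\u0026gt;\n\u0026lt;execution\u0026gt;\n\u0026lt;goals\u0026gt;\n\u0026lt;goal\u0026gt;java\u0026lt;/goal\u0026gt;\n\u0026lt;/goals\u0026gt;\n\u0026lt;/execution\u0026gt;\n\u0026lt;/executions\u0026gt;\n\u0026lt;configuration\u0026gt;\n\u0026lt;mainClass\u0026gt;com.example.Main\u0026lt;/mainClass\u0026gt;\n\u0026lt;/configuration\u0026gt;\n\u0026lt;/plugin\u0026gt;\nReplace com.example.Main with the fully qualified name of your main class. You can then execute your program using the following Maven command:\nmvn compile exec:java This command compiles your project and then executes the main class specified in the configuration. The exec:java goal does not compile the code by itself, so the compile phase is run first.\nMaven Compiler Plugin: Another approach to running Java programs with Maven is by using the Maven Compiler Plugin to compile the code and then execute it manually. Add the following configuration to your pom.xml file:\n\u0026lt;plugin\u0026gt;\n\u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt;\n\u0026lt;artifactId\u0026gt;maven-compiler-plugin\u0026lt;/artifactId\u0026gt;\n\u0026lt;version\u0026gt;3.8.1\u0026lt;/version\u0026gt;\n\u0026lt;configuration\u0026gt;\n\u0026lt;source\u0026gt;1.8\u0026lt;/source\u0026gt;\n\u0026lt;target\u0026gt;1.8\u0026lt;/target\u0026gt;\n\u0026lt;/configuration\u0026gt;\n\u0026lt;/plugin\u0026gt;\nReplace the \u0026lt;source\u0026gt; and \u0026lt;target\u0026gt; versions with your desired Java version. 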
Then, compile your code using the following command:\nmvn compile Once the compilation is successful, you can run your program using the java command:\njava -cp target/classes com.example.Main Best Practices and Considerations: Maintain a clean project structure: Organize your project according to Maven’s conventions to improve readability and maintainability. Utilize Maven profiles: Define profiles in your pom.xml to manage different build configurations, such as development, testing, and production. Automate testing: Integrate testing frameworks like JUnit with Maven to automate the execution of unit tests as part of your build process. Dependency management: Leverage Maven’s dependency management capabilities to efficiently manage project dependencies and ensure consistency across environments. Continuous integration (CI): Integrate your Maven projects with CI/CD pipelines (e.g., Jenkins, Travis CI) for automated builds, testing, and deployment. Conclusion Running Java programs with Maven offers a streamlined approach to managing dependencies, building, and executing code. By leveraging Maven plugins and configurations, developers can efficiently run Java programs within their projects, enhancing productivity and maintainability. Incorporating best practices such as maintaining a clean project structure and automating testing further improves the development workflow. As you continue to explore Maven, experiment with different plugins and configurations to optimize your development process further.\n","permalink":"https://learncodecamp.net/comprehensive-guide-to-running-programs-in-java-with-maven/","summary":"\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eJava developers often need to execute programs as part of their development process, whether they are testing, building, or deploying applications. 
Maven, a popular build automation tool primarily used for Java projects, provides a convenient way to manage dependencies, build, and run Java programs. In this guide, we will explore various methods and best practices for running programs in Java using Maven.\u003c/p\u003e\n\u003ch3 id=\"setting-up-maven\"\u003eSetting Up Maven\u003c/h3\u003e\n\u003cp\u003eBefore diving into running programs with Maven, it’s essential to ensure that Maven is properly installed on your system. Maven can be downloaded and installed from the official Apache Maven website (\u003ca href=\"https://maven.apache.org/download.cgi\"\u003ehttps://maven.apache.org/download.cgi\u003c/a\u003e). Once installed, make sure to set up the \u003ccode\u003ePATH\u003c/code\u003e environment variable to include Maven’s \u003ccode\u003ebin\u003c/code\u003e directory, allowing you to run Maven commands from any location in your terminal or command prompt.\u003c/p\u003e","title":"A Comprehensive Guide to Running Programs in Java with Maven"}]