TTFT in LLMs Explained: What Time to First Token Really Measures
When I evaluate an LLM system, one of the first latency metrics I look at is TTFT, or time to first token. This metric answers a simple question: After a user sends a request, how long does it take before the first output token appears? That sounds narrow, but it matters a lot. Users usually forgive a response that streams steadily after it starts. What feels bad is the dead time before anything appears on screen. ...
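To make the definition concrete, here is a minimal sketch of how TTFT could be measured around any streaming response. It is an illustration under stated assumptions, not a reference implementation: `measure_ttft`, `start_stream`, and `fake_stream` are hypothetical names, and `fake_stream` only simulates a model with a slow first token followed by steady output. In a real system, `start_stream` would wrap whatever call sends the request and returns the streamed tokens.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one request.

    start_stream is a hypothetical zero-argument callable that sends the
    request and returns an iterable of output tokens.
    """
    sent_at = clock()  # the moment the request is sent
    ttft: Optional[float] = None
    count = 0
    for _token in start_stream():
        if ttft is None:
            # The first output token has arrived: this gap is the TTFT.
            ttft = clock() - sent_at
        count += 1
    total = clock() - sent_at  # time until the last token
    return ttft, total, count


def fake_stream(
    first_delay: float = 0.05, gap: float = 0.01, n: int = 5
) -> Iterator[str]:
    """Simulated streaming response: slow first token, then steady output."""
    time.sleep(first_delay)  # the "dead time" before anything appears
    for i in range(n):
        if i:
            time.sleep(gap)  # steady pace once streaming has started
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_ttft(fake_stream)
    print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms, tokens: {count}")
```

The clock starts before the request is sent rather than when the first chunk is read, so the measurement covers everything the user waits through. If the stream produces no tokens, the sketch reports the TTFT as `None` instead of guessing a value.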