Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit.
TL;DR
The real cost problem isn't frontier models; it's sloppy token habits. Nate argues Claude, ChatGPT, and Gemini can stay surprisingly cheap if you stop wasting context on raw PDFs, sprawling chats, unnecessary plugins, and top-tier models used for work that doesn't need them.
Raw document ingestion is a rookie mistake that can explode costs by 20x — his example turns 4,500 words across three PDFs into 100,000+ tokens because of PDF formatting overhead, while converting to markdown first drops that to roughly 4,000–6,000 tokens.
Long-running chats quietly torch your limit because every turn resends the whole conversation — Nate says users should split “gather information” mode from “do the work” mode, refresh chats every 10–15 turns, and stop treating one thread like a forever workspace.
Plugins and connectors can tax you before you type a single word — he cites one user starting 50,000 tokens deep due to loaded tools, comparing it to pulling all 200 tools off a workshop wall when you only need five to build a bench.
A clean workflow can cut costs 8–10x without changing the output quality — his side-by-side example drops a 5-hour Opus-heavy session from $8–$10 to about $1 by using markdown, fresh chats, scoped context, Opus for reasoning, Sonnet for execution, and Haiku for polish.
This matters more as next-gen models like rumored Claude Mythos get pricier — with Jensen Huang floating $250,000 per engineer per year in token spend and Nate expecting higher pricing on GB300-trained models, prompt caching, retrieval, and context discipline become job skills, not optimization trivia.
The Breakdown
Why the next wave of models will punish bad habits
Nate opens with a blunt warning: the next generation — Claude Mythos, whatever ChatGPT drops next, the next Gemini — is likely coming in 1–2 months, and it will cost more because it's trained on pricier Nvidia GB300 chips. His point isn't "don't use frontier models"; it's that cheap ambient AI will handle the dumb stuff, and if you want the best intelligence, you need to stop wasting tokens.
The rookie error: stuffing raw PDFs into Claude
His first big offender is document ingestion. A user drags in three PDFs totaling just 4,500 words and asks for a summary, but the model doesn’t just read the words — it eats headers, footers, fonts, layout metadata, and binary junk, turning that into 100,000+ tokens. Nate practically begs people to convert to markdown first, saying this one step can save 20x and stop bloated files from haunting every subsequent turn.
Conversation sprawl is where limits go to die
Then he goes after the “20, 30, 40 turn” chat habit. Nate says people mix brainstorming, research, and execution in one endless thread, even though models were never really meant for that kind of sprawl, and every reply keeps replaying the entire history. His fix is simple but sharp: have one mode for gathering information and another for focused execution, and start fresh once you’ve reached a conclusion.
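The cost of sprawl is easy to see in arithmetic: because every reply resends the full history, input tokens grow quadratically with thread length, so splitting one long thread into fresh ones cuts the total roughly in half. A back-of-the-envelope sketch with a hypothetical 500-token average turn:

```python
def thread_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens for one thread: every reply resends the whole history."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

TOKENS_PER_TURN = 500  # hypothetical average turn size

one_long_thread = thread_input_tokens(30, TOKENS_PER_TURN)
two_fresh_threads = 2 * thread_input_tokens(15, TOKENS_PER_TURN)

print(f"one 30-turn thread:  {one_long_thread:,} input tokens")   # 232,500
print(f"two 15-turn threads: {two_fresh_threads:,} input tokens")  # 120,000
```

Same 30 turns of work, nearly 2x the input cost in the single thread — which is why Nate's 10–15-turn refresh rule pays off even before you factor in bloated documents riding along in the history.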
Your plugin stack might be burning tokens in the background
For intermediate users, the trap is tool overload. Nate says some people are 50,000 tokens into a context window before typing anything because they’ve loaded too many plugins and connectors, which he compares to dragging every tool in a workshop onto the bench before building anything. The message is not “never use tools” — it’s audit them, because each shiny add-on becomes a silent tax forever.
Advanced users make the biggest mistakes at the biggest scale
For developers running agents and API workflows, Nate gets harsher: if you haven’t pruned your system prompt lately or you’re still loading giant repos into context because it worked two model generations ago, that’s irresponsible. As models get smarter, he says, you should be shrinking context, trusting retrieval more, and treating token management as a real job skill because these are million-token decisions when scaled.
The math is ugly: same output, 8–10x cheaper
He gives a concrete before-and-after. A sloppy 5-hour session using raw PDFs, 30-turn sprawl, and Opus 4.6 for everything can hit 800,000 to 1 million input tokens plus 150,000–200,000 output tokens, costing roughly $8–$10. Clean it up with markdown, fresh chats every 10–15 turns, scoped context, Opus for reasoning, Sonnet for execution, and Haiku for polish, and the same work drops to about $1 — which compounds from a personal annoyance into a team-level budget issue fast.
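The before/after can be reproduced with a small cost function. The per-million-token rates below are hypothetical, chosen only to land near the article's dollar figures — check your provider's price sheet for real numbers — and the clean-session token split across tiers is likewise an illustrative assumption:

```python
# Hypothetical ($/M input, $/M output) rates, not a real price sheet.
RATES = {"opus": (8.0, 16.0), "sonnet": (2.0, 6.0), "haiku": (0.5, 2.0)}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rate_in, rate_out = RATES[model]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# Sloppy: raw PDFs, 30-turn sprawl, everything on the top-tier model.
sloppy = cost("opus", 900_000, 175_000)

# Clean: markdown, fresh chats, scoped context, work split across tiers.
clean = (cost("opus", 60_000, 15_000)      # reasoning
         + cost("sonnet", 80_000, 25_000)  # execution
         + cost("haiku", 40_000, 20_000))  # polish

print(f"sloppy: ${sloppy:.2f}  clean: ${clean:.2f}  savings: {sloppy / clean:.1f}x")
```

With these assumed rates the sloppy session comes out at $10.00 and the clean one at $1.09 — a 9.2x gap, squarely inside the 8–10x range the video claims, and the ratio is what scales when a whole team works this way.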
Nate’s “stupid button” and the six things it checks
The back half of the video is his practical answer: a “stupid button” built to diagnose token waste. It checks things like raw PDFs and screenshots, stale conversations, overuse of premium models, hidden context from plugins, missing prompt caching, and expensive native web search; he highlights prompt caching’s 90% discount and says Perplexity often uses 10,000–50,000 fewer tokens per search than Claude while being about 5x faster.
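Prompt caching is the most mechanical of these fixes. A sketch of what a cache-enabled request body looks like, following the shape of Anthropic's Messages API (verify field names and the model id against current docs before relying on this; the rates and token counts below are illustrative):

```python
# Sketch of a cache-enabled request body (Anthropic Messages API shape).
request = {
    "model": "claude-sonnet-4-5",  # hypothetical model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<your large, stable context: style guide, schema, reference docs>",
            "cache_control": {"type": "ephemeral"},  # marks this block for caching
        }
    ],
    "messages": [
        {"role": "user", "content": "First question against the cached context."}
    ],
}

# Cache reads are billed at roughly 10% of the base input rate -- the "90% discount".
base_rate, cache_read_rate = 3.0, 0.3   # hypothetical $/M input tokens
stable_context_tokens = 40_000          # hypothetical size of the cached block
per_call_savings = stable_context_tokens / 1e6 * (base_rate - cache_read_rate)
print(f"~${per_call_savings:.3f} saved on every call after the first")
```

The key design point: only the stable prefix (system prompt, reference docs) goes in the cached block, while the parts that change each turn stay outside it — otherwise every edit invalidates the cache and you pay full price again.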
Five commandments for agents — and a cultural rant at the end
He closes by translating all this into agent design: index references, preprocess everything, cache stable context, scope each agent to only what it needs, and measure per-call burn. Then he zooms out and says token burning has weirdly become a status symbol, when the real goal is “smart tokens,” not just more tokens — a direct response to Jensen Huang’s $250,000-per-developer figure. His last note is classic Nate: touch grass, stop wasting tokens on dumb stuff, and spend them on bold, creative work instead.
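The fifth commandment — measure per-call burn — is the one that makes the other four enforceable. A minimal sketch of a measurement wrapper: the decorator and the stubbed call are illustrative, but the `usage` counts it reads mirror what most provider SDKs return on a response:

```python
import functools

def measure_burn(call_model):
    """Wrap a model-call function and accumulate per-call token burn."""
    totals = {"calls": 0, "input": 0, "output": 0}

    @functools.wraps(call_model)
    def wrapped(*args, **kwargs):
        response = call_model(*args, **kwargs)
        # Assumes the response exposes usage counts, as most provider SDKs do.
        totals["calls"] += 1
        totals["input"] += response["usage"]["input_tokens"]
        totals["output"] += response["usage"]["output_tokens"]
        return response

    wrapped.totals = totals
    return wrapped

# Stub standing in for a real SDK call, so the sketch runs offline.
@measure_burn
def fake_call(prompt: str) -> dict:
    return {"usage": {"input_tokens": len(prompt) // 4, "output_tokens": 50}}

fake_call("scope each agent to only what it needs")
print(fake_call.totals)  # {'calls': 1, 'input': 9, 'output': 50}
```

Once every agent call goes through a wrapper like this, "smart tokens" stops being a slogan: you can see which agent is quietly replaying a giant context and fix it before it becomes a million-token decision.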