AI News & Strategy Daily | Nate B Jones · 22m

Wall Street Just Bet $285 Billion on AI Agents. The Best One Barely Works.

TL;DR

  • Wall Street priced in an agent revolution before the tech is ready: Nate says Anthropic’s Computer Use helped trigger a $285 billion SaaS sell-off even though the flagship agent is still in research preview and is absurdly fragile (it stops working when you close your laptop).

  • The real test for outcome agents is not the demo — it’s memory, editable artifacts, and compounding context — his rubric is three questions: does the agent remember across sessions, does it produce work you can inspect and edit, and does the system actually get smarter by accumulating context over time.

  • Code agents worked first because code is verifiable: tools like Claude Code and Codex had an easier path because software can be judged by one simple question, “Does it run?”, while non-code knowledge work is much harder to prove right or wrong.

  • Lindy looks like the clearest startup wedge, but not yet a true deep outcome agent — Nate sees a real niche between Zapier and Computer Use for busy executives, but his own experience plus Lindy’s roughly 2.4/5 Trustpilot score point to opaque outputs, credit burn, and weak debugging.

  • Sauna, not Lindy, best captures where the category wants to go — after Wordware raised $30 million from Spark Capital and YC for an agent IDE, it pivoted to “an AI workspace for professionals” built around memory as infrastructure, though Nate says it’s still too early and too demo-heavy to trust in production.

  • Google Opal is the easiest agent tool to try, but its biggest strength and weakness is that it’s Google — it’s free, remixable, and people are sharing real builds like meeting-prep agents, yet Nate worries Google may strand it as another experiment and that its spreadsheet-like memory won’t support serious long-running work.

The Breakdown

The hype machine: “have the coffee” while agents do the work

Nate opens with the now-familiar promise from tools like Computer Use, Codex, Lindy, Sauna, and Google Opal: you state the outcome, walk away, and the agent handles the job. His point is that the category is exploding in visibility, but most of these tools still dodge the genuinely hard part of agent work: remembering what happened before and building on it over time.

How Anthropic kicked off a $285 billion panic

He rewinds to January, when Anthropic launched Computer Use: an agent that works on your actual computer, with your files, opening apps and navigating the browser with no code required. Microsoft quickly responded with Copilot “co-work,” and as Anthropic rolled out finance, legal, and medical skills, Wall Street started imagining AI replacing expensive SaaS seats — helping drive more than $285 billion off SaaS valuations. The punchline: the tool that spooked the market still falls asleep if you shut your laptop.

Why code got agents first

Nate says code agents succeeded first because code is a “verifiable domain.” You can evaluate an agent’s output with one brutal, clean test: does it run? That’s why Claude Code, Codex, and Google’s early agent workflows all made sense before broader knowledge-work agents. Non-code work is much harder to judge, so the bar for dependable outcomes gets fuzzier fast.
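To make the “does it run?” test concrete, here is a minimal Python sketch, assuming the agent’s output is a self-contained Python script. The helper name `does_it_run` and the 10-second timeout are hypothetical choices for illustration, not anything Nate describes, and the check executes the code without a sandbox, so treat it as a toy for trusted experiments only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def does_it_run(candidate_source: str, timeout_s: float = 10.0) -> bool:
    """Return True if the candidate Python program exits with status 0.

    Hypothetical helper: this is the crude 'does it run?' check. It says
    nothing about quality, only whether the program completes cleanly.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(candidate_source)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            # A program that hangs counts as a failure.
            return False
        return result.returncode == 0


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b\nassert add(2, 3) == 5\n"
    bad = "def add(a, b):\n    return a - b\nassert add(2, 3) == 5\n"
    print(does_it_run(good))  # True
    print(does_it_run(bad))   # False: the assert raises, exit code is 1
```

Nothing comparably cheap and unambiguous exists for a strategy memo or a client email, which is exactly the gap Nate points to.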

The three-question rubric that cuts through the demos

His practical framework is simple: does the agent have persistent memory, does it produce editable artifacts, and does context compound over time? Even Computer Use, the product that set off all the excitement, only scores something like “one and a half out of three” in his view — decent artifacts, partial memory, basically no compounding context. That gap between market excitement and product reality is, for him, the whole story.
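The rubric reduces to simple arithmetic. The sketch below assumes each question is scored 0 (no), 0.5 (partial) or 1 (yes); that scale and the `RubricScore` class are illustrative assumptions, chosen so that the digest’s reading of Computer Use (decent artifacts, partial memory, no compounding context) lands on Nate’s rough “one and a half out of three.”

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """Nate's three questions; the 0 / 0.5 / 1 scale is an assumption."""
    persistent_memory: float
    editable_artifacts: float
    compounding_context: float

    def __post_init__(self) -> None:
        # Reject anything outside the assumed no / partial / yes scale.
        for name in ("persistent_memory", "editable_artifacts", "compounding_context"):
            value = getattr(self, name)
            if value not in (0.0, 0.5, 1.0):
                raise ValueError(f"{name} must be 0, 0.5 or 1, got {value}")

    def total(self) -> float:
        return self.persistent_memory + self.editable_artifacts + self.compounding_context


# The digest's reading of Computer Use: partial memory, decent artifacts,
# basically no compounding context.
computer_use = RubricScore(persistent_memory=0.5, editable_artifacts=1.0, compounding_context=0.0)
print(f"{computer_use.total()} / 3")  # 1.5 / 3
```

Scoring each tool the same way makes the comparison explicit: a tool only reaches 3 if it remembers, produces inspectable work, and compounds context.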

Lindy’s executive-friendly pitch — and the trade-off underneath

Lindy is the best-known startup in the category, pitched by founder Flo Crivello as a natural-language way for busy executives to describe an outcome and let the system build and run the workflow. Nate thinks that niche is real, but his own experience was mixed: text-based adjustments didn’t reliably stick, complex tasks burned credits, and users on Trustpilot rate it around 2.4 out of 5. His takeaway is that Lindy optimized for an easy front door, but the hidden cost is weak debugging and outputs that are harder to inspect and edit.

Sauna’s pivot may be the most important story here

Wordware originally raised $30 million from Spark Capital and YC to build an IDE for agent development, then pivoted when the team realized people don’t wake up wanting to “build an automation” — they wake up with too much to do. The new product, Sauna, is framed as an AI workspace for professionals, with founder Filip Kozera pushing memory as a substrate, not a feature. Nate likes the core insight that knowledge workers won’t become programmers; they’ll need to write clearer specs — but he also thinks Sauna is still early, flashy, and not yet proven beyond demos.

Google Opal: zero-friction experimentation, classic Google uncertainty

Opal gets more credit from Nate than most roundups give it: it’s free, powered by Gemini 3 Flash, supports workflow remixing, and people on X are sharing actual builds instead of just polished videos. That said, he worries the memory layer is too spreadsheet-like for serious long-running agent work, and the bigger risk is familiar: Google has a habit of shipping promising experiments and then abandoning them “out on an iceberg.” Good for lightweight workflows, maybe not for the heavy-duty future Wall Street is betting on.

Obvious, OpenBrain, and the architecture Nate thinks matters

He closes with Obvious, a quiet but ambitious AI workspace with SQL workbooks, live-chart docs, presentations, custom apps, Kanban boards, and cross-artifact relationships. More important than any one tool, though, is his three-layer architecture: a knowledge store for memory, agent recipes as reusable workflows — like old punch cards — and a scheduling loop so the system improves over time. That’s also the logic behind his own OpenBrain project: not just cheaper agents, but infrastructure you control.
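A minimal Python sketch of the three layers, assuming the simplest possible stand-ins: a list of strings for the knowledge store, a callable for each recipe, and a fixed-count loop instead of a real scheduler. Every class and function name here is hypothetical; this is not OpenBrain’s code, only an illustration of how the layers could connect.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KnowledgeStore:
    """Layer 1 (hypothetical): persistent memory, here an in-memory list of notes."""
    notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

    def search(self, keyword: str) -> list[str]:
        # Naive keyword match standing in for real retrieval.
        return [n for n in self.notes if keyword.lower() in n.lower()]


@dataclass
class Recipe:
    """Layer 2 (hypothetical): a reusable workflow the system can rerun, like a punch card."""
    name: str
    topic: str
    run: Callable[[list[str]], str]


def scheduling_loop(store: KnowledgeStore, recipes: list[Recipe], ticks: int) -> None:
    """Layer 3 (hypothetical): rerun every recipe and write results back to memory."""
    for tick in range(ticks):
        for recipe in recipes:
            context = store.search(recipe.topic)
            output = recipe.run(context)
            store.add(f"[tick {tick}] {recipe.name}: {output}")


def count_related(context: list[str]) -> str:
    return f"found {len(context)} related note(s)"


store = KnowledgeStore(["Acme meeting prep: pricing concerns"])
scheduling_loop(store, [Recipe("acme-prep", "acme", count_related)], ticks=3)
for note in store.notes:
    print(note)
# last line printed: [tick 2] acme-prep: found 3 related note(s)
```

Each pass writes its output back into the store, so later runs see more related notes than earlier ones (1, then 2, then 3 here). That write-back is the “compounding context” from the rubric; swapping the keyword search for real retrieval or the loop for a cron-style scheduler would not change the shape.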