A Deep Dive into AI with LogosCo | Local AI & Qwen3.5-27B
TL;DR
0xSero’s whole AI thesis is autonomy, not novelty — he got pulled into local LLMs because they let him “do more stuff without needing other people,” from booking a haircut to organizing government documents to running parallel coding agents all day.
He is stress-testing the economics harder than almost anyone — after once tweeting about 500 million tokens in a week, he says he’s now at roughly 18–20 billion tokens in a month on Codex and hit 3.6 billion tokens in a single day, which he argues makes today’s $200 subscriptions obviously unsustainable.
His workflow is basically human-as-orchestrator — instead of one assistant, he runs multiple panes of Codex, Claude, GPT, and Chinese models in parallel, often assigning the same task to three agents and then cherry-picking the best output before doing a validation pass to catch fabricated results.
Local models have crossed the usefulness threshold — Sero argues that models like Qwen 3.5 27B already fit comfortably on a 32 GB MacBook, are strong at tool use, bash, coding, and vision, and are now “good enough for pretty much anything” most people actually need.
The real reason to learn local AI is resilience — his core prediction is that cloud AI will get more expensive, more rationed, and more centralized as corporations and governments buy up access, so running even a smaller local model is like “learning how to start a fire” before you actually need it.
On privacy, he goes full fatalist — after watching agents delete drives and seeing how easily Grok can infer intimate details from public traces, he says people should assume they are already fully profiled and optimize for capability rather than pretending privacy can still be preserved.
The Breakdown
The origin story: AI as a way to need fewer people
0xSero says his attraction to AI is the same one that pulled him into crypto: anything that reduces dependency and lets you act directly. His earliest spark was talking to brittle chatbot sites at 13, then DALL·E in 2022, when he immediately saw a world where people could sell services they didn’t actually know how to perform because generation had become nearly free.
The recent turning point was agents doing boring real work
The big shift came when he got early access to Claude’s Chrome extension and started using it daily for long-running, concrete tasks like scraping images, reorganizing email, and handling repetitive digital chores. His frustration was that the people around him still treated AI like mystical sci-fi, while he saw it as a very practical machine for demolishing admin work.
What his desktop actually looks like: chaos, panes, and parallel jobs
Sero’s setup is constant parallelism: three windows, multiple panes, tasks running everywhere, with Codex, GPT, Claude, Chinese models, Figma automation, and Zed all in play. His point is simple: if one agent runs for 20–60 minutes, the bottleneck stops being model speed and becomes your own ability to context-switch without forgetting what each thread is doing.
The trick is not more agents — it’s how you split the work
He says the parallel approach breaks down fast on messy existing codebases, but works well on greenfield projects or clearly separable features. One of his favorite tactics is spawning multiple agents on the same task, then distilling their outputs into a cleaner “master” version; if a model adds 3,000 lines, he’ll literally tell it to cut the code in half and let it grind for another hour.
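The fan-out-then-cherry-pick tactic can be sketched in a few lines. Everything here is illustrative: `run_agent` is a hypothetical stand-in for shelling out to a real CLI agent, and the selection heuristic (smallest diff wins) is a proxy for the human judgment Sero actually applies.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(name: str, task: str) -> dict:
    # Hypothetical placeholder: a real call would invoke Codex, Claude,
    # etc. on the same prompt and capture the resulting diff.
    return {"agent": name, "task": task, "lines_changed": len(name) * 100}

def fan_out(task: str, agents: list[str]) -> dict:
    """Assign one task to several agents in parallel, then cherry-pick
    a single result to carry into a human validation pass."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = list(pool.map(lambda a: run_agent(a, task), agents))
    # Illustrative heuristic: prefer the smallest change set, echoing
    # Sero's habit of telling verbose models to cut their code in half.
    return min(results, key=lambda r: r["lines_changed"])

best = fan_out("refactor auth module", ["codex", "claude", "glm"])
print(best["agent"])
```

The point of the sketch is the shape of the workflow, not the scoring: the human stays the orchestrator, and the final validation pass (catching fabricated results) still happens outside the loop.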
Agents are already personal operators, not just coding copilots
The conversation widens beyond software: both hosts describe agents planning travel, comparing hotels, messaging sellers, and even coordinating a RAM purchase via SMS. Sero’s framing lands especially well here: the valuable shift is not “chatting with an LLM,” but “having conversations with your data” — whether that’s Twitter exports, bills, documents, or your life admin.
Why local matters now: Qwen, Macs, and the coming token squeeze
Sero’s strongest claim is economic: if his usage really implies the equivalent of 16 high-end GPUs running 24/7, then current flat-rate subscriptions cannot last. That’s why he wants people learning local inference now, especially with Qwen 3.5 27B, which he calls the current gold standard for local agents because it fits on mainstream hardware, has vision and strong tool calling, and feels like a meaningful step toward sovereignty.
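A back-of-envelope check makes the economics concrete. The token figure is his; the per-token price is an assumption for illustration (a blended $1.50 per million tokens, not a quoted rate from any provider):

```python
# Sero's stated usage: roughly 18-20 billion tokens per month on Codex.
tokens_per_month = 19e9          # midpoint of his 18-20B range
price_per_mtok = 1.50            # ASSUMPTION: blended $/1M tokens
subscription = 200.0             # the flat-rate plan he references

api_equiv = tokens_per_month / 1e6 * price_per_mtok
print(f"Equivalent metered spend: ${api_equiv:,.0f}/mo")  # -> $28,500/mo
print(f"Multiple of the $200 plan: {api_equiv / subscription:.0f}x")
```

Even if the assumed rate is off by several times in either direction, the gap between metered cost and a $200 subscription stays two orders of magnitude wide, which is the core of his "this cannot last" argument.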
His hardware worldview: stack compute like Bitcoin
He shows off an 8x RTX 3090 rig, jokes about the room drying out to 9% humidity, and says he’ll keep acquiring machines because compute has obvious strategic value. But he also emphasizes the practical middle path: Apple hardware, quantization, offloading, Strix Halo-class boxes, and even Raspberry Pi-class setups can still be useful depending on whether you need agents, tagging, or lightweight local inference.
Privacy, supply-chain attacks, and the bleak realism at the end
When the talk turns to package compromise and privacy, Sero takes the hardest line of the entire interview: he no longer thinks AI can be used safely in any clean, absolute sense. His advice is almost jarring — assume the leak has already happened, assume the profiling already exists, and recognize that what changed is not data collection but the arrival of cheap enough intelligence to process all of it at scale.