Alea
Alex Finn · 21m

How to use OpenClaw for free (local AI models guide)

TL;DR

  • Alex’s core bet is that local AI becomes the default in the next 12 months — after spending over $50,000 testing gear like maxed-out Mac Studios and Nvidia DGX Spark, he argues the real unlock is not cheaper inference but running agents 24/7 for effectively just electricity.

  • He frames cloud models as the “brain” and local models as the “muscles” — his current setup uses ChatGPT as the orchestrator while local models like Qwen 3.5 handle coding, research, and scraping inside OpenClaw, which cuts token costs while keeping throughput high.

  • You do not need expensive hardware to start — Alex says even an old laptop or a $600 Mac Mini can run useful local models for tasks like memory management, lightweight writing, or delegated agent work, even if they’re not frontier-level.

  • The three local model families he recommends are Qwen 3.5, Nvidia’s Nemotron 3, and MiniMax 2.5 — Qwen 3.5 is his daily driver, Nemotron 3 is the new open-source model he’s most excited about, and MiniMax 2.5 is the speed pick for fast lightweight tasks.

  • His concrete use cases are already agentic, not theoretical — he shows a local coding agent named Charlie writing software continuously and another MiniMax-powered agent scraping the web 24/7 for business opportunities that get piped into Telegram.

  • His advice is to scale hardware only after proving one small use case — instead of buying $20,000-$40,000 of machines upfront, he recommends starting with whatever machine you have, offloading one workflow, then upgrading to hardware like a Mac Studio or $4,800 DGX Spark once you see ROI.

The Breakdown

Why cloud AI is expensive, unprivate, and hard to scale

Alex opens hard: he’s spent more than $50,000 testing local AI over two months, and he thinks most people are still stuck in the old model — paying cloud providers every time a prompt hits a remote server. His pitch is simple and blunt: cloud AI means token bills, $200/month subscriptions, lag, no real privacy, and no control when companies quietly “dial the knobs” and make models feel dumber.

The local AI pitch: free, private, offline, and always-on

He contrasts that with local models running on your own machine — a Mac Mini, Mac Studio, or even an “old dusty Lenovo laptop.” The vibe here is freedom: no internet required, no one reading your prompts, full customization, and the ability to run AI nonstop like a tireless employee instead of something you meter by the token.

The tradeoff: local models are a little behind, but still very strong

Alex is clear that local models are not quite frontier level yet; he estimates they’re about six months behind. But he makes that gap feel smaller by comparing them to the era when Opus 4.5-level capability still felt mind-blowing, arguing that local intelligence is already strong enough for serious work.

What hardware actually makes sense, from Mac Mini to DGX Station

He spends time demystifying the hardware ladder. A $600 Mac Mini can still run useful models and small workflows, while Mac Studios and Nvidia’s $4,800 DGX Spark get you into much more capable territory; at the extreme, Nvidia’s newly announced $100,000 DGX Station looks to him like “an entire AI research lab on your desk.” He also gives the practical tradeoff: Apple’s unified memory lets you load bigger models, while Nvidia’s VRAM and tooling are better for speed, fine-tuning, LoRAs, and auto-research.
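The hardware ladder maps roughly onto model memory footprints, and you can sanity-check a purchase with back-of-the-envelope math. The numbers below are rule-of-thumb assumptions (not figures from the video): a 4-bit quantized model needs about 0.5 bytes per parameter, plus ~30% headroom for the KV cache and activations.

```python
# Rough sizing sketch: estimate the RAM needed to run a quantized local model.
# The 0.5 bytes/parameter figure (4-bit quantization) and the 1.3x overhead
# factor are rule-of-thumb assumptions, not numbers from the video.

def estimate_ram_gb(params_billions: float,
                    bytes_per_param: float = 0.5,  # ~4-bit quantization
                    overhead: float = 1.3) -> float:
    """Approximate RAM (GB) to load and run a model, with KV-cache headroom."""
    return params_billions * bytes_per_param * overhead

# A 9B model at 4-bit comes in under 6 GB, which is why it fits a 16GB Mac Mini;
# a 70B model at the same quantization wants roughly 45 GB free, which is
# Mac Studio / DGX Spark territory.
print(estimate_ram_gb(9))
print(estimate_ram_gb(70))
```

This is also why Apple's unified memory matters in the comparison above: a big pool of shared RAM lets you load larger models even when raw token throughput is lower than on Nvidia VRAM.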

The three models he actually uses: Qwen, Nemotron, and MiniMax

When he gets specific, Alex narrows it to three local model families: Qwen 3.5, Nemotron 3, and MiniMax 2.5. Qwen 3.5 is his daily driver because it scales from a 9B model on a 16GB Mac Mini up to much larger variants; Nemotron 3 is the new Nvidia open-source entrant he’s especially excited about; MiniMax 2.5 is his pick when he wants lightweight, very fast execution.

Hugging Face and LM Studio: the easiest way to get started

He points people to Hugging Face as the main source for open models and suggests a very Alex-style shortcut: just ask OpenClaw which model best fits your hardware. Then he demos LM Studio, a free app for loading local models, showing MiniMax 2.5 on his Mac Studio writing a snake game in about 15 seconds while he pushes back on people who say Apple silicon is too slow.
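Once a model is loaded, LM Studio can serve it over an OpenAI-compatible HTTP API (by default on port 1234), so any script can talk to it. A minimal sketch using only the standard library, assuming the server is running with a model loaded — the model name here is a placeholder, not one from the video:

```python
# Minimal sketch of calling LM Studio's local OpenAI-compatible server.
# Assumes LM Studio's server is running on its default port with a model
# loaded; "local-model" is a placeholder name, not from the video.
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    """Build the JSON body for a /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask_local_model(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send a chat completion request to the local server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

# ask_local_model("Write a snake game in Python")  # requires LM Studio running
```

Because the API shape matches OpenAI's chat-completions format, swapping a cloud model for a local one is often just a base-URL change.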

Inside his “software factory” running on OpenClaw

The most vivid part of the video is his actual setup. He shows a coding agent named Charlie, powered locally by Qwen 3.5, writing software 24/7, while a manager agent named Ralph, powered by ChatGPT, orchestrates the whole workflow — his “brain and muscles” pattern in action. He also shows a MiniMax-based opportunity scanner scraping the internet nonstop and sending business ideas straight into Telegram, which he frames as the kind of always-on work cloud APIs make too expensive.
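The "brain and muscles" pattern boils down to a router: the cloud orchestrator decides which local model gets each task. The agent names and model assignments below follow the video; the keyword-based routing logic is my own illustrative assumption, since the video doesn't show how Ralph actually dispatches work.

```python
# Illustrative sketch of "brain and muscles" routing: a cloud orchestrator
# (the "brain") assigns each task to a local model (the "muscles").
# The keyword matching below is an assumption for illustration only;
# the video does not show the orchestrator's actual dispatch logic.

LOCAL_MODELS = {
    "coding": "qwen-3.5",       # "Charlie", the 24/7 coding agent
    "scraping": "minimax-2.5",  # the always-on opportunity scanner
    "research": "qwen-3.5",
}

def route_task(task: str) -> str:
    """Pick which local model should handle a task description."""
    text = task.lower()
    if any(word in text for word in ("scrape", "scan", "monitor")):
        return LOCAL_MODELS["scraping"]
    if any(word in text for word in ("code", "build", "implement")):
        return LOCAL_MODELS["coding"]
    return LOCAL_MODELS["research"]

print(route_task("Scrape the web for business opportunities"))  # minimax-2.5
print(route_task("Implement the next feature in the backlog"))  # qwen-3.5
```

The economics Alex describes fall out of this split: the metered cloud model only runs when a routing decision is needed, while the unmetered local models burn electricity, not tokens, on the continuous work.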

His real advice: start small, prove ROI, then scale up

Alex closes by warning people not to copy his hardware spend just because he has three maxed-out Mac Studios. His recommendation is to start with the machine you already own, hand off one narrow task to a local model, learn the stack, and only then level up to something like a DGX Spark once you know what workload justifies it. He’s also bullish on Apple’s next jump, predicting an M5 Ultra Mac Studio will be “incredible,” while joking that he’ll probably still buy the $100,000 DGX Station because he’s “about that life.”