ollama — Hey, it's Bernie!

Tiering my LLMs so I stop burning money thinking out loud

May 1, 2026

I had an uncomfortable moment recently. I looked at how I was using AI and realised I was doing what I used to hate about email, treating every situation the same way, defaulting to the same tool, and wondering why things felt inefficient. I was using Claude for everything. Quick lookups. Document summaries. Tasks I already knew the answer to. It's like hiring a senior consultant to photocopy things. So I built a simple tiering system. Three models. Each one has a specific job. And my token spend, and more importantly, my thinking, got a lot cleaner.

Tier 1: Claude for thinking This is where the real work happens. When I wanted to build a personal finance system, I didn't ask Claude to build it. I asked Claude to help me think through what I actually needed, the folder structure, the monthly operating skills, a sensible way to keep my balance sheet updated. Claude helped me design the blueprint. That blueprint becomes the instruction set for everything downstream. So this is where I invest time getting it right.

Tier 2: Gemini for research and comparison Once I have a direction, I don't need deep reasoning anymore. I need fast, reliable information. Gemini handles my desk research. Comparing tools. Pulling facts. Understanding what something actually does before I commit to it. Its context window is large, it's cheaper for this kind of work, and frankly it's better suited to it than Claude is. Right tool, right job.

Tier 3: Gemma running locally, for free This is the part I'm most excited about, and the part most people skip entirely. I run Gemma 4 (e2b) locally via Ollama. On its own, a small local model isn't impressive. But I've been building modelfiles from what I've worked out with Claude, distilling the reasoning, the structure, the decisions into a reusable skill that Gemma can execute. No API. No cloud. No ongoing cost. The effort is front-loaded. Once the logic is right, Gemma just runs it. Indefinitely. I'm still building this out, the library is small, the system isn't fully automated yet. But even half-built, it's already changed how I approach the question of where AI effort should actually go.

I'm not saying everyone needs three models. But if you're defaulting to one for everything, you're either overpaying, underusing what's available, or both.

I'm curious: if you're using more than one AI right now, what's your actual decision rule for which task goes where? Do you even have one?

#AIWorkflow #LocalLLM #Ollama #AILiteracy #BuildingInPublic