<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Ollama - Hey, it&#39;s Bernie!</title>
    <link>https://berniepng.com/tag:Ollama</link>
    <description>Building at the intersection of AI, data, and real-world impact. Singapore.</description>
    <pubDate>Mon, 01 Jun 2026 17:39:57 +0000</pubDate>
    <item>
      <title>Tiering my LLMs so I stop burning money thinking out loud</title>
      <link>https://berniepng.com/tiering-my-llms-so-i-stop-burning-money-thinking-out-loud</link>
      <description>I had an uncomfortable moment recently. I looked at how I was using AI and realised I was doing what I used to hate about email: treating every situation the same way, defaulting to the same tool, and wondering why things felt inefficient. I was using Claude for everything. Quick lookups. Document summaries. Tasks I already knew the answer to. It&#39;s like hiring a senior consultant to photocopy things. So I built a simple tiering system. Three models. Each one has a specific job. And my token spend and, more importantly, my thinking got a lot cleaner.&#xA;&#xA;Tier 1: Claude for thinking&#xA;This is where the real work happens.&#xA;When I wanted to build a personal finance system, I didn&#39;t ask Claude to build it. I asked Claude to help me think through what I actually needed: the folder structure, the monthly operating skills, a sensible way to keep my balance sheet updated. Claude helped me design the blueprint.&#xA;That blueprint becomes the instruction set for everything downstream. So this is where I invest time getting it right.&#xA;&#xA;Tier 2: Gemini for research and comparison&#xA;Once I have a direction, I don&#39;t need deep reasoning anymore. I need fast, reliable information. Gemini handles my desk research. Comparing tools. Pulling facts. Understanding what something actually does before I commit to it. Its context window is large, it&#39;s cheaper for this kind of work, and frankly it&#39;s better suited to it than Claude is. Right tool, right job.&#xA;&#xA;Tier 3: Gemma running locally, for free&#xA;This is the part I&#39;m most excited about, and the part most people skip entirely. I run Gemma 4 (e2b) locally via Ollama. On its own, a small local model isn&#39;t impressive. But I&#39;ve been building Modelfiles from what I&#39;ve worked out with Claude, distilling the reasoning, the structure, and the decisions into reusable skills that Gemma can execute. No API key. No cloud. No ongoing cost. The effort is front-loaded. Once the logic is right, Gemma just runs it. Indefinitely. I&#39;m still building this out: the library is small and the system isn&#39;t fully automated yet. But even half-built, it&#39;s already changed how I approach the question of where AI effort should actually go.&#xA;&#xA;I&#39;m not saying everyone needs three models. But if you&#39;re defaulting to one for everything, you&#39;re either overpaying, underusing what&#39;s available, or both.&#xA;&#xA;I&#39;m curious: if you&#39;re using more than one AI right now, what&#39;s your actual decision rule for which task goes where? Do you even have one?&#xA;&#xA;#AIWorkflow #LocalLLM #Ollama #AILiteracy #BuildingInPublic</description>
      <content:encoded><![CDATA[<p>I had an uncomfortable moment recently. I looked at how I was using AI and realised I was doing what I used to hate about email: treating every situation the same way, defaulting to the same tool, and wondering why things felt inefficient. I was using Claude for everything. Quick lookups. Document summaries. Tasks I already knew the answer to. It&#39;s like hiring a senior consultant to photocopy things. So I built a simple tiering system. Three models. Each one has a specific job. And my token spend and, more importantly, my thinking got a lot cleaner.</p>

<p><strong>Tier 1: Claude for thinking</strong>
This is where the real work happens.
When I wanted to build a personal finance system, I didn&#39;t ask Claude to build it. I asked Claude to help me think through what I actually needed: the folder structure, the monthly operating skills, a sensible way to keep my balance sheet updated. Claude helped me design the blueprint.
That blueprint becomes the instruction set for everything downstream. So this is where I invest time getting it right.</p>

<p><strong>Tier 2: Gemini for research and comparison</strong>
Once I have a direction, I don&#39;t need deep reasoning anymore. I need fast, reliable information. Gemini handles my desk research. Comparing tools. Pulling facts. Understanding what something actually does before I commit to it. Its context window is large, it&#39;s cheaper for this kind of work, and frankly it&#39;s better suited to it than Claude is. Right tool, right job.</p>

<p><strong>Tier 3: Gemma running locally, for free</strong>
This is the part I&#39;m most excited about, and the part most people skip entirely. I run Gemma 4 (e2b) locally via Ollama. On its own, a small local model isn&#39;t impressive. But I&#39;ve been building Modelfiles from what I&#39;ve worked out with Claude, distilling the reasoning, the structure, and the decisions into reusable skills that Gemma can execute. No API key. No cloud. No ongoing cost. The effort is front-loaded. Once the logic is right, Gemma just runs it. Indefinitely. I&#39;m still building this out: the library is small and the system isn&#39;t fully automated yet. But even half-built, it&#39;s already changed how I approach the question of where AI effort should actually go.</p>
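
<p>To make that concrete, here&#39;s a minimal sketch in Python of how a distilled skill might be rendered into an Ollama Modelfile. The <code>Skill</code> class, the <code>monthly-close</code> name, the prompt text, the temperature value, and the <code>gemma4:e2b</code> tag are all placeholders for illustration; check <code>ollama list</code> for the tag you actually have installed.</p>

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """A reusable task distilled from a planning session (hypothetical structure)."""
    name: str
    base_model: str
    system_prompt: str
    temperature: float = 0.2  # assumed default; lower values give steadier output


def render_modelfile(skill: Skill) -> str:
    """Render the skill as Modelfile text using FROM, PARAMETER and SYSTEM."""
    if '"""' in skill.system_prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError("system prompt must not contain triple quotes")
    lines = [
        f"FROM {skill.base_model}",
        f"PARAMETER temperature {skill.temperature}",
        'SYSTEM """',
        skill.system_prompt.strip(),
        '"""',
    ]
    return "\n".join(lines) + "\n"


skill = Skill(
    name="monthly-close",
    base_model="gemma4:e2b",  # assumed tag, not confirmed
    system_prompt=(
        "You update a personal balance sheet.\n"
        "Follow the folder structure and monthly steps you are given.\n"
        "Ask for any missing figure instead of guessing."
    ),
)

modelfile_text = render_modelfile(skill)
print(modelfile_text)
```

<p>Saved to a file called <code>Modelfile</code>, that text could be registered with <code>ollama create monthly-close -f Modelfile</code> and then run with <code>ollama run monthly-close</code>.</p>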

<p>I&#39;m not saying everyone needs three models. But if you&#39;re defaulting to one for everything, you&#39;re either overpaying, underusing what&#39;s available, or both.</p>
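
<p>For what it&#39;s worth, the three tiers above are simple enough to write down as a rule. Here&#39;s a rough sketch in Python; the two yes/no flags are my simplification for illustration, not a complete decision procedure.</p>

```python
from enum import Enum


class Tier(Enum):
    CLAUDE = "Claude: thinking and design"
    GEMINI = "Gemini: research and comparison"
    GEMMA_LOCAL = "Gemma via Ollama: repeatable skills"


def pick_tier(needs_design: bool, has_local_skill: bool) -> Tier:
    """Route a task to a tier (hypothetical flags, checked in priority order)."""
    if needs_design:
        # Open-ended problems where the blueprint itself is still unknown.
        return Tier.CLAUDE
    if has_local_skill:
        # The logic is already captured in a Modelfile, so run it locally.
        return Tier.GEMMA_LOCAL
    # Everything else: lookups, comparisons, desk research.
    return Tier.GEMINI


print(pick_tier(needs_design=False, has_local_skill=True).value)
```

<p>The order matters: design work goes to Claude even when a local skill exists, because a task that needs rethinking probably means the skill itself is due for a rewrite.</p>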

<p>I&#39;m curious: if you&#39;re using more than one AI right now, what&#39;s your actual decision rule for which task goes where? Do you even have one?</p>

<p><a href="https://berniepng.com/tag:AIWorkflow" class="hashtag"><span>#</span><span class="p-category">AIWorkflow</span></a> <a href="https://berniepng.com/tag:LocalLLM" class="hashtag"><span>#</span><span class="p-category">LocalLLM</span></a> <a href="https://berniepng.com/tag:Ollama" class="hashtag"><span>#</span><span class="p-category">Ollama</span></a> <a href="https://berniepng.com/tag:AILiteracy" class="hashtag"><span>#</span><span class="p-category">AILiteracy</span></a> <a href="https://berniepng.com/tag:BuildingInPublic" class="hashtag"><span>#</span><span class="p-category">BuildingInPublic</span></a></p>
]]></content:encoded>
      <guid>https://berniepng.com/tiering-my-llms-so-i-stop-burning-money-thinking-out-loud</guid>
      <pubDate>Fri, 01 May 2026 12:22:45 +0000</pubDate>
    </item>
  </channel>
</rss>