❤️ Pulse
A living snapshot of what I'm building, using, and working on. Updated monthly. Last updated: April 2026.
Now
What's actually occupying my time and attention this month. Not highlights. Not the reel. The real.
Building
Power of Smol is in early build. An AI literacy brand for non-technical women — the people most affected by AI but least represented in the conversation about it. Content strategy is mapped, the Notion CMS is live, and the first pieces are in draft. First public content coming soon.
The air quality pipeline is ongoing. A global air quality intelligence system built on Kafka, Spark, dbt, and BigQuery. Cleaning up intermediate models before moving to BQML forecasting. Not public yet.
Learning
Completing the NTU Advanced Professional Certificate in Data Science and AI, finishing June 2026. The final stretch.
Applied for the AI Singapore Apprenticeship Programme (AIAP23). Didn't make it past the technical assessment. Wrote about what the rejection clarified for me — here.
Reading AI Engineering by Chip Huyen. Slowly and deliberately.
Thinking about
Whether the most important AI literacy work is technical education or values education. I keep landing on values. The people shaping how AI gets used aren't mostly engineers. They're managers, teachers, policy people, and parents who need frameworks for thinking, not Python tutorials.
How solo-builders fit into an industry that increasingly rewards teams and capital. Still working on that one.
Currently not doing
Taking on new freelance projects. Heads down on the cert and the pipeline until June.
Find me on Bluesky or LinkedIn if something here resonates.
Uses
The tools, gear, and software behind my daily working environment. Honest opinions, no sponsorships.
Hardware
Apple MacBook Pro M4, 24GB RAM — Primary machine for everything: pipelines, code, content, writing. The 24GB matters once you're running Docker stacks with Kafka, Spark, and multiple services simultaneously.
DJI Osmo Action 4 + DJI Mic Mini — Camera and wireless mic for all video content. Most creators underinvest in audio. The Mic Mini fixed that at a price that made sense.
Nintendo Switch Lite (currently broken) — Mostly for Animal Crossing. Someday.
Dev Environment
Google Antigravity — Agent-first IDE by Google, still in preview. The Manager view for running multiple coding agents in parallel is genuinely new territory.
VSCode — For when I want more direct control than an agentic environment gives me.
Anaconda — Python environment and package management. Keeps Jupyter notebooks and project dependencies from colliding.
GitHub — Version control and public portfolio.
Docker — Containerises my full data stack. Non-negotiable once you have multiple services running together.
Cloud and Data
Google Cloud / BigQuery — Primary cloud data warehouse. Serverless queries on tens of millions of rows without managing compute.
AWS Lightsail — Hosts my web projects. Chose it over EC2 deliberately: simpler, cheaper, does exactly what I need.
Meltano + dbt + Dagster — Full ELT pipeline stack. Ingestion, transformation, orchestration. Covers the lifecycle with proper lineage, testing, and dependency tracking.
AI Tools
Claude — Primary reasoning partner. Pipeline debugging, code review, writing, strategy. Most of my builds have Claude somewhere in the workflow.
Gemini / Antigravity — Secondary model and agentic IDE. Cross-checking, multimodal tasks, and parallel coding workflows.
Writing and PKM
Obsidian — Everything I'm learning, building, and thinking about lives here first. Organised by project and skill, queried with Dataview and Bases.
WriteFreely — Powers this blog. Plain Markdown, no algorithmic feed, no metrics. Exactly what a personal blog should be.
Content
DaVinci Resolve — Video editing. Cut and Edit pages only. No subscription required.
Notion — Content management for Power of Smol. Four linked databases: calendar, ideas, drafts, published.
Curious about a specific tool or a decision I made? Ask me on Bluesky.
Work
Deep dives into selected projects. Not just what I made, but the decisions behind it, what worked, what didn't, and what it proved.
EverySong: Music Discography Explorer
Type: Full-Stack Web / Data Engineering · Solo project · Year: 2026
The problem: Music discographies are scattered across Wikipedia, fan wikis, and Discogs. No single place lets you browse a complete catalogue — album tracks, B-sides, solo work, collaborations, bonus discs, and soundtracks — in one filterable, searchable view. Streaming platforms don't solve this: they're optimised for discovery, not completeness.
What I built: A self-hosted discography explorer with a full data pipeline behind it. Discography data is researched from authoritative sources (Wikipedia song lists, Discogs, official fan wikis) and compiled into version-controlled CSVs. Custom Node.js seed scripts parse and load the data into SQLite, fetch album art from the iTunes Search API at runtime, and cache URLs in the database. The Express server renders artist gallery pages with filterable views by year, album, and performer, plus a search that spans song names, albums, and highlight notes. Currently catalogues BTS (386 songs, 2013–2026) and Pet Shop Boys (350 songs, 1984–2025) — including material most databases miss: hidden CD-only tracks, mixtape tracks, demo recordings from Further Listening reissue series, soundtrack scores, and fan club singles.
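The seed step is simple in shape: parse version-controlled CSVs, load them into SQLite, and leave an art-URL column to be filled and cached at runtime. A minimal Python sketch of that same pattern using the stdlib sqlite3 module (the project itself uses custom Node.js scripts and node:sqlite, so the table name, columns, and sample rows here are all illustrative):

```python
import csv
import io
import sqlite3

# Illustrative CSV in the shape the seed scripts consume (real columns differ).
CSV_DATA = """artist,album,song,year
Pet Shop Boys,Please,West End Girls,1986
Pet Shop Boys,Actually,Rent,1987
BTS,Wings,Blood Sweat & Tears,2016
"""

def seed(conn: sqlite3.Connection, csv_text: str) -> int:
    """Parse the CSV and load rows into a songs table; return rows inserted."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS songs (
               artist TEXT, album TEXT, song TEXT, year INTEGER,
               art_url TEXT  -- fetched from the iTunes Search API later, then cached
           )"""
    )
    rows = [(r["artist"], r["album"], r["song"], int(r["year"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany(
        "INSERT INTO songs (artist, album, song, year) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = seed(conn, CSV_DATA)

# A filterable view by year, the kind the gallery pages render server-side.
by_year = conn.execute(
    "SELECT song FROM songs WHERE year < 2000 ORDER BY year"
).fetchall()
```

Keeping the CSVs as the source of truth means the database is disposable: re-running the seed rebuilds it from scratch, which is also what the CI check validates on every push.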
Decisions worth noting: Used Node's built-in node:sqlite module rather than better-sqlite3 to eliminate native module compilation on the server — a deliberate tradeoff that makes deployment simpler without sacrificing anything meaningful at this data scale. Server-side rendering via EJS instead of a JS framework keeps the stack lean and the first-paint fast with no hydration overhead. GitHub Actions CI validates CSV integrity and DB seeding on every push.
What it proved: That the data engineering skills from the DSAI4 coursework transfer directly to personal projects — schema design, seed scripting, query optimisation, CI pipeline configuration — outside the structured environment of a capstone. And that building in public, with a proper README and documented data sources, is a different discipline from just shipping something that works.
Stack: Node.js · Express · SQLite (node:sqlite) · EJS · iTunes Search API · nginx · PM2 · AWS Lightsail · GitHub Actions
CityCycle London: Bike Rebalancing Intelligence Pipeline
Type: Data Engineering / Machine Learning · Team: DSAI4 Module 2, Team 2 · Year: 2026
The problem: London's CityCycle network runs 795 docking stations. The core operational headache is rebalancing: stations run empty during morning commutes or overflow when everyone rides the same direction. Lost rentals, frustrated riders, expensive manual crew runs.
What we built: A full production-grade ELT pipeline, from raw data to operational dashboard, with every layer properly tested and orchestrated.
The pipeline ingests 32 million real rides from BigQuery via Meltano, transforms them through a 7-model dbt star schema with 56 schema tests, runs Great Expectations quality gates (30 pass, 4 warn, 0 fail), trains an XGBoost demand forecasting model (RMSE 2.4), and surfaces everything through a 5-page Streamlit operations dashboard and a Looker Studio executive report. Orchestrated with Dagster and GitHub Actions CI across 5 jobs.
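For context on the forecast quality: RMSE is root-mean-squared error between predicted and actual demand, so an RMSE of 2.4 means a typical forecast is off by roughly two or three bikes. A quick stdlib illustration of the metric itself (the numbers below are made up, not the model's output):

```python
import math

# Hypothetical actual vs predicted hourly rentals at one station.
actual = [12, 8, 15, 20]
predicted = [10, 9, 17, 18]

# RMSE: square the errors, average them, take the square root.
rmse = math.sqrt(
    sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
)
```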
Key finding: Three stations are in critical territory. New North Road Hoxton drains on 90% of days, with a net outflow of 7 bikes per day. Without a pre-AM crew run, that station starts every morning short. K-Means clustering found three rider segments — Leisure 53%, Casual 32%, Commuter 15% — each draining stations in different directions at different times. One rebalancing schedule doesn't work across the network.
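The drain metric behind that finding is straightforward to compute from ride records: count departures and arrivals per station and take the difference. A hedged stdlib sketch (station names, toy rides, and this exact flow definition are assumptions for illustration, not the team's code):

```python
from collections import Counter

# Toy ride records as (start_station, end_station) pairs.
# The real pipeline computes this over 32M rides in BigQuery/dbt.
rides = [
    ("Hoxton", "Bank"), ("Hoxton", "Bank"), ("Hoxton", "Soho"),
    ("Bank", "Hoxton"), ("Soho", "Bank"),
]

departures = Counter(start for start, _ in rides)
arrivals = Counter(end for _, end in rides)

# Net flow per station: arrivals minus departures.
# Negative means the station drains and needs a rebalancing crew run.
stations = set(departures) | set(arrivals)
net_flow = {s: arrivals[s] - departures[s] for s in stations}
```

Aggregating this per station per day, then counting the share of days with negative flow, gives exactly the "drains on X% of days" framing used above.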
What I'd do differently: Add weather and event data as features from the start. The biggest gap in the XGBoost model is exogenous signals. A cold rainy Tuesday and a sunny bank holiday look identical to the current feature set.
What it proved: That a solo contributor can architect and ship a full production-grade data engineering pipeline, not just a notebook. Real stack. Real data. A tool an operations crew could use on a Monday morning.
Stack: Meltano · dbt · BigQuery · Great Expectations · Dagster · XGBoost · Streamlit · Looker Studio · Python · GitHub Actions
More case studies coming. Under The Hood and the Global Air Quality Pipeline will be documented here once the air quality pipeline is ready for public view.
If you're building in data engineering or AI and want to compare notes, find me on LinkedIn or Bluesky.