Blog

Musings & Ponders

Sharing insights from my experiences building AI products and leading teams.

The Joys of Building in Public

Building in public feels a bit like showering in public. Both require a certain comfort with vulnerability, with being seen in an unfinished state, with accepting that not everyone will appreciate the view.

How I Built a Working AI Product in 3 Weeks (As a Non-Developer)

Most founders waste months perfecting features users don't want. I took a different approach. Three weeks ago, I had nothing – no POC, no MVP, not even a sketch. Today, I'm sending beta invites for a tool that lets users test prompts across multiple AI models...

The Model Eval Tool I Wish I Had (And Am Now Building)

We had a problem. I was at The Motley Fool, and we'd just completed content pipelines to create thousands of earnings reports every quarter for our members. We'd used GPT-4-Turbo because that was the obvious choice at the time, but then the model landscape kept shifting...

Apple's Research and Poker's Reality Check

Yes, of course I had to take a look at the paper from Apple on AI reasoning models. "The Illusion of Thinking" reveals that even the most sophisticated AI reasoning models experience complete accuracy collapse beyond certain complexity thresholds...

What "Context" Means: When More Data Makes AI Dumber

Some AI practitioners are drowning their models in data, mistaking context window size for context quality. As context windows expand to accommodate millions of tokens, there's a dangerous assumption at work: if the model can handle more information, it should...

I Asked Claude Sonnet 4 to Build an App. Then I Watched Football.

Last week, I had an idea for a web app I'm calling "Sir Promptington," a tool that uses multiple LLMs to score, evaluate, and improve user prompts. Instead of writing out a bunch of planning docs, I decided to test Claude Sonnet 4's rumored instruction-following prowess...

What Ever Happened to DeepSeek?

DeepSeek just released an update to its R1 reasoning model. The response? Crickets. Remember January? This same Chinese AI startup triggered a nearly $1 trillion market selloff. Three weeks later, DeepSeek had faded into background noise...

Field Test: Does Gemini Know It's Not Creative?

Can an LLM judge its own creative limitations? In this inaugural Field Test, I'm pitting four leading LLMs against each other in a (cheesy) creative challenge and testing Gemini's self-awareness about its own creative shortcomings.

Stay Updated

Want to know when new content is available? Feel free to reach out.