Blog

Musings & Ponders

I've moved my writing to Substack for better reach and engagement. Here's what I've been writing about lately.

Recent Posts

Previous Posts

My earlier posts are archived here on the site, but all new content is published on Substack.

Archive

GPT-5 Reality Check

GPT-5 is here. But what does that really mean? And what does it mean for you? Here's what I think.

Read More

You Don't Always Get to Choose the Weather

You don't always get to choose the weather. Last week: a threshold run in 108°F heat. This week: the same workout in driving rain. Both times, I could have chosen the treadmill. But on race day, I won't get to choose the conditions. Same deal in business.

Read More

AI's Kendrick Lamar Problem

I have a Kendrick Lamar problem. Or rather, AI has a Kendrick Lamar problem. As part of testing my cross-model evaluation system, I gave four models a simple task: tell me who the best rapper alive is. All four chose Kendrick. Here's what that reveals about AI consensus and creativity.

Read More

Context Engineering: Between Prompts and RAG

Context engineering sits in the critical zone between prompt engineering and retrieval-augmented generation (RAG), and it's where most AI implementations actually succeed or fail. It was my biggest breakthrough when building AI-driven content systems.

Read More

The Joys of Building in Public

Building in public feels a bit like showering in public. Both require a certain comfort with vulnerability, with being seen in an unfinished state, with accepting that not everyone will appreciate the view.

Read More

The Model Eval Tool I Wish I Had (And Am Now Building)

We had a problem. I was at The Motley Fool, and we'd just finished building content pipelines that generated thousands of earnings reports for our members every quarter. We'd used GPT-4-Turbo because it was the obvious choice at the time, but then the model landscape kept shifting...

Read More

Apple's Research and Poker's Reality Check

Yes, of course I had to take a look at Apple's paper on AI reasoning models. "The Illusion of Thinking" reveals that even the most sophisticated reasoning models experience complete accuracy collapse beyond certain complexity thresholds...

Read More

What "Context" Means: When More Data Makes AI Dumber

Some AI practitioners are drowning their models in data, mistaking context window size for context quality. As context windows expand to accommodate millions of tokens, there's a dangerous assumption at work: if the model can handle more information, it should...

Read More

I Asked Claude Sonnet 4 to Build an App. Then I Watched Football.

Last week, I had an idea for a web app I'm calling "Sir Promptington," a tool that uses multiple LLMs to score, evaluate, and improve user prompts. Instead of writing out a bunch of planning docs, I decided to test Claude Sonnet 4's rumored instruction-following prowess...

Read More

What Ever Happened to DeepSeek?

DeepSeek just released an update to its R1 reasoning model. The response? Crickets. Remember January? This same Chinese AI startup triggered a nearly $1 trillion market selloff. Three weeks later, DeepSeek had faded into background noise...

Read More

Field Test: Does Gemini Know It's Not Creative?

Can an LLM judge its own creative limitations? In this inaugural Field Test, I'm pitting four leading LLMs against each other in a (cheesy) creative challenge and testing Gemini's self-awareness about its own creative shortcomings.

Read More

Stay Updated

Subscribe to my Substack to get new posts delivered to your inbox, or reach out directly.