GPT-5 Reality Check
GPT-5 is here. But what does that really mean? And what does it mean for you? Here's what I think.
Sharing insights from my experiences building AI products and leading teams.
You don't always get to choose the weather. Last week: threshold run in 108-degree heat. This week: same workout in driving rain. Both times, I could have chosen the treadmill. But on race day, I won't get to choose conditions. Same deal in business.
I have a Kendrick Lamar problem. Or rather, AI has a Kendrick Lamar problem. As part of testing my cross-model evaluation system, I gave four models a simple task: tell me who the best rapper alive is. All four unanimously chose Kendrick. Here's what that reveals about AI consensus and creativity.
Context Engineering sits in that critical zone between prompt engineering and RAG—and it's where most AI implementations actually succeed or fail. It was the biggest breakthrough in my work building AI-driven content systems.
"For some tasks, we are saturating the amount of intelligence needed for that task." This insight from Anthropic co-founder Benjamin Mann changes everything about how you think about AI deployment. Sometimes you're using a sledgehammer to hang a picture frame.
Six months ago, I wasn't impressed with Anthropic's Claude models. Today, after running 40+ head-to-head evaluations through my multi-model testing tool, I'm convinced Claude Sonnet 4 is the best LLM for most use cases. Here's the data that changed my mind.
I thought I had AI hallucinations figured out. At The Motley Fool, I was building fact-checking systems for financial content—the kind of stuff where getting numbers wrong can cost people real money. Turns out that even with perfect data, LLMs still find creative ways to screw things up.
Building in public feels a bit like showering in public. Both require a certain comfort with vulnerability, with being seen in an unfinished state, with accepting that not everyone will appreciate the view.
I was drafting beta invites when I realized my latest evaluation results had vanished. Not a great look when you're about to ask people to test your AI model comparison tool. That's when it hit me: I was about to invite beta testers to use an app that couldn't remember what they'd done.
Most founders waste months perfecting features users don't want. I took a different approach. Three weeks ago, I had nothing: no POC, no MVP, not even a sketch. Today, I'm sending beta invites for a tool that lets users test prompts across multiple AI models...
We had a problem. I was at The Motley Fool, and we'd just finished building content pipelines that produced thousands of earnings reports every quarter for our members. We'd used GPT-4-Turbo because it was the obvious choice at the time, but then the model landscape kept shifting...
Yes, of course I had to take a look at the paper from Apple on AI reasoning models. "The Illusion of Thinking" reveals that even the most sophisticated AI reasoning models experience complete accuracy collapse beyond certain complexity thresholds...
Some AI practitioners are drowning their models in data, mistaking context window size for context quality. As context windows expand to accommodate millions of tokens, there's a dangerous assumption at work: if the model can handle more information, it should...
Everyone's sharing their AI coding wins. The perfect app built in an afternoon. The flawless feature shipped in minutes. But here's what nobody talks about: what happens when you move past the proof-of-concept phase and things get messy?
Last week, I had an idea for a web app I'm calling "Sir Promptington": a tool that uses multiple LLMs to score, evaluate, and improve user prompts. Instead of writing out a bunch of planning docs, I decided to test Claude Sonnet 4's rumored instruction-following prowess...
DeepSeek just released an update to its R1 reasoning model. The response? Crickets. Remember January? This same Chinese AI startup triggered a near $1 trillion market selloff. Three weeks later, DeepSeek had faded into background noise...
Can an LLM judge its own creative limitations? In this inaugural Field Test, I'm pitting four leading LLMs against each other in a (cheesy) creative challenge and testing Gemini's self-awareness about its own creative shortcomings.
On my run the other day, I spotted a small turtle in the middle of the road. I picked it up and carried it to the nearby creek. It got me thinking about forces greater than ourselves and how we can build defensible AI products...
I've read quite a bit that prompt engineering is no longer relevant. Or that it soon won't be. Or that it shouldn't be necessary. I don't buy it. Or, at least, not entirely...
Want to know when new content is available? Feel free to reach out.