The Joys of Building in Public
Building in public feels a bit like showering in public. Both require a certain comfort with vulnerability, with being seen in an unfinished state, with accepting that not everyone will appreciate the view.
I was drafting beta invites when I realized my latest evaluation results had vanished. Not a great look when you're about to ask people to test your AI model comparison tool. That's when it hit me: I was about to invite beta testers to use an app that couldn't remember what they'd done.
Most founders waste months perfecting features users don't want. I took a different approach. Three weeks ago, I had nothing – no POC, no MVP, not even a sketch. Today, I'm sending beta invites for a tool that lets users test prompts across multiple AI models...
We had a problem. I was at The Motley Fool, and we'd just finished building content pipelines that generate thousands of earnings reports every quarter for our members. We'd used GPT-4-Turbo because it was the obvious choice at the time, but then the model landscape kept shifting...
Yes, of course I had to take a look at the paper from Apple on AI reasoning models. "The Illusion of Thinking" reveals that even the most sophisticated AI reasoning models experience complete accuracy collapse beyond certain complexity thresholds...
Some AI practitioners are drowning their models in data, mistaking context window size for context quality. As context windows expand to accommodate millions of tokens, there's a dangerous assumption at work: if the model can handle more information, it should...
Everyone's sharing their AI coding wins. The perfect app built in an afternoon. The flawless feature shipped in minutes. But here's what nobody talks about: what happens when you move past the proof-of-concept phase and things get messy?
Last week, I had an idea for a web app I'm calling "Sir Promptington" - a tool that uses multiple LLMs to score, evaluate, and improve user prompts. Instead of writing out a bunch of planning docs, I decided to test Claude Sonnet 4's rumored instruction-following prowess...
DeepSeek just released an update to its R1 reasoning model. The response? Crickets. Remember January? This same Chinese AI startup triggered a near $1 trillion market selloff. Three weeks later, DeepSeek had faded into background noise...
Can an LLM judge its own creative limitations? In this inaugural Field Test, I'm pitting four leading LLMs against each other in a (cheesy) creative challenge and testing Gemini's self-awareness about its own creative shortcomings.
On my run the other day, I spotted a small turtle in the middle of the road. I picked it up and carried it to the nearby creek. It got me thinking about forces greater than ourselves and how we can build defensible AI products...
I've read quite a bit claiming that prompt engineering is no longer relevant. Or that it soon won't be. Or that it shouldn't be necessary. I don't buy it. Or, at least, not entirely...
Want to know when new content is available? Feel free to reach out.