Writing
Blog
Notes on local AI models, sustainable inference, and things I'm building.
Energy per token: measuring what inference actually costs
Tokens per second is the wrong metric if you care about sustainability. Notes from measuring watt-hours per response on consumer hardware.
Read post →Why local LLMs matter more than the benchmarks suggest
Cloud models win leaderboards, but most everyday tasks don't need a frontier model. What you get back from running open weights on your own machine.
Read post →