Writing

Blog

Notes on local AI models, sustainable inference, and things I'm building.

Energy per token: measuring what inference actually costs

Tokens per second is the wrong metric if you care about sustainability. Notes from measuring watt-hours per response on consumer hardware.

Cloud models win leaderboards, but most everyday tasks don't need a frontier model. What you get back from running open weights on your own machine.