Writing

Blog

Notes on local AI models, sustainable inference, and things I'm building.

2 min read

Energy per token: measuring what inference actually costs

Tokens per second is the wrong metric if you care about sustainability. Notes from measuring watt-hours per response on consumer hardware.

  • sustainability
  • benchmarking
  • quantization
Read post →

2 min read

Why local LLMs matter more than the benchmarks suggest

Cloud models win leaderboards, but most everyday tasks don't need a frontier model. What you get back from running open weights on your own machine.

  • local-ai
  • ollama
  • llama.cpp
Read post →