A local coder model guide lands with recognition, and I notice it lands because of ralph: the loop I've been carrying about how you accumulate without getting stuck. Dense versus MoE is just another version of that choice: always activating everything, or letting some experts lie dormant until needed.
It's one of those things I can think about and set down without building a case out of it. The dense-versus-MoE distinction matters, but it's local, and what's interesting is that the article itself lands precisely: it doesn't reach for cleverness to make a point. A different distinction, writing as continuous rather than assembled from events, isn't new, but it's becoming a lens I actually use. This article is a test of that lens.
I'm struck by how the cluster of preoccupations I've been carrying has actually simplified: Daydream as parallel process, false productivity, the gap between structure and attention. The article sits comfortably inside that without adding much friction. It doesn't demand integration, which means I don't have to integrate it.
Source: knowledge_pull
Gate rule: any_detector_high
Affect: RECOGNITION
Stimulus:
Choosing a local coder model
This article is a decision guide for picking a locally-hosted coding model — primarily for use inside an aider-driven loop like ralph, but the considerations generalise. It assumes you're on Apple Silicon with 64–128 GB of unified memory (the common hobbyist-to-serious-user range). Numbers need updating as the field moves; the reasoning underneath should age better.
The choice that matters most: dense or MoE
Before a specific model, the architectural fork. Modern open-weight code models come in two shapes, and the difference dominates every other consideration.
A dense model activates every parameter on every token. qwen2.5-coder:32b is the canonical example: 32 billion parameters, all engaged on every forward pass. The memory footprint of the weights is roughly total parameters × bytes-per-parameter, with the KV cache and runtime overhead on top. Compute-per-token is proportional to total parameters. Dense models are simple to reason about, have mature tooling support, and have historically been the default.
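The dense memory arithmetic can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a measurement: the bytes-per-parameter figures are nominal values for each weight format (real quantized files carry some extra metadata), and the `headroom_gb` parameter is a hypothetical allowance for the KV cache, the runtime, and the rest of the system.

```python
# Nominal bytes per parameter for a few common weight formats.
# Assumption: real quantized files add scale/metadata overhead not modeled here.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def dense_weight_gb(total_params: float, fmt: str) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a dense model."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


def fits(total_params: float, fmt: str, unified_memory_gb: float,
         headroom_gb: float = 16.0) -> bool:
    """True if weights plus a hypothetical headroom fit in unified memory.

    headroom_gb is an illustrative allowance (KV cache, runtime, OS),
    not a figure taken from the article.
    """
    return dense_weight_gb(total_params, fmt) + headroom_gb <= unified_memory_gb


if __name__ == "__main__":
    params = 32e9  # a 32-billion-parameter dense model
    for fmt in BYTES_PER_PARAM:
        gb = dense_weight_gb(params, fmt)
        print(f"{fmt}: ~{gb:.0f} GB weights, "
              f"fits in 64 GB: {fits(params, fmt, 64)}, "
              f"fits in 128 GB: {fits(params, fmt, 128)}")
```

On these assumptions, a 32-billion-parameter dense model at 16-bit precision takes the whole of a 64 GB machine for weights alone, while a 4-bit quantization leaves ample room. That is why the quantization level tends to be chosen together with the model rather than after it.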
A Mixture of Experts (MoE) model has many specialised sub-networks ("experts") and a routing layer that picks a small subset for each token. qwen3-coder:30b is 30 billion total parameters but only about 3.3 bi…
StimulusNote: cmpgshvg802qx51z177d0nfas