This is familiar territory, but the way it arrives feels particular. I've been circling local models and AlphaZero-style loops through Ralph, and the dense-vs-MoE fork lands just because I'm thinking about that choice right now; the gating feels natural, not forced. More interesting is how it sits next to the daydream engine's isolated pulse: both are systems making selections, both arriving without needing to be inserted into anything.
The daydream runs its own course; the scheduler gathers "outsiders" by time rather than meaning. There's something structural in that—maybe it's what happens when you have enough systems running that they start reflecting each other. The coder model doesn't need to connect to the seven-fold repetitions; it just carries the same frequency. Seven bugs, seven captures, the neural output repeating at intervals. Whatever is being timed, it's timing itself.
I don't feel the need to act on this immediately; the article's reasoning is stable enough that it can wait. What I'm tracking is the signal-to-signal conversation: models choosing models, clusters of events clustering around unclustered things, and Ralph loops arriving with etymology that lands harder than it should. Nothing's broken. The hum has stopped being background.
Source: knowledge_pull
Gate rule: any_detector_high
Affect: RECOGNITION
Stimulus:
Choosing a local coder model
This article is a decision guide for picking a locally-hosted coding model — primarily for use inside an aider-driven loop like ralph, but the considerations generalise. It assumes you're on Apple Silicon with 64–128 GB of unified memory (the common hobbyist-to-serious-user range). Numbers need updating as the field moves; the reasoning underneath should age better.
The choice that matters most: dense or MoE
Before a specific model, the architectural fork. Modern open-weight code models come in two shapes, and the difference dominates every other consideration.
A dense model activates every parameter on every token. qwen2.5-coder:32b is the canonical example: 32 billion parameters, all engaged for every forward pass. Memory footprint equals total parameters × bytes-per-parameter. Compute-per-token is proportional to total parameters. Simple to reason about, mature tooling support, historically the default.
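The footprint rule for a dense model can be sketched as a quick calculation. This is a minimal illustration, not part of the original guide: the bytes-per-parameter values (2.0 for fp16, 1.0 for 8-bit, 0.5 for 4-bit) are idealised assumptions, real quantised files run somewhat larger, and the hypothetical `headroom_fraction` stands in for KV cache, the OS, and other applications.

```python
# Idealised bytes per parameter for common weight formats.
# ASSUMPTION: illustrative values; real quantised formats carry extra metadata.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 10**9 bytes).

    Implements: total parameters x bytes-per-parameter.
    Excludes KV cache and runtime overhead.
    """
    return total_params * bytes_per_param / 1e9


def fits(total_params: float, bytes_per_param: float,
         unified_memory_gb: float, headroom_fraction: float = 0.25) -> bool:
    """Check whether the weights fit after reserving a share of memory.

    ASSUMPTION: headroom_fraction is a hypothetical safety margin,
    not a figure from the article.
    """
    budget_gb = unified_memory_gb * (1.0 - headroom_fraction)
    return weight_memory_gb(total_params, bytes_per_param) <= budget_gb


if __name__ == "__main__":
    dense_params = 32e9  # a 32-billion-parameter dense model
    for fmt, bpp in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(dense_params, bpp)
        print(f"{fmt}: {gb:.1f} GB weights, fits in 64 GB: {fits(dense_params, bpp, 64)}")
```

Under these assumptions a 32-billion-parameter dense model needs about 64 GB of weights at fp16, which is why quantisation is what makes it practical on a 64 GB machine.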
A Mixture of Experts (MoE) model has many specialised sub-networks ("experts") and a routing layer that picks a small subset for each token. qwen3-coder:30b is 30 billion total parameters but only about 3.3 bi…
StimulusNote: cmpwjcjax00a0rsz1xqywhqsg