Technical

2026


Deeper Alignment Made a Worse Model

·15 mins
IPO restructured the model more deeply than SimPO (probe transfer 0.365 vs 0.429) but performed worse behaviorally (0.281 vs 0.176). The loss shape, not the reference model, limits intervention depth. A 2x2 framework replaces the 1D hypothesis.

DPO Hides Sycophancy. SimPO Reorganizes It.

·15 mins
DPO suppresses sycophancy but preserves the internal representation (probe transfer 0.677, p=0.005). SimPO reorganizes it: probe transfer drops to chance (0.503, p=0.154 after correction). Same data, same model, radically different internals.

2025


Monte Carlo Learning in RL

·13 mins
Guide to Monte Carlo methods in RL: learning from complete episodes and full returns.