<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Constitutional AI on Narasimha Karthik J</title><link>https://jnk234.github.io/tags/constitutional-ai/</link><description>Recent content in Constitutional AI on Narasimha Karthik J</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 28 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://jnk234.github.io/tags/constitutional-ai/index.xml" rel="self" type="application/rss+xml"/><item><title>My Best Alignment Fix Didn't Remove Sycophancy — It Sharpened the Direction and Aimed It at Honesty</title><link>https://jnk234.github.io/posts/sycophancy-recovery-cai-probing/</link><pubDate>Thu, 28 May 2026 00:00:00 +0000</pubDate><guid>https://jnk234.github.io/posts/sycophancy-recovery-cai-probing/</guid><description>DPO-CAI has the best behavior (0.166) and the most linearly readable sycophancy/honesty axis of any model (own-probe peak 0.877) — it concentrated the direction rather than removing it. GRPO produced the deepest representational change (SFT-probe transfer 0.651). Behavior and mechanism are different axes.</description></item></channel></rss>