But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors
We use steering vectors to obtain alternative, honest responses, helping external LLM-judges detect subtle instances of dishonest or manipulative behavior.