
New research from Anthropic says that LLMs can introspect on their own internal states – they notice when concepts are ‘injected’ into their activations, they can track their own ‘intent’ separately from their output, and they have moderate control over their internal states.