New research from Anthropic says that LLMs can introspect on their own internal states – they notice when concepts are ‘injected’ into their activations, they can track their own ‘intent’ separately from their output, and they have moderate control over their internal states
